TransDecoder

Introduction

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

  • TransDecoder identifies likely coding sequences based on the following criteria:

  • a minimum length open reading frame (ORF) is found in a transcript sequence

  • a log-likelihood score similar to what is computed by the GeneID software is > 0.

  • the above coding score is greatest when the ORF is scored in the 1st reading frame as compared to scores in the other 2 forward reading frames.

  • if a candidate ORF is found fully encapsulated by the coordinates of another candidate ORF, the longer one is reported. However, a single transcript can report multiple ORFs (allowing for operons, chimeras, etc).

  • a PSSM is built/trained/used to refine the start codon prediction.

  • optional the putative peptide has a match to a Pfam domain above the noise cutoff score.

Detailed usage can be found here: https://github.com/TransDecoder/TransDecoder/wiki#running-transdecoder

Versions

  • 5.5.0

Commands

  • TransDecoder.LongOrfs

  • TransDecoder.Predict

  • cdna_alignment_orf_to_genome_orf.pl

  • compute_base_probs.pl

  • exclude_similar_proteins.pl

  • fasta_prot_checker.pl

  • ffindex_resume.pl

  • gene_list_to_gff.pl

  • get_FL_accs.pl

  • get_longest_ORF_per_transcript.pl

  • get_top_longest_fasta_entries.pl

  • gff3_file_to_bed.pl

  • gff3_file_to_proteins.pl

  • gff3_gene_to_gtf_format.pl

  • gtf_genome_to_cdna_fasta.pl

  • gtf_to_alignment_gff3.pl

  • gtf_to_bed.pl

  • nr_ORFs_gff3.pl

  • pfam_runner.pl

  • refine_gff3_group_iso_strip_utrs.pl

  • refine_hexamer_scores.pl

  • remove_eclipsed_ORFs.pl

  • score_CDS_likelihood_all_6_frames.pl

  • select_best_ORFs_per_transcript.pl

  • seq_n_baseprobs_to_loglikelihood_vals.pl

  • start_codon_refinement.pl

  • train_start_PWM.pl

  • uri_unescape.pl

Module

You can load the modules by:

module load biocontainers
module load transdecoder

Example job

Warning

Using #!/bin/sh -l as shebang in the slurm job script will cause the failure of some biocontainer modules. Please use #!/bin/bash instead.

To run transdecoder on our our clusters:

#!/bin/bash
#SBATCH -A myallocation     # Allocation name
#SBATCH -t 20:00:00
#SBATCH -N 1
#SBATCH -n 24
#SBATCH --job-name=transdecoder
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out

module --force purge
ml biocontainers transdecoder

gtf_genome_to_cdna_fasta.pl transcripts.gtf test.genome.fasta > transcripts.fasta
gtf_to_alignment_gff3.pl transcripts.gtf > transcripts.gff3
TransDecoder.LongOrfs -t transcripts.fasta
TransDecoder.Predict -t transcripts.fasta