MetaPhlAn 3
Introduction
MetaPhlAn (Metagenomic Phylogenetic Analysis) is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
up to 25,000 reads-per-second (on one CPU) analysis speed (orders of magnitude faster compared to existing methods);
unambiguous taxonomic assignments as the MetaPhlAn markers are clade-specific;
accurate estimation of organismal relative abundance (in terms of number of cells rather than fraction of reads);
species-level resolution for bacteria, archaea, eukaryotes and viruses;
extensive validation of the profiling accuracy on several synthetic datasets and on thousands of real metagenomes.
Versions
3.0.14
3.0.9
4.0.2
Commands
metaphlan
Database
The latest version of the database (mpa_v30) has been downloaded and built in /depot/itap/datasets/metaphlan/.
Module
You can load the module with:
module load biocontainers
module load metaphlan/3.0.14
Example job
Warning
Using #!/bin/sh -l as the shebang in a Slurm job script will cause some biocontainer modules to fail. Please use #!/bin/bash instead.
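The shebang rule in the warning can be checked before a job is submitted. The snippet below is a minimal sketch, not part of the cluster tooling: `check_shebang` is a hypothetical helper name, and it only inspects the first line of the file it is given.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the cluster): return 0 only if the
# first line of the given job script is a bash shebang.
check_shebang() {
    local script="$1"
    local first_line=""
    # Read only the first line; tolerate a file with no trailing newline.
    IFS= read -r first_line < "$script" || true
    case "$first_line" in
        '#!/bin/bash'*)
            return 0
            ;;
        *)
            echo "warning: $script starts with '$first_line'; use #!/bin/bash" >&2
            return 1
            ;;
    esac
}
```

A possible use, assuming a job script named myjob.sh (a placeholder name), is `check_shebang myjob.sh && sbatch myjob.sh`, so the job is submitted only when the check passes.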
To run MetaPhlAn on our cluster:
#!/bin/bash
#SBATCH -A myallocation # Allocation name
#SBATCH -t 10:00:00
#SBATCH -N 1
#SBATCH -n 24
#SBATCH --job-name=MetaPhlAn
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
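# Start from a clean environment, then load the MetaPhlAn container module
module --force purge
ml biocontainers metaphlan/3.0.14
# Directory holding the prebuilt mpa_v30 database
DATABASE=/depot/itap/datasets/metaphlan/
# Profile the paired-end reads; --nproc matches the 24 tasks requested above
metaphlan SRR11234553_1.fastq,SRR11234553_2.fastq --input_type fastq --nproc 24 -o profiled_metagenome.txt --bowtie2db "$DATABASE" --bowtie2out metagenome.bowtie2.bz2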
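When several paired-end samples need profiling, the same command can be generated once per sample. The following is a dry-run sketch under stated assumptions: `build_metaphlan_cmd` is a hypothetical helper, the second accession SRR11234554 is a placeholder, and the files are assumed to follow the `<sample>_1.fastq` / `<sample>_2.fastq` naming used in the job above. It only prints the commands and uses only the options that already appear in that job.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) the metaphlan command for one
# paired-end sample, using only the options from the example job script.
build_metaphlan_cmd() {
    local sample="$1"   # sample prefix, e.g. SRR11234553
    local db="$2"       # directory containing the MetaPhlAn database
    local nproc="$3"    # thread count; should match the #SBATCH -n value
    printf 'metaphlan %s_1.fastq,%s_2.fastq --input_type fastq --nproc %s -o %s_profile.txt --bowtie2db %s --bowtie2out %s.bowtie2.bz2\n' \
        "$sample" "$sample" "$nproc" "$sample" "$db" "$sample"
}

DATABASE=/depot/itap/datasets/metaphlan/

# SRR11234554 is a placeholder for a second sample (assumption).
for sample in SRR11234553 SRR11234554; do
    build_metaphlan_cmd "$sample" "$DATABASE" 24
done
```

Each sample gets its own output and `--bowtie2out` file name so that runs do not overwrite one another. Once the printed commands look right, they could be pasted into the job script after the `module` lines.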