SRA-Toolkit

Introduction

SRA-Toolkit is a collection of tools and libraries for using data in the INSDC Sequence Read Archives. Detailed documentation is available at https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc.

Versions

2.11.0-pl5262

Commands

abi-dump
align-cache
align-info
bam-load
cache-mgr
cg-load
fasterq-dump
fasterq-dump-orig
fastq-dump
fastq-dump-orig
illumina-dump
kar
kdbmeta
kget
latf-load
md5cp
prefetch
prefetch-orig
rcexplain
read-filter-redact
sam-dump
sam-dump-orig
sff-dump
sra-pileup
sra-pileup-orig
sra-sort
sra-sort-cg
sra-stat
srapath
srapath-orig
sratools
test-sra
vdb-config
vdb-copy
vdb-diff
vdb-dump
vdb-encrypt
vdb-lock
vdb-passwd
vdb-unlock
vdb-validate

Module

You can load the modules with:

module load biocontainers
module load sra-tools/2.11.0-pl5262

Configuring SRA-Toolkit

Users can configure SRA-Toolkit with the vdb-config command. For example, the command below sets prefetch to download into the current working directory:

vdb-config --prefetch-to-cwd

Example job

Warning

Using #!/bin/sh -l as the shebang in a Slurm job script causes some biocontainer modules to fail. Please use #!/bin/bash instead.

To run SRA-Toolkit on our cluster:

#!/bin/bash
#SBATCH -A myallocation     # Allocation name
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --job-name=SRA-Toolkit
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out

module --force purge
ml biocontainers sra-tools/2.11.0-pl5262

vdb-config --prefetch-to-cwd # The data will be downloaded to the current working directory.
prefetch SRR11941281
fastq-dump --split-3 SRR11941281/SRR11941281.sra
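
To process more than one accession in a single job, the prefetch and fastq-dump steps above can be wrapped in a small shell function. The following is a minimal sketch, not part of SRA-Toolkit: the function name fetch_and_dump and the RUN variable are hypothetical names introduced here for illustration. It assumes vdb-config --prefetch-to-cwd has already been run, so that each download lands in ACCESSION/ACCESSION.sra as in the job script above.

```shell
# Hypothetical helper (not provided by SRA-Toolkit): download each accession
# given as an argument and convert it to FASTQ, stopping at the first failure.
# Assumes "vdb-config --prefetch-to-cwd" was run beforehand, so prefetch
# writes to <accession>/<accession>.sra in the current working directory.
# Set RUN=echo to print the commands instead of executing them (dry run).
fetch_and_dump() {
    for acc in "$@"; do
        ${RUN:-} prefetch "$acc" || return 1
        ${RUN:-} fastq-dump --split-3 "$acc/$acc.sra" || return 1
    done
}
```

In a job script, place the function after the module load lines and call it with one or more accessions, for example fetch_and_dump SRR11941281. Running RUN=echo fetch_and_dump SRR11941281 first is a cheap way to check the generated commands before submitting the job.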