Alphafold

Introduction

Alphafold is a protein structure prediction tool developed by DeepMind (Google). It uses a novel machine learning approach to predict 3D protein structures from primary sequences alone. The source code is available on Github. It has been deployed in all RCAC clusters, supporting both CPU and GPU.

It also relies on a huge database. The full database (~2.2TB) has been downloaded and setup for users.

Versions

  • 2.1.1

  • 2.2.0

  • 2.2.3

Commands

run_alphafold.sh

Module

You can load the modules by:

module load biocontainers
module load alphafold/2.1.1

Usage

The usage of Alphafold on our cluster is very straightford:

run_alphafold.sh --flagfile=$AlphaDB --fasta_paths=XX --output_dir=XX ...

$AlphaDB (/depot/itap/datasets/alphafold/full_db.ff) is a configuration file passed to AlphaFold containing the location of the database. Typically it should not be edited. Users can add other parameters based on your needs.

Users can check its detaied user guide in its Github.

AlphaDB

Contents of $AlphaDB:

--db_preset=full_dbs
--bfd_database_path=/depot/itap/datasets/alphafold/db/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
--data_dir=/depot/itap/datasets/alphafold/db/
--uniref90_database_path=/depot/itap/datasets/alphafold/db/uniref90/uniref90.fasta
--mgnify_database_path=/depot/itap/datasets/alphafold/db/mgnify/mgy_clusters_2018_12.fa
--uniclust30_database_path=/depot/itap/datasets/alphafold/db/uniclust30/uniclust30_2018_08/uniclust30_2018_08
--pdb70_database_path=/depot/itap/datasets/alphafold/db/pdb70/pdb70
--template_mmcif_dir=/depot/itap/datasets/alphafold/db/pdb_mmcif/mmcif_files
--max_template_date=2022-01-29
--obsolete_pdbs_path=/depot/itap/datasets/alphafold/db/pdb_mmcif/obsolete.dat
--hhblits_binary_path=/usr/bin/hhblits
--hhsearch_binary_path=/usr/bin/hhsearch
--jackhmmer_binary_path=/usr/bin/jackhmmer
--kalign_binary_path=/usr/bin/kalign

Example job using CPU

Warning

Using #!/bin/sh -l as shebang in the slurm job script will cause the failure of some biocontainer modules. Please use #!/bin/bash instead.

To run alphafold using CPU:

#!/bin/bash
#SBATCH -A myallocation     # Allocation name
#SBATCH -t 20:00:00
#SBATCH -N 1
#SBATCH -n 24
#SBATCH --job-name=alphafold
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out

module --force purge
ml biocontainers alphafold/2.1.1

run_alphafold.sh  --flagfile=$AlphaDB --fasta_paths=/scratch/bell/zhan4429/Containers4/alphafold/sample.fasta --max_template_date=2022-02-01 \
--output_dir=/scratch/bell/zhan4429/Containers4/alphafold/af2_full --model_preset=monomer

Example job using GPU

Warning

Using #!/bin/sh -l as shebang in the slurm job script will cause the failure of some biocontainer modules. Please use #!/bin/bash instead.

To run alphafold using GPU:

#!/bin/bash
#SBATCH -A myallocation     # Allocation name
#SBATCH -t 20:00:00
#SBATCH -N 1
#SBATCH -n 24
#SBATCH --gres=gpu:1
#SBATCH --job-name=alphafold
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out

module --force purge
ml biocontainers alphafold/2.1.1

run_alphafold.sh  --flagfile=$AlphaDB --fasta_paths=/scratch/bell/zhan4429/Containers4/alphafold/sample.fasta --max_template_date=2022-02-01 \
--output_dir=/scratch/bell/zhan4429/Containers4/alphafold/af2_full --model_preset=monomer