SYNOPSIS FOR RESISTOME WORKSHOPS (2019-2020)


HOW TO ANALYZE THE DATA?

# How to get my favorite SRA data?

- Let’s say I’m interested in the skin metagenome data set in Bioproject 266117 (prjna266117)

- I can access the data at https://www.ncbi.nlm.nih.gov/sra by searching for SRA bioproject for prjna266117 or SRA bioproject for 266117, which will lead me to:  https://www.ncbi.nlm.nih.gov/bioproject/?term=(prjna266117)%20AND%20bioproject_sra[filter] %20NOT%20bioproject_gap[filter]

OR

- I will download and install the SRA toolkit to smoothly download the data (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/)

# How to use SRA toolkit to get the data?

fastq-dump sequence_file_name—fasta --gzip --outdir bioproject_266117; done [bioproject_266117 is an output folder that I created]

YOU CAN ALSO download in fastq format

# Diamond commands:

If your database is not made yet, especially if it’s a custom one, you will want to make the database from a fasta file (here antimicrobial resistance proteins):

diamond makedb --in db.faa -d databasename  

e.g., diamond makedb --in /data/resistome/blactamases_2019.fasta -d blaDB #creates a database for all beta lactamases that I had collected and curated (with student help) and named it here blaDB for future reference

# How to DIAMOND?

diamond blastx -t externalـdrive -d diamond_database -q fastq filtered query file -o output.f6 -k 1

# -k or —max-target-seqs: maximum number of target sequences to report alignments for

e.g.,
mkdir myAMR_analysis

diamond blastx -t ./myAMR_analysis -d myAMR_analysis/blaDB -q /metagenome_files/SRR5535731.fastq -o SRR5535731.vs.blaDB -k 1

# To see the alignments for the BLAST hits

diamond blastx -t ./myAMR_analysis -d myAMR_analysis/blaDB -q /metagenome_files/SRR5535731.fastq -o SRR5535731.vs.blaDB -k 1 --outfmt 0 #output format (-f or --outfmt) with the code 0 will give full alignments

# To change the statistical threshold (Expect value or E-value), add -e or --evalue

# How to copy files into my directory (folder)?

cd or cd ~ #changes the directory to your user directory, from wherever you are

mkdir skinMGs #creates a directory in your user directory

cp /data/MGs/skin/SRRxxxx ./skinMGs/ # (SRRxxx = your assigned file)

or, if you're ambitious ...

cp /data/MGs/skin/*fq skinMGs/ #copies all files that end with 'fq' and that are in the subdirectory 'skin' (under subdirectory 'MGs', which is in the public directory 'data') to the directory you just created

To check if the files were properly copied:

ls #lists all the files in the directory you are in

So, to have a list of the files you just copied, do either:

ls /skinMGs

or 

ls /skinMGs/*

or 

cd skinMGs

ls

For more details on the files:

ls -l #lists all the files, with all their details, including their size in bytes

or

ls -lh #this will list the files as above, but will give you the sizes in a "human readable" way (whence the -h), i.e., will tell you 18 M (megabytes) instead of 18202538

If the list is long (which it could be, since there are 60+ files), you can read it page by page if you pipe to the program 'less'

ls -l | less

one way to count these files is:

ls | wc #gives you the line, word, and character count by using the word count command


ADVANCED

#If you want to run several files

# So, with default E-value (0.001):  

for i in $(ls SRR*sub.fq); do diamond blastx -b 1 -t /data/MGs/gut/Subsampled_60K -d blaDB -q $i -o $i.f6.txt -k 1; done

# With a permissive E-value (1)

for i in $(ls SRR*sub.fq); do diamond blastx -b 1 -t /data/MGs/gut/Subsampled_60K -d blaDB -q $i -o $i.f6.e_1.txt -k 1 -e 1; done

# With a more restrictive E-value (1e-10)

for i in $(ls SRR*sub.fq; do diamond blastx -b 1 -t /data/MGs/gut/Subsampled_60K -d blaDB -q $i -o $i.f6.e_10.txt -k 1 --evalue e-10; done #--evalue is the same as -e

#To do all the above in an analysis directory in your user directory

cd

mkdir myAMR_analysis #(if you have not yet created that directory)

cd myAMR_analysis

ls #just to verify you the directory is there

# Next 

cd ~/skinMGs

for i in $(ls SRR*sub.fq); do diamond blastx -b 1 -t myAMR_analysis -d ~/blaDB -q $i -o myAMR_analysis/$i.f6.txt -k 1; done

and so on...

e.g., 

for i in $(ls SRR*sub.fq); do diamond blastx -b 1 -t myAMR_analysis -d ~/blaDB -q $i -o myAMR_analysis/$i.f6.e_10.txt -k 1 -e 1e-10; done #changes E-value threshold

for i in $(ls SRR*sub.fq); do diamond blastx -b 1 -t myAMR_analysis -d ~/blaDB -q $i -o myAMR_analysis/$i.f0.txt -k 1 -f 0; done #changes output format from default (6) to aligned (0)

for i in $(ls SRR*sub.fq); do diamond blastx -b 1 -t myAMR_analysis -d ~/cardDB -q $i -o myAMR_analysis/$i.f6.txt -k 1; done #changes database from blaDB to cardDB 

# To combine all best hits in one file:

for i in $(ls *f6.txt*_05.out); do cat $i; done >MyCombinedAMR_Results.txt