How to Download FASTA/FASTQ Files
Congrats! You have survived the dreadful part that most novices struggle with. Pat yourself on the back :)
Now let’s get down to business. We are going to download biological data from the NCBI database. For today’s demonstration, we will use the data from this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6411972/.
1. Go to the NCBI website and search for the accession number listed in the article. For this particular example, the data is deposited under a BioProject.
2. Once you have located the run, copy its SRR ID (the SRA Run accession, which starts with "SRR") and paste it into a text editor for later use.
3. Open your terminal and go to the directory where you want to download the data. Continuing from the previous article, I will be downloading the data into the sra_data directory. If you want to keep the downloaded files organized, you can create a new directory with the mkdir command.
4. Run fastq-dump using the command format below to start the download, pasting in the SRR ID you copied in Step 2.
fastq-dump -A <paste the SRR ID> --split-3 --gzip
Note: The example data consists of paired-end reads, so we use the --split-3 option to write the two mates of each pair into separate files (ending in _1 and _2). If your data is single-end, you do not need the split option. The --gzip option compresses the FASTQ output. If you are dealing with large sequencing datasets, it is best to keep the files compressed to save disk space.
5. Sit back and wait till it’s done :)
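If you expect to repeat the directory and download steps above for other runs, they can be wrapped in a small shell function. This is a minimal sketch that assumes bash and assumes fastq-dump from the SRA Toolkit is on your PATH. The function name download_srr and its arguments are hypothetical and are not part of the toolkit. It reuses the same fastq-dump options shown above.

```bash
#!/usr/bin/env bash
# Sketch: download one SRA run with fastq-dump into a chosen directory.
# download_srr is a hypothetical helper name, not an SRA Toolkit command.

download_srr() {
  local srr_id="$1"             # the SRR ID you copied earlier
  local layout="${2:-paired}"   # "paired" or "single"
  local out_dir="${3:-sra_data}"

  # Create the target directory if it does not exist yet.
  mkdir -p "$out_dir"

  # Run in a subshell so your terminal stays in its current directory.
  (
    cd "$out_dir" || exit 1
    if [ "$layout" = "paired" ]; then
      # Paired-end: --split-3 writes the mates to <ID>_1 and <ID>_2 files.
      fastq-dump -A "$srr_id" --split-3 --gzip
    else
      # Single-end: no split option needed.
      fastq-dump -A "$srr_id" --gzip
    fi
  )
}
```

For example, download_srr <your SRR ID> paired sra_data creates sra_data if needed and downloads the paired-end run there as gzipped FASTQ files. Running the download in a subshell means your terminal is left in the directory you started from.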
How long the download takes depends on the file size. In the next article, I will touch on how to download multiple FASTA/FASTQ files.
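Once the download has finished, a quick sanity check can help catch an incomplete download. This sketch assumes the paired-end file naming that --split-3 --gzip produces (<SRR ID>_1.fastq.gz and <SRR ID>_2.fastq.gz in the current directory) and the standard four-line FASTQ record. The helper names count_reads and check_pairs are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check gzipped paired-end FASTQ output from fastq-dump.
# count_reads and check_pairs are hypothetical helper names.

# A FASTQ record is 4 lines (header, sequence, '+', qualities), assuming
# no line wrapping, so the read count is the line count divided by 4.
count_reads() {
  local fq_gz="$1"
  echo $(( $(gzip -dc "$fq_gz" | wc -l) / 4 ))
}

# The mate files <ID>_1 and <ID>_2 should hold the same number of reads.
check_pairs() {
  local srr_id="$1"
  local n1 n2
  n1=$(count_reads "${srr_id}_1.fastq.gz")
  n2=$(count_reads "${srr_id}_2.fastq.gz")
  if [ "$n1" -eq "$n2" ]; then
    echo "OK: $n1 read pairs"
  else
    echo "Mismatch: $n1 vs $n2 reads" >&2
    return 1
  fi
}
```

Run check_pairs <your SRR ID> inside sra_data. To eyeball the first record of a file, gzip -dc <your SRR ID>_1.fastq.gz | head -n 4 prints its four lines.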