How to Download FASTA/FASTQ Files

ShortLong-Seq Bioinformatics
2 min read · Dec 14, 2022


Congrats! You have survived the dreadful part that most novices struggle with. Pat yourself on the back :)

Now let’s get down to business. We are going to download sequencing data from NCBI’s Sequence Read Archive (SRA). For today’s demonstration, we will use the data from this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6411972/

  1. Go to the NCBI website and search for the data’s accession number. In this example, the data is deposited under a BioProject.
  2. Once you have located the run, copy its SRR ID and paste it into a text editor for later use.
  3. Open your terminal and go to the directory where you want to download the data. Continuing from the previous article, I will download the data into sra_data. If you want to keep the downloaded files organized, you can create a new directory with the mkdir command.
  4. Run fastq-dump using the command format below to start the download, pasting in the SRR ID you copied in Step 2.
    fastq-dump -A <paste the SRR ID> --split-3 --gzip

Note: The example data is paired-end, so we use the --split-3 option to write the two mates of each read pair to separate files. If your data is single-end, you do not need --split-3. The --gzip option compresses the output FASTQ files; if you are dealing with large sequencing datasets, it’s best to keep the files compressed.

5. Sit back and wait till it’s done :)
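The steps above can be sketched as a small POSIX shell script. This is a minimal sketch, assuming the SRA Toolkit’s fastq-dump is on your PATH; SRR_ID, LAYOUT, DRY_RUN and the helper functions run_dump and expected_files are hypothetical names introduced only for illustration. It passes the accession as a positional argument (the `fastq-dump [options] <accession>` form from the tool’s usage text) instead of through -A, adds --split-3 only for paired-end runs, and prints the gzipped file names it expects afterwards.

```shell
#!/bin/sh
# Minimal sketch of steps 3-4 (assumes the SRA Toolkit's fastq-dump is installed).
# SRR_ID, LAYOUT and DRY_RUN are hypothetical variables for this example.
set -eu

SRR_ID=${SRR_ID:-SRRXXXXXXX}  # placeholder: paste the SRR ID you copied in step 2
LAYOUT=${LAYOUT:-paired}      # "paired" or "single", set to match your run

# Run fastq-dump for one accession, or only print the command when DRY_RUN=1.
run_dump() {
  if [ "$2" = "paired" ]; then
    set -- --split-3 --gzip "$1"   # paired-end: mates go to _1/_2 files
  else
    set -- --gzip "$1"             # single-end: one compressed FASTQ file
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "fastq-dump $*"
  else
    fastq-dump "$@"
  fi
}

# Print the gzipped FASTQ names fastq-dump writes for each layout.
# (With --split-3, any unpaired reads also go to a separate <SRR_ID>.fastq.gz.)
expected_files() {
  if [ "$2" = "paired" ]; then
    echo "${1}_1.fastq.gz ${1}_2.fastq.gz"
  else
    echo "${1}.fastq.gz"
  fi
}

# Only download when fastq-dump exists and a real SRR ID has been supplied.
if command -v fastq-dump >/dev/null 2>&1 && [ "$SRR_ID" != "SRRXXXXXXX" ]; then
  mkdir -p sra_data   # same directory name as in step 3
  cd sra_data
  run_dump "$SRR_ID" "$LAYOUT"
  echo "Expected output: $(expected_files "$SRR_ID" "$LAYOUT")"
else
  echo "fastq-dump not found or SRR_ID not set; skipping download." >&2
fi
```

If you save this as, say, download.sh (a hypothetical name), you could run it as `SRR_ID=<your SRR ID> LAYOUT=paired sh download.sh`; adding `DRY_RUN=1` prints the fastq-dump command without starting the download.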

How long the download takes depends on the file size. In the next article, I will touch on how to download multiple FASTA/FASTQ files.
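Once the download finishes, a quick sanity check can confirm the files arrived intact. This is a minimal sketch, assuming the output sits in sra_data as in step 3 and that each FASTQ record spans four lines (fastq-dump’s default output); check_fastq_gz is a hypothetical helper. `gzip -t` fails on a truncated or corrupt archive, and the line count divided by four gives the number of reads.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: verify a .fastq.gz archive and print its read count.
check_fastq_gz() {
  gzip -t "$1" || return 1          # non-zero if the archive is truncated or corrupt
  lines=$(gzip -cd "$1" | wc -l)    # total number of FASTQ lines
  echo $((lines / 4))               # four lines per read record
}

# Report every gzipped FASTQ file in sra_data (the directory used in step 3).
for f in sra_data/*.fastq.gz; do
  [ -e "$f" ] || continue           # skip the literal pattern if nothing matched
  if count=$(check_fastq_gz "$f"); then
    printf '%s\t%s reads\n' "$f" "$count"
  else
    echo "$f looks incomplete or corrupt; try downloading it again." >&2
  fi
done
```

For paired-end runs split with --split-3, the _1 and _2 files should report the same number of reads.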

