High Throughput Sequence Data Quality Control, Part. 1: FastQC

Dec 18, 2022

Welcome back!

If you are new to my blog, please start from here.

Now let’s get started, shall we?

We will use the same example data that we have been using in the previous articles (click here and here if you need to download the data).

To download the tool, go here and scroll down to locate FastQC.

For Windows/Linux Users:

1. Hover your mouse cursor over FastQC v0.11.9 (Win/Linux zip file).
2. Right-click and copy the link address.
3. Open up a fresh, new Terminal. Log in to your ssh server if necessary.
4. Go to the directory where you want to download your FastQC tool using the cd command. If you want to download the tool into a new directory, create it first with the mkdir command. I will be downloading mine in the sra_data directory, continuing from the previous demonstration.
5. Once you are in the designated directory, paste the following command:

wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip

Note: Make sure you have the most recent version. If v0.11.9 is no longer the latest, go here to find the latest version of the tool and adjust the version number in the commands accordingly.

6. Once the download has been completed, unzip the file using this command:

unzip fastqc_v0.11.9.zip

7. Type the ls command to check that the tool has been properly unzipped. You should see a FastQC directory.

8. Go to the FastQC directory using the command below:

cd FastQC

9. Enter the command below in the Terminal to make the fastqc script executable:

chmod +x fastqc

Note: The fastqc script is usually unzipped without execute permission, so Linux will not run it directly until we grant that permission with the chmod command.

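10. Check whether FastQC has been properly installed by running the following command from inside the FastQC directory (the leading ./ is needed because the directory is not on your PATH yet):

./fastqc --version

It should print FastQC v0.11.9 after running the above command.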

11. Add the FastQC directory to your PATH environment variable by editing your profile with the vi .profile command. If you don't remember how to set PATH, refer to the previous articles. Make sure you enter the correct path to the directory where your FastQC tool is located.

12. To start QC, use this command and replace <name of the FASTQ file> with the appropriate file name(s):

fastqc <name of the FASTQ file>

Note: Your FASTQ file name includes the .gz part as well, so don't forget to include it.

Once the process is done, it should produce one .html file and one .zip file per FASTQ file. The .html files are the FastQC reports, which you need to export/download from the server to your PC.
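To make steps 11 and 12 a bit more concrete, here is a minimal sketch rather than the one true setup. The install location ($HOME/sra_data/FastQC), the file name SRR8238941_1.fastq.gz and the qc_reports directory are assumptions for illustration, so swap in your own. The -o option tells FastQC where to write the reports, and the small report_name helper (a name I made up) just shows how the report name is derived from the FASTQ file name.

```shell
# Hypothetical install location; replace with the directory that contains the fastqc script.
FASTQC_DIR="$HOME/sra_data/FastQC"

# The kind of line you would add to ~/.profile so new login shells can find fastqc.
export PATH="$FASTQC_DIR:$PATH"

# Name of the report FastQC writes for a given FASTQ file:
# the .gz and .fastq/.fq endings are dropped and _fastqc.html is appended.
report_name() {
    base=$(basename "$1")
    base=${base%.gz}
    base=${base%.fastq}
    base=${base%.fq}
    printf '%s_fastqc.html\n' "$base"
}

fastq="SRR8238941_1.fastq.gz"   # assumed file name, use your own
outdir="qc_reports"             # hypothetical output directory

mkdir -p "$outdir"              # the -o directory must exist before FastQC runs

if command -v fastqc >/dev/null 2>&1 && [ -f "$fastq" ]; then
    fastqc -o "$outdir" "$fastq"
else
    echo "Skipping run; the report would be $outdir/$(report_name "$fastq")"
fi
```

If you leave out -o, FastQC writes the .html and .zip files next to the input FASTQ file, which is what the plain command in step 12 does.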

If you are trying to download the .html files to a Windows PC, click here to learn how to download files to/from the ssh server.
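If your local machine has the scp command available, the reports can also be pulled down in one go from a local Terminal (not logged in to the server). The sketch below reuses the placeholder port, username and address style from this article's scp example; it assumes the reports sit in /sra_data on the server, and download_reports is just a helper name I picked.

```shell
# All values are placeholders; replace them with your own server details.
port=1234
user="compbio"
host="567.890.xx.yy"
remote_dir="/sra_data"          # assumed location of the reports on the server
local_dir="$HOME/Desktop"

# Every FastQC report in the remote directory.
remote="$user@$host:$remote_dir/*_fastqc.html"

download_reports() {
    # Quoting "$remote" keeps the * away from the local shell,
    # so the pattern is matched against files on the server.
    scp -P "$port" "$remote" "$local_dir"
}

echo "Call download_reports to copy $remote into $local_dir"
```

Nothing is transferred until you actually call download_reports, so you can check the printed source and destination first.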

For Mac Users:

You don't need the Terminal to install FastQC on macOS.

1. Click the FastQC v0.11.9 (Mac DMG image) link.
2. Once the download has been completed, open the program and follow the software instructions to set it up.
3. Since our FASTQ files are saved on the ssh server, you need to download the files from the server to your local computer, e.g. the Desktop.
4. Open up a fresh, new Terminal screen (or tab) and do not log in to your ssh server. Use the scp command format below, replacing the sections in brackets with your information, and enter it in the new Terminal:

scp -P [port number] [username]@[server name or IP]:[path to file on server] [path to file on local PC]

For example, my command for downloading the file onto the Desktop is:
scp -P 1234 compbio@567.890.xx.yy:/sra_data/SRR8238941_1.fastq.gz /Users/compbio/Desktop

5. Once the download is done, open the FastQC tool. Go to ‘File’ and load the files.

The QC reports for both Mac and Windows/Linux users should look like the example picture below:

Based on your QC report, you will decide how to trim poor-quality reads or samples.

But wait! We have a batch of FASTQ files that need to run through FastQC.
How can we process the FASTQ files in bulk?

For Mac, since FastQC has a GUI, you can open multiple tabs to load the files and run the program. For Linux/Windows, there are a couple of options, which I will demonstrate in the next article on running sequence data through FastQC in bulk.

Until then, try practicing what you have learned so far to get used to handling the tools and commands. See you next time!
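As a small preview, and not necessarily the approach the next article takes, one possible option relies on the fact that the fastqc command accepts several files in a single call. The sketch below assumes your files end in .fastq.gz and live in the current directory; list_fastq and qc_reports are made-up names for illustration.

```shell
# Print every gzipped FASTQ file in a directory, one per line.
list_fastq() {
    for f in "$1"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        printf '%s\n' "$f"
    done
}

data_dir="."            # hypothetical: directory holding the FASTQ files
outdir="qc_reports"     # hypothetical output directory

count=$(list_fastq "$data_dir" | wc -l | tr -d ' ')
echo "Found $count FASTQ file(s) in $data_dir"

if [ "$count" -gt 0 ] && command -v fastqc >/dev/null 2>&1; then
    mkdir -p "$outdir"
    # One call for all files; -t 2 lets FastQC work on two files at a time.
    fastqc -t 2 -o "$outdir" "$data_dir"/*.fastq.gz
fi
```

The count check is there so FastQC is only started when there is something to process.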

Hi!
Thanks for reading my articles, I hope you are enjoying them so far. Let me know in the comments or consider buying me a coffee, no pressure! I also publish my articles at https://shortlongseq.hashnode.dev/ and https://seqbioinformatics.substack.com/ (slower updates) if you prefer a different mode of newsletter delivery.

Written by ShortLong-Seq Bioinformatics