High-Throughput Sequencing Data Pipeline, Part 1: Quality Control

ShortLong-Seq Bioinformatics
2 min read · Dec 16, 2022


Good job!
By now you should be able to download FASTA/FASTQ files with ease. The next series of write-ups covers the high-throughput sequencing data analysis pipeline and the tools it involves.

Picture source: “Read Mapping.” Biocorecrg.Github.Io, 2022.

Once reads (strings of A, T, C, and G) have been sequenced on an Illumina sequencer, they are stored as raw data in FASTQ format.
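To make the FASTQ format concrete, here is a minimal sketch of reading a single record. Each record spans four lines: a header starting with `@`, the sequence, a separator starting with `+`, and one quality character per base. The record contents and the function name `parse_fastq_record` are made up for illustration, and the sketch assumes Phred+33 quality encoding, which is what current Illumina instruments write.

```python
# Hypothetical example record; a real FASTQ file holds millions of these.
record = """@read1
GATTACA
+
IIIIII#
"""


def parse_fastq_record(text):
    """Split one four-line FASTQ record into name, sequence, and quality scores."""
    header, sequence, separator, quality = text.strip().split("\n")
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33 encoding, so score = ASCII code - 33.
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores


name, seq, scores = parse_fastq_record(record)
print(name, seq, scores)
```

A higher score means a more confident base call, so the last base in this made-up read is the one a QC tool would flag.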
The next step is called Quality Control (QC). Its purpose is to check the quality of the reads, make sure there are no severe abnormalities in the samples, and compute summary statistics on them (filtered reads, % GC, aligned reads, etc.).
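As a rough illustration of the kind of statistics QC produces, the sketch below computes % GC and counts how many reads pass a mean-quality filter. It is not how FastQC or any other tool works internally; the function names, the example reads, and the threshold of 20 are assumptions chosen for the example, and quality strings are again assumed to be Phred+33.

```python
def mean_quality(quality, offset=33):
    """Average Phred score of one read (assumes Phred+33 by default)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def summarize(reads, min_mean_quality=20):
    """Summarize a list of (sequence, quality_string) pairs.

    min_mean_quality is a hypothetical cutoff, not a recommended value.
    """
    total_bases = sum(len(seq) for seq, _ in reads)
    gc_bases = sum(seq.upper().count("G") + seq.upper().count("C") for seq, _ in reads)
    passed = [r for r in reads if mean_quality(r[1]) >= min_mean_quality]
    return {
        "reads": len(reads),
        "passed_filter": len(passed),
        "percent_gc": 100 * gc_bases / total_bases,
    }


# Made-up reads: one high quality, one low quality, one in between.
reads = [("GGCC", "IIII"), ("ATAT", "####"), ("GATC", "II##")]
summary = summarize(reads)
print(summary)
```

Dedicated QC tools report many more metrics than this, but the idea is the same: reduce millions of reads to a few numbers you can check at a glance.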

If you want to learn more about this concept, you can click on the image above to read more about it, or watch the video lecture here.

There are many tools you can use to perform QC: FastQC, FastQ Screen, FASTX-Toolkit, etc. The tools may differ depending on what your lab uses or what is available, but the basic workflow is the same.

In the next article, I will use FastQC to demonstrate how you can perform quality control, so stick around! :)

Thanks for reading my article. I hope you are enjoying the series so far. Let me know in the comments below, or consider buying me a coffee (no pressure)! I also publish my articles at https://shortlongseq.hashnode.dev/ and https://seqbioinformatics.substack.com/ if you prefer a different mode of newsletter delivery. Both are free.


Written by ShortLong-Seq Bioinformatics

Bioinformatics | Systems Biology | Computational Biology | Data Science
