RNA sequencing on next-generation sequencers such as the Illumina NextSeq makes it possible for researchers to measure gene and isoform expression in a biological sample. Typically, the resulting expression patterns are used to explain a certain phenotype.
Basic RNA-seq analysis (review)
RNA-seq pipeline on the T-BioInfo platform
A typical workflow for analysis of RNA-seq data starts with quantifying gene expression, or generating a table of gene or isoform expression. The simplest workflow to prepare such a table from RNA-seq data is described in the Transcriptomics 1 course on the edu.t-bio.info portal. To review, here are the main steps: pre-processing, mapping, and quantification.
- Pre-processing: High-throughput data is often affected by library preparation techniques, which can leave adapter sequences in the reads and introduce amplification artifacts that are not representative of the biological conditions. Data pre-processing resolves such issues using CleanPrimer (to remove primer sequences from the data), Array To Fasta, PCR clean (to remove duplicates from the PCR run, thus reducing redundancy), Trimmomatic (to remove adapter sequences), and GTF Adjust (used when GTF files contain abnormal sequences).
- Mapping on a reference genome: Short reads are aligned to a reference genome sequence. This procedure establishes the start and end positions of each read and records the information in a BAM or SAM file.
- Quantification: Abundance values for genes and isoforms are generated and recorded in an expression table. Expression values are reported in TPM (Transcripts Per Million), FPKM (Fragments Per Kilobase of transcript per Million mapped reads, for paired-end reads) or RPKM (Reads Per Kilobase of transcript per Million mapped reads, for single-end reads).
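To make the quantification units concrete, here is a minimal Python sketch of how RPKM and TPM can be derived from raw read counts and transcript lengths. It is not the T-BioInfo implementation: the function names, gene names, counts, and lengths are hypothetical and chosen only for illustration. For paired-end data the same RPKM formula applied to fragment counts gives FPKM.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    counts: dict mapping gene -> mapped read count (fragment count for FPKM)
    lengths_bp: dict mapping gene -> transcript length in base pairs
    Assumes at least one mapped read in total and non-zero lengths.
    """
    millions_mapped = sum(counts.values()) / 1e6
    return {
        gene: counts[gene] / millions_mapped / (lengths_bp[gene] / 1e3)
        for gene in counts
    }


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    reads_per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1e3) for gene in counts}
    scale = sum(reads_per_kb.values()) / 1e6
    return {gene: value / scale for gene, value in reads_per_kb.items()}


# Hypothetical toy sample: three genes with made-up counts and lengths.
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
toy_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

rpkm_values = rpkm(toy_counts, toy_lengths)
tpm_values = tpm(toy_counts, toy_lengths)

for gene in toy_counts:
    print(f"{gene}\tRPKM={rpkm_values[gene]:.1f}\tTPM={tpm_values[gene]:.1f}")
```

TPM values always sum to one million within a sample, which makes the relative proportions of transcripts easier to compare across samples than RPKM or FPKM values, whose per-sample totals can differ.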
As a result, we now have a “table of expression”: a table that shows how genes from the reference genome are expressed across a number of samples. This table can be further analyzed and annotated to understand how gene expression relates to the phenotypes under study.