Chapter 4 Normalization methods
Different assays (for example microarrays and RNA-seq) call for different normalization methods.
4.1 Library-size scaling
The simplest approach divides each sample's counts by that sample's total count (its library size, i.e. the number of reads) to account for differences in sequencing depth.
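A minimal sketch of library-size scaling, assuming a small hypothetical count matrix with genes as rows and samples as columns. Sample B has exactly twice as many reads as sample A for every gene, as if the same library had been sequenced twice as deeply. Dividing by the column totals removes that difference.

```python
import numpy as np

# Hypothetical counts: 4 genes (rows) x 2 samples (columns).
# Sample B has exactly twice the reads of sample A for every gene.
counts = np.array([[10, 20],
                   [30, 60],
                   [50, 100],
                   [10, 20]], dtype=float)

library_sizes = counts.sum(axis=0)   # total counts per sample: [100, 200]
scaled = counts / library_sizes      # divide each column by its own total

print(scaled)
```

After scaling, both columns contain 0.1, 0.3, 0.5 and 0.1, so the difference in sequencing depth no longer masquerades as a difference in expression.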
4.1.1 Counts-per-million (CPM)
There are other versions of this type of normalization (RPKM, FPKM, TPM, …). All of them normalize for library size (the number of reads in a given experiment), but some also incorporate other factors, most commonly gene length:
- CPM: normalizes for library size only.
- TPM: normalizes for gene length first, then for library size, so the values in every sample sum to one million.
- RPKM/FPKM: normalize for gene length and library size at the same time. The two are similar: RPKM counts reads, while FPKM counts fragments (in paired-end data, the two reads of a pair come from one fragment). TPM is preferred, because its per-sample total is always the same.
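The three variants differ only in the order in which the length and library-size corrections are applied. The sketch below assumes a genes x samples count matrix and hypothetical gene lengths in base pairs. The function names `cpm`, `rpkm` and `tpm` are illustrative, not a library API. FPKM uses the same formula as RPKM with fragment counts in place of read counts.

```python
import numpy as np

def cpm(counts):
    """Counts per million: correct for library size only."""
    return counts / counts.sum(axis=0) * 1e6

def rpkm(counts, lengths_bp):
    """RPKM (FPKM with fragment counts): length and library size in one step."""
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    return counts / lengths_kb / counts.sum(axis=0) * 1e6

def tpm(counts, lengths_bp):
    """TPM: divide by gene length first, then rescale each sample to sum to 1e6."""
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    per_kb = counts / lengths_kb
    return per_kb / per_kb.sum(axis=0) * 1e6

# Hypothetical data: 3 genes (rows) x 2 samples (columns), gene lengths in bp.
counts = np.array([[100, 200],
                   [300, 300],
                   [600, 500]], dtype=float)
lengths = [1000, 2000, 3000]

print(rpkm(counts, lengths).sum(axis=0))  # column totals differ between samples
print(tpm(counts, lengths).sum(axis=0))   # every column totals 1e6
```

Because TPM rescales after the length correction, each sample's values sum to the same total. RPKM totals depend on each sample's mix of gene lengths, which is one reason TPM is preferred.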
4.2 Distribution-based normalization
Methods like quantile normalization force the distribution of values to be identical in every sample; they are common in microarray analysis.
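A minimal sketch of quantile normalization on a hypothetical genes x samples matrix. Each column is sorted, the mean at each rank across samples becomes the reference distribution, and every value is replaced by the reference value for its rank. The function name is illustrative. Ties are broken by row order here, whereas some implementations average tied ranks instead.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize a genes x samples matrix (ties broken by row order)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")  # row indices sorted within each column
    reference = np.sort(x, axis=0).mean(axis=1)   # mean value at each rank across samples
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference           # k-th smallest value gets k-th reference value
    return out

# Hypothetical 4 genes x 3 samples.
x = np.array([[5, 4, 3],
              [2, 1, 4],
              [3, 4, 6],
              [4, 2, 8]])
qn = quantile_normalize(x)
print(qn)
```

After normalization, every column contains the same four values (2, 3, 14/3 and 17/3); only their order within each column differs.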
4.3 RNA-seq composition-aware methods
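Plain library-size scaling can mislead when a few genes take up a large share of the reads in one sample: every other gene then looks lower in that sample even if its expression has not changed. Two widely used methods correct for this RNA composition effect:
- TMM (trimmed mean of M-values, edgeR): computes scaling factors that adjust the effective library sizes.
- RLE / size factors (DESeq2): median-of-ratios approach.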
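The median-of-ratios idea behind DESeq2's size factors can be sketched in a few lines. This is a simplified illustration, not DESeq2's code, and the function name is hypothetical. Each gene's geometric mean across samples serves as a pseudo-reference. A sample's size factor is the median, over genes, of its count divided by that reference. Genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import numpy as np

def median_of_ratios(counts):
    """Size factors by the median-of-ratios idea (a sketch, not DESeq2 itself)."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have no usable geometric mean; skip them.
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    log_ref = log_counts.mean(axis=1, keepdims=True)  # log geometric mean per gene
    # Size factor = median over genes of (count / pseudo-reference), per sample.
    return np.exp(np.median(log_counts - log_ref, axis=0))

# Hypothetical counts: 20 genes x 2 samples. Sample 2 is identical to
# sample 1 except that gene 0 is very highly expressed.
counts = np.full((20, 2), 100.0)
counts[0, 1] = 2100.0

sf = median_of_ratios(counts)
normalized = counts / sf  # divide each column by its size factor
print(sf)
```

Here the library sizes are 2000 and 4000, so dividing by total counts would halve every gene in sample 2. The median ignores the single outlier gene, so both size factors stay at 1 and the 19 unchanged genes keep equal values.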
4.3.1 edgeR: TMM
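TMM compares each sample to a reference sample gene by gene. For every gene it computes a log fold change (the M-value) and an average expression level (the A-value). It then discards the genes with the most extreme M- and A-values and takes a weighted mean of the remaining M-values. Because most genes are assumed not to be differentially expressed, this mean measures how much the two samples really differ in composition, and it is used as a scaling factor that makes the samples comparable.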
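A simplified sketch of the TMM calculation, for intuition only. The trimming fractions default to 0.3 (M-values) and 0.05 (A-values), matching the `logratioTrim` and `sumTrim` defaults of edgeR's `calcNormFactors`. The rest is an assumption-laden simplification: the reference is fixed by a hypothetical `ref_col` argument (edgeR picks one automatically), tied ranks are broken by order rather than averaged, and no A-value cutoff is applied. The names `tmm_factor` and `tmm_factors` are illustrative; use edgeR itself for real analyses.

```python
import numpy as np

def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM scaling factor of sample `obs` relative to sample `ref`."""
    obs, ref = np.asarray(obs, dtype=float), np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()
    keep = (obs > 0) & (ref > 0)                  # log ratios need nonzero counts
    o, r = obs[keep], ref[keep]
    m = np.log2((o / n_obs) / (r / n_ref))        # M-value: log fold change
    a = 0.5 * np.log2((o / n_obs) * (r / n_ref))  # A-value: average log expression
    # Approximate variance of each M-value; genes are weighted by 1 / variance.
    var = (n_obs - o) / (n_obs * o) + (n_ref - r) / (n_ref * r)
    g = m.size
    rank_m = np.argsort(np.argsort(m, kind="stable"))  # 0-based ranks
    rank_a = np.argsort(np.argsort(a, kind="stable"))
    k_m = int(np.floor(g * logratio_trim))        # genes trimmed from each end by M
    k_a = int(np.floor(g * sum_trim))             # genes trimmed from each end by A
    kept = ((rank_m >= k_m) & (rank_m < g - k_m)
            & (rank_a >= k_a) & (rank_a < g - k_a))
    w = 1.0 / var[kept]
    return 2.0 ** (np.sum(w * m[kept]) / np.sum(w))

def tmm_factors(counts, ref_col=0):
    """Factors for every column, rescaled so that their product is 1."""
    counts = np.asarray(counts, dtype=float)
    f = np.array([tmm_factor(counts[:, j], counts[:, ref_col])
                  for j in range(counts.shape[1])])
    return f / np.exp(np.mean(np.log(f)))

# Same hypothetical composition example: gene 0 dominates sample 2.
counts = np.full((20, 2), 100.0)
counts[0, 1] = 2100.0
factors = tmm_factors(counts)
effective_lib = counts.sum(axis=0) * factors  # library size x normalization factor
print(factors, effective_lib)
```

Sample 2 has twice the reads of sample 1, but only because of one gene. After trimming, the remaining genes give a raw factor of 0.5, so both samples end up with the same effective library size and the 19 unchanged genes are treated as equally expressed.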