RNA-Seq vs. Microarray Technology

Introduction

On the Illumina web site there is a page titled Advantages of RNA-Seq technology with a section on Benefits of RNA-Seq vs. Microarray Technology. The benefits are described as falling in the following categories:

On this page, I have extracted the main passages and figures from the freely available references cited by Illumina on that page. The motivation behind examining these references is to get an idea of the sort of evidence Illumina points to in support to its platform performance claims. Although these published articles provide some examples of experimental design and analyses aimed at platform comparison and evaluation, they should not be regarded as guidelines for best practice.


Cited References

Wang et al. (2009) [1]

Abstract

RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Paper Highlights

Micro Arrays:

Hybridization-based approaches are high throughput and relatively inexpensive, except for high-resolution tiling arrays that interrogate large genomes. However, these methods have several limitations, which include:

  • reliance upon existing knowledge about genome sequence;

  • high background levels owing to cross-hybridization;

  • a limited dynamic range of detection owing to both background and saturation of signals.
    Moreover, comparing expression levels across different experiments is often difficult and can require complicated normalization methods.

RNA-Seq technology and benefits:

  • RNA-Seq is not limited to detecting transcripts that correspond to existing genomic sequence.

    • RNA-Seq can reveal the precise location of transcription boundaries, to a single-base resolution.
    • 30-bp short reads from RNA-Seq give information about how two exons are connected.
    • longer reads or pair- end short reads should reveal connectivity between multiple exons.
  • RNA-Seq has very low, if any, background signal

  • RNA-Seq does not have an upper limit for quantification

Quantifying expression levels: RNA-Seq and microarray compared Expression levels are shown, as measured by RNA-Seq and tiling arrays, for Saccharomyces cerevisiae cells grown in nutrient-rich media. The two methods agree fairly well for genes with medium levels of expression (middle), but correlation is very low for genes with either low or high expression levels.

The scatterplots depicted above illustrate the difference in dynamic range between the two platforms. One problem with this depiction is that it makes use of different genes to illustrate the range of expression being captured, but the interest in practice is in the range of expression of a given gene between samples.

Paper Summary

This paper clearly articulates potential advantages of RNA-Seq over micro-array technologies. In some cases the advantages are clear as RNA-Seq enables analyses which cannot be done with microarrays. In other cases, like the effect of background and dynamic range on sensitivity and limits of detection, the advantages are that RNA-Seq brings over micro-array technologies are less clear and require empirical demonstration. The article fails to provide any.

Zhao et al. (2014) [3]

Abstract

To demonstrate the benefits of RNA-Seq over microarray in transcriptome profiling, both RNA-Seq and microarray analyses were performed on RNA samples from a human T cell activation experiment. In contrast to other reports, our analyses focused on the difference, rather than similarity, between RNA-Seq and microarray technologies in transcriptome profiling. A comparison of data sets derived from RNA-Seq and Affymetrix platforms using the same set of samples showed a high correlation between gene expression profiles generated by the two platforms. However, it also demonstrated that

  • RNA-Seq was superior in detecting low abundance transcripts

  • differentiating biologically critical isoforms

  • allowing the identification of genetic variants

  • RNA-Seq also demonstrated a broader dynamic range than microarray, which allowed for the detection of more differentially expressed genes with higher fold-change.

Analysis of the two datasets also showed the benefit derived from avoidance of technical issues inherent to microarray probe performance such as cross-hybridization, non-specific hybridization and limited detection range of individual probes. Because RNA-Seq does not rely on a pre- designed complement sequence detection probe, it is devoid of issues associated with probe redundancy and annotation, which simplified interpretation of the data. Despite the superior benefits of RNA-Seq, microarrays are still the more common choice of researchers when conducting transcriptional profiling experiments. This is likely because RNA-Seq sequencing technology is new to most researchers, more expensive than microarray, data storage is more challenging and analysis is more complex. We expect that once these barriers are overcome, the RNA-Seq platform will become the predominant tool for transcriptome analysis.

Paper Highlights

Correlations - Microarray Data

Figure S1

Figure S1

Correlations - RNA-Seq Data

Figure S2

Figure S2

The correlation coefficients between biological replicates range from 0.995 to 0.997 in microarray (see Figure S1), and the associated p-values with sample size of 18,306 genes are 0 (less than 1e-300). The corresponding correlation coefficients are 0.997 to 0.998 in RNA-Seq (Figure S2). Note that the correlation was calculated using log2 transformed expression values. For those genes with low expression levels, variability is higher in RNA-Seq. Clearly, RNA-Seq has a better correlation than microarray, as shown in Figure S1 and Figure S2.

Microarray vs RNA-Seq - Expression

Figure 2

Figure 2

Scatter plots show the averages (between biological duplicates) of log2 transformed expression values between two platforms, at each individual time point. The relationship between the expression profiles generated in both platforms is depicted as either a smoothing spline (black) or a linear regression line (red). The intercept (b) and the slope (m) of the linear regression line, and the correlation coefficient (r) are reported at the top-left corner in each plot corresponding to each time point. The plots show that the overall dynamic range of the 18,306 common genes generated by the two platforms is much broader in RNA-Seq (2.6×105) than in microarray (3.6×103). Similar dynamic ranges are displayed in both platforms for genes with relative expression level between 0.55 and 0.95. In each platform, the relative expression level of each gene was determined based on the average of log2 transformed expression values in all 12 samples.

The overall dynamic range was much broader in RNA-Seq (\(2.6×10^5\)) than that in microarray (\(3.6×10^3\)), especially at both the lower (with relative expression level less than 0.55) and the upper (with relative expression level greater than 0.95) ends. Note the relative expression level of each gene in the last plot in Figure 2 was determined based on the average of log2 transformed expression values in all 12 samples. A relative expression level of 0.5 represents an underlying expression value in the middle of the range for all expression values. The vertical lines in the last plot of Figure 2 indicate the relative expression levels at 0.30, 0.55 and 0.95 respectively. About 30% of expression values generated by the RNA-Seq platform were wither zero or below the floored level (0.047 RPKM). A broader dynamic range was observed in RNA-Seq compared to microarray at both ends, i.e. with relative expression level either less than 0.55 or greater than 0.95. A similar dynamic range was displayed in both platforms for genes with relative expression level between 0.55 and 0.95. Due to background hybridization or noise, all genes had an expression value in microarray, regardless of whether it was truly expressed or not.

Microarray vs RNA-Seq - Differential Expression

Figure 3

Figure 3

Between-group (BG) and within-group (WG) variances are presented as the square root of the mean difference for each gene, and are plotted against the relative expression levels on both platforms, microarray (MA, panel A) and RNA-Seq (RS, panel B). The averages of variances (in square root, panel C) for between- and within-groups are plotted (panel C). The F-scores of 18,306 common genes are compared between two platforms (panel D), and the averages of F-scores (in square root, panel E) are also presented along with the relative expression levels by using smoothing spline for both platforms. The distributions of FDR-adjusted p-values, based on F-scores in both platforms are presented (panel F).

The variances, both between and within treatment group at all six time points, in log2 transformed expression values of the 18,306 common genes were analyzed by one-way ANOVA for both platforms (Figure 3). Within-group variances reflect data reproducibility, while between-group variances represent the sensitivity of platform to detect differential gene expression in response to T cell activation. For most genes, the between-group variances were larger than the within-group variances in both platforms (Figure 3, A–C), which is consistent with the expectation that many genes should be differentially expressed during the process of T cell activation. For genes with relatively low expression (relative expression level <0.47), within-group variances were higher in RNA-Seq than in microarray, representing lower reproducibility between the biological replicates. For genes with relatively high expression, within-group variances were lower in RNA-Seq, representing higher reproducibility (Figure 3C). The between-group variances exhibited similar patterns for genes with high expression in both platforms, whereas higher variances were observed in RNA-Seq for those genes with low expression (Figure 3C).

The capability to detect differential gene expression in both platforms was evaluated as an F-score generated by one-way ANOVA. Similar differential gene expression profiles were obtained in both platforms, as illustrated by high correlation coefficient (r = 0.718) between two sets of log-transformed F-scores (Figure 3D). 75.5% of genes exhibited higher F-scores in RNA-Seq, as compared to microarray. Positive correlations between F-scores and relative expression levels were observed in both platforms, indicating greater power in the detection of differential expression for genes with higher expression levels (Figure 3E). Using the F-score based, False Discovery Rate (FDR) adjusted P-value of 0.05, as a cut off, microarray and RNA-Seq selected 56.0% and 71.5% of genes, respectively, as differentially expressed among the six time points (Figure 3F).

Figure 4

Figure 4

Scatter plots of log2 transformed ratios (vs. baseline at T = 0 h) between both platforms at selected time points (T = 2, 6, and 72 hour) show similar results are observed at T = 4 and 24 hour. Genes that are specifically differentially expressed in microarray or RNA-Seq are colored in red and green respectively, and genes that are differentially expressed in both platforms are colored in blue.

Differential gene expression profiles of 18,306 common genes at each of the five time points following Th17 activation were compared between the two platforms (Figure 4, and data not shown). Similar differential gene expression profiles were obtained in both platforms at each time point, as illustrated by high correlation coefficient (r = 0.78–0.80) between the two sets of log-transformed ratios. However, the magnitude of differential expression was greater in RNA-Seq than in microarray, as indicated by the slopes (m = 1.18–1.27) at each time point. The distribution of differentially expressed genes, either platform specific or common to both platforms, was independent of their expression levels (Figure 4). Also, while a large number of genes were identified as differentially expressed in both platforms (colored in blue in Figure 4), there were still a number of genes specifically detected as differentially expressed in only one platform (colored in red and green, respectively, in Figure 4). There are several reasons for platform-dependent measurement of differentially expressed genes. First, as shown in Figure 2, the differences in expression profiles for some genes were apparent between the two platforms, and accordingly, different fold changes are calculated and reported. Second, for genes with very low or very high expression levels, RNA-Seq is more likely to detect the changes at two different conditions, as we will demonstrate later. Third, a microarray probe might hit some, but not all, isoforms of a gene, and as a result the reported fold change of the probe set does not necessarily represent the expression change of the entire gene. Probe set 205277_PM_at is a case in point, which we will discuss in the Discussion section, as well as all of the inherit biases of microarray and RNA-Seq in detection of differential expression.

More plots included in the report are not discussed here.

Paper Summary

Although this paper contains a lot more empirical findings than the previous one, and the dataset analyzed provides an opportunity to detect differntial expression, the data set lacks baseline truth that could be used to score the performance of the two platforms in accurately identifying the differentially expressed regions. The are some exceptions when truth is known, MYCL1 and ACTB for example, but these exceptional cases do not provide a means to accurately score the platorms in terms of performance.

Wang et al. (2014) [4]

Abstract

RNA-seq facilitates unbiased genome-wide gene-expression profiling. However, its concordance with the well-established microarray platform must be rigorously assessed for confident uses in clinical and regulatory application. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same set of liver samples of rats under varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOA). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is highly correlated with treatment effect size, gene-expression abundance and the biological complexity of the MOA. RNA-seq outperforms microarray (90% versus 76%) in DEG verification by quantitative PCR and the main gain is its improved accuracy for low expressed genes. Nonetheless, predictive classifiers derived from both platforms performed similarly. Therefore, the endpoint studied and its biological complexity, transcript abundance, and intended application are important factors in transcriptomic research and for decision-making.

The FDA launched the community-wide MicroArray Quality Control (MAQC) consortium to investigate the reliability and utility of microarrays in identifying differentially expressed genes (DEGs) and predicting patient/toxicity outcomes based on gene-expression data in the first (MAQC-I) and second (MAQC-II) phases of the project, respectively.

High-throughput sequencing technologies provide new methods for whole-transcriptome analyses of gene expression. Recently published studies have compared data obtained from microarrays and RNA-seq in terms of technical reproducibility, variance structure, absolute expression and detection of DEGs or gene isoforms

In contrast to data generated as part of the SEQC project using reference RNA samples25, our study design provides a comparison of the transcription response for rat livers that each platform detects in terms of extensive chemical treatments, biologic replication and breath of shared mode of action (MOA) of the chemicals beyond simply monitoring performance metrics.

Specifically, we report the results of a comparative analysis of gene expression responses profiled by Affymetrix microarray and Illumina RNA-seq in liver tissue from rats exposed to diverse chemicals. We used either microarray or RNA-seq data to generate DEGs and predictive models of MOA of each chemical. This allowed us to assess the influence of the chemical (referred hereafter as the ‘treatment effect’) on the concordance between RNA-seq and microarrays and on the performance of predictive models generated using each technology. Treatment effect is characterized by the number of DEGs and the over-expressed pathways underlying MOA of the chemical.

We found that (i) the concordance between array and sequencing platforms for detecting the number of DEGs was positively correlated with the extensive perturbation elicited by the treatment, (ii) RNA-seq performed better than microarrays at detecting weakly expressed genes, and (iii) gene expression–based predictive models generated from RNA-seq and microarray data were similar. The experimental design also allowed us to identify positive correlations in differentially expressed RNA elements (mRNA, splice variants, non-coding RNA and exon-exon junction) with the extensive perturbation elicited by the treatment, and to examine treatment-induced alternative splicing and shortening of 3’ untranslated regions (UTRs).

Paper Highlights

Study Design

Figure 1

Figure 1

Overview of study design. (a) The study was comprised of a training set and a test set with the text on the left detailing the experimental design and the text on the right listing the key analyses conducted. Both microarray and RNA-seq were used to profile transcriptional responses induced by treatment of rats by each chemical; each is known to associate with a specific mode of action (MOA). For each MOA there were three representative chemicals and three biological replicates per chemical. We evaluated cross-platform concordance at multiple levels: differentially expressed genes, mechanistic pathways and MOAs. To compare the predictive potential of RNA-seq and microarray as gene-expression biomarkers, we profiled 4 MOAs by both platforms as a test set, two of the MOAs (PPARA and CAR/PXR) appear in the training set whereas the other two do not. (b) Overview of RNA-seq analysis pipelines.

Treatment effect dictates concordance

Figure 2

Figure 2

Concordance between RNA-seq and microarray. (a) Between-platform concordance of DEGs against the number of DEGs identified by RNA-seq. Concordance is the overlap of the DEGs from the two platforms with agreement in the direction of fold change (Online Methods). The concordance was adjusted to remove the contribution of random chance. (b) Between-platform concordance of pathways against the number of DEGs identified by RNA-seq. The concordance was adjusted to remove the contribution of random chance. Two sets of pathways were analyzed, one obtained from the DEGs derived from the common set based analysis and the other was based on the platform-independent analysis. (c) Common pathways shared by three chemicals with the same MOA (x axis). For each MOA, the left bar shows the number of pathways found using microarray data; the right bar is for RNA-seq data. (d) Concordance between pathways identified by both platforms for each MOA.

Low abundance contributes to discrepancies

Figure 3

Figure 3

Transcript abundance-dependent concordance between RNA-seq and microarray. (a) Root mean squared distance (RMSD in y-axis) between pairs of rats for each chemical and averaged over all the chemicals by a bin of genes which contains 10% of the genes from the high expression end (left side of the x-axis of each bin) to the low end (right side of the x-axis of each bin). The analysis was performed on RNA-seq with six pipelines and the microarray with two normalization methods (RMA and MAS5). (b, c) For each chemical, the x-axis represents the number of DEGs top ranked by the fold change with p<0.05 for both platforms with equal numbers of up- and down-regulation. The y-axis represents the overlap (%) between platforms for a given number of ranked DEGs. Each line on the graph represents the overlap of DEG lists between two platforms for one chemical for above-median expressed genes (b) and below-median expressed genes (c).

Figure 4

Figure 4

Both platforms exhibited a high concordance to qPCR in DEGs (94% for RNA-seq and 88% for microarray) for the above-median expressed genes (Fig. 4a). However, for the below-median expressed genes, although the high concordance remained for RNA-seq (89%; 8 out of 9 DEGs verified), it was drastically lower for microarrays (17%; 1 out of 6 DEGs verified). We also compared the fold change of qPCR versus RNA-seq and qPCR versus microarray for the DEGs verified by qPCR. The RNA-seq data were correlated more with qPCR data (R2 = 0.97) than with microarray data (R2 = 0.85) (Fig. 4b). Furthermore, the regression slope for RNA-seq is close to 1, whereas the slope for microarray deviates from 1 and is tilted towards the x-axis, showing the classic ratio compression phenomenon32. The analysis for each chemical separately yielded similar results (Supplementary Fig. 5)

Figure 4 Concordance of RNA-seq and microarray data with qPCR data. (a) The qualitative agreement (differentially expressed or not) of RNA-seq and microarray against qPCR for the DEGs with above-median expression or below-median expression is shown along with the overall average results. Differential expression was determined by fold change >1.5 and P-value <0.05. (b) Correlation of RNA-seq and microarray (y-axis) with qPCR data (x-axis) using the log2 fold change measure of the genes differentially expressed P-value across the two gene-expression platforms under correlation analysis.

Prediction of samples by mode of action is similar

Figure 5

Figure 5

Cross-platform comparisons of classifiers. Comparison of MOA prediction results between two platforms. Sixty-one classifiers were generated using the training set and subsequently predicted the test set chemicals in a blind fashion. (a) Each box plot displays the range and distribution of prediction accuracy achieved by a platform for a particular chemical group (i.e., chemicals grouped by PPARA, CAR/PXR, unknown to the training set, and Overall). The horizontal line is the median, the top of the box is the upper quartile, the bottom of the box is the lower quartile and the circles are outliers. (b) Comparison of prediction accuracy at individual chemical level grouped by MOAs. (c) Comparison of prediction accuracy at the animal level. The x-axis lists all animals grouped by treatment chemical and then MOA.

Paper Summary

This paper reports on a huge study with many comparisons between RNA-Seq and micro array in terms of meaningful results - the identification of differntially expressed genes and the classfication of samples. Additionally, qPCR was used add sensitivity to the comparative evaluation (see above).

Marioni et al. (2008) [7]

Abstract

Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

Paper Highlights

Comparison of results across technologies

As a first step to comparing the sequence and array data, we compared the number of sequence reads mapped to each gene with the corresponding (normalized) absolute intensities from the array (Fig. 3). Reassuringly, these two independent measures of transcript abundance are highly correlated (Spearman correlation = 0.73 for liver, 0.75 for kidney). Interestingly, where results from the two technologies differ, it is generally where the array intensities are large and the sequence counts small; a pattern that might be explained by probe-specific background hybridization on the array.

Figure 3

Figure 3

Comparing counts from Illumina sequencing with normalized intensities from the array, for kidney (left) and liver (right). In each panel, the average (log2) counts for each gene are plotted on the X-axis, and the corresponding normalized intensities from the array are shown on the Y-axis. To avoid taking the log of 0, we added 1 to each of the average counts prior to taking logs.

differentially expressed genes

We next compared differentially expressed genes called from the Illumina sequencing data with those identified from the array. By applying a widely used empirical Bayes approach (Smyth 2004) to the array data, we identified 8113 genes as differentially expressed at an FDR of 0.1% (83% with an estimated absolute log2-fold change > 0.5, 43% > 1). Of these, 81% of genes were also identified as differentially expressed from the Illumina sequencing data, providing strong evidence that the majority of genes called from the sequence data are genuinely differentially expressed between the two samples. Furthermore, estimates of the log2-fold changes of gene expression levels between the samples across the two technologies are correlated (Spearman correlation = 0.73) (Fig. 4). The correlation is greater for genes that are mapped to by large numbers of sequence reads. For example, for genes mapped to by (on average) more than 32 reads in both tissues (≥5 on the log scale in Fig. 3), the Spearman correlation of the fold changes across technologies is 0.79 compared with 0.60 for genes mapped to by at least one but fewer than 32 reads. These comparisons with the array data demonstrate that the Illumina sequencing technology and our analysis approach are performing well. A complete comparison of genewise results from both technologies is available in Supplemental Table 3.

Figure 4

Figure 4

Comparison of estimated log2 fold changes (liver/kidney) from Illumina (Y-axis) and Affymetrix (X-axis). We consider only genes that were interrogated using both platforms and genes where the mean number of counts across lanes was greater than 0 for both the liver and kidney samples. (Red and green dots) Genes called as differentially expressed based on the Illumina sequencing data at an FDR of 0.1%, with a mean number of counts greater than (red) or less than (green) 250 reads in both tissues. (Black dots) Genes not called as differentially expressed based on the Illumina sequencing data. The set of differentially expressed genes that show the strongest correlation between the two technologies seems to be those that are mapped to by many reads (red), while the correlation is weaker for differentially expressed genes mapped to by fewer reads (green).

Considered together, 6538 genes were identified as differentially expressed using either the sequencing or the array data but not by both (Fig. 5). To further examine these discrepancies, we used a third technology, quantitative PCR (qPCR), to test for differences in expression between the liver and kidney samples for five genes called differentially expressed from the sequence data but not the array (MMP25, SLC5A1, MDK, ZNF570, GPR64) and for six genes that were found to be differentially expressed using the array, but not the sequencing data (C16orf68, CD38, LSM7, S100P, PEX11A, GLOD5). We designed primers for the qPCR within 1 kb upstream of the annotated 3′-end of the genes (Methods). The qPCR results confirmed as differentially expressed (t-test, P < 0.01) four of the first set of genes (all but ZNF570), but only two of the second set (CD38 and GLOD5). Thus, overall, the qPCR results agreed more closely with the Illumina sequencing results than with the array.

Paper Summary

This paper includes some comparisons between RNA-Seq and microarrays based on differential expression analysis results. Two tissue types are used to provide the differentially expressed genes - Kidney and Liver. One problem is that although it is known that many genes will be differentially expressed, the full identity of these genes is not known. For a small set of genes qPCR was used to arbitrate cases when the two platforms disagreed on DE calls. This is a useful additoinal piece of information but the number of targets that can be investigated by qPCR is limited.

Conclusions

References

1. Wang, Z., Gerstein, M., and Snyder, M. RNA-seq: A revolutionary tool for transcriptomics. Nature reviews. Genetics 10, 57–63. Available at: https://pubmed.ncbi.nlm.nih.gov/19015660.

2. Wilhelm, B.T., and Landry, J.-R. (2009). RNA-seq-quantitative measurement of expression through massively parallel rna-sequencing. Methods (San Diego, Calif.) 48, 249–257. Available at: http://europepmc.org/abstract/MED/19336255.

3. Zhao, S., Fung-Leung, W.-P., Bittner, A., Ngo, K., and Liu, X. Comparison of rna-seq and microarray in transcriptome profiling of activated t cells. PloS one 9, e78644–e78644. Available at: https://pubmed.ncbi.nlm.nih.gov/24454679.

4. Wang, C., Gong, B., Bushel, P.R., Thierry-Mieg, J., Thierry-Mieg, D., Xu, J., Fang, H., Hong, H., Shen, J., and Su, Z. et al. The concordance between rna-seq and microarray data depends on chemical treatment and transcript abundance. Nature biotechnology 32, 926–932. Available at: https://pubmed.ncbi.nlm.nih.gov/25150839.

5. Li, J., Hou, R., Niu, X., Liu, R., Wang, Q., Wang, C., Li, X., Hao, Z., Yin, G., and Zhang, K. (2016). Comparison of microarray and rna-seq analysis of mRNA expression in dermal mesenchymal stem cells. Biotechnology Letters 38, 33–41. Available at: https://doi.org/10.1007/s10529-015-1963-5.

6. Liu, Y., Morley, M., Brandimarto, J., Hannenhalli, S., Hu, Y., Ashley, E.A., Tang, W.H.W., Moravec, C.S., Margulies, K.B., and Cappola, T.P. et al. RNA-seq identifies novel myocardial gene expression signatures of heart failure. Genomics 105, 83–89. Available at: https://pubmed.ncbi.nlm.nih.gov/25528681.

7. Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M., and Gilad, Y. RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome research 18, 1509–1517. Available at: https://pubmed.ncbi.nlm.nih.gov/18550803.