SERVICES
Diversity Assessment
High-throughput sequencing and taxonomic classification are essential for microbiome research projects. Our complete 16S microbiome analysis includes 16S sequencing, taxonomic classification of organisms, sample diversity statistics, and the option to assess the beta diversity between groups of samples (e.g. populations, treatments, sites). Project-specific analyses are available upon request.
We currently offer amplicon-based 16S sequencing of the V1-V2, V3, V4, and V6 variable regions. A custom mix of variable regions can also be ordered.
Ingenio NGS Differential Expression Analysis Report
Mouse Run Report
Report generated on November 10, 2021
Submitted Samples
Sample names and analysis groups were taken directly from the submission form. The table was created with the kableExtra R package[1].
Number of Samples Submitted: 10
Run Report
Sequencer Used: Illumina MiniSeq
NGS Sequencing Method: Paired-End Amplicon Sequencing
Bacterial Region Sequenced: 16S (V3, V4, and V6)
Amplicon Size: 160-300 bp
Mock Communities Used: BEI Mock Even and Mock Staggered Communities for 16S
Total Reads: 7,871,350
Total Paired Reads: 3,935,675
Minimum Paired Read Count: 237,641
Maximum Paired Read Count: 410,966
Average Read Size: 128 bases
Read Counts Per Sample
Raw paired-end read counts for each sample are reported in the following bar graph and corresponding table. All samples must pass a minimum read-count threshold for low-abundance bacterial organisms to be reported reliably. Samples are labeled with the user-provided IDs from the submission form and listed alphabetically; duplicate IDs are denoted with an underscore followed by a sequential number. The bar graph was generated with the ggplot2 and plotly R packages[2, 3], and the table was created with the kableExtra R package[1].
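The labeling rule for submitted samples (alphabetical order, with duplicates marked by an underscore and a sequential number) can be sketched as follows. `label_samples` is a hypothetical helper, not the report's actual code. It assumes every copy of a repeated ID is numbered starting at 1, so `gut`, `gut` becomes `gut_1`, `gut_2`. The text does not say whether the first copy keeps its bare name.

```python
from collections import Counter


def label_samples(sample_ids):
    """Sort IDs alphabetically and number duplicates as ID_1, ID_2, ...

    Hypothetical helper: every copy of a repeated ID gets a suffix,
    while unique IDs are left unchanged.
    """
    totals = Counter(sample_ids)      # how many times each ID occurs
    seen = Counter()                  # running number per duplicated ID
    labels = []
    for sid in sorted(sample_ids):    # plain (case-sensitive) alphabetical sort
        if totals[sid] > 1:
            seen[sid] += 1
            labels.append(f"{sid}_{seen[sid]}")
        else:
            labels.append(sid)
    return labels


print(label_samples(["soil", "gut", "gut", "mock"]))
# ['gut_1', 'gut_2', 'mock', 'soil']
```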
Read Counts After Filtering
Paired-end reads were filtered with Fastp[4], and new FASTQ files were generated from the passing reads. To ensure proper alignment and classification, read pairs with ambiguous bases in either the forward or the reverse read were removed. Read pairs with fewer than 50 bases in either direction were also filtered out. Read counts after Fastp filtering are shown in the bar graph and table below. On the graph, raw read counts appear as blue bars and filtered read counts as yellow bars. The table reports the percentage of paired reads that passed filtering. The bar graph was generated with the ggplot2 and plotly R packages[2, 3], and the table was created with the kableExtra R package[1].
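The two filtering rules can be illustrated with a minimal Python sketch. It assumes each read pair is given as a pair of sequence strings (forward and reverse) and that ambiguous bases are written as `N`. This mimics the stated criteria; it is not fastp itself. fastp exposes comparable settings through its `--n_base_limit` and `--length_required` options, but the exact command used for this report is not given.

```python
MIN_LENGTH = 50  # reads shorter than this in either direction are dropped


def passes_filter(forward: str, reverse: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True if a read pair survives both filtering rules."""
    for read in (forward, reverse):
        if "N" in read.upper():       # any ambiguous base rejects the whole pair
            return False
        if len(read) < min_length:    # too short in either direction
            return False
    return True


def percent_passing(pairs) -> float:
    """Percentage of read pairs that pass, as reported in the table."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    kept = sum(passes_filter(f, r) for f, r in pairs)
    return 100.0 * kept / len(pairs)
```

Rejecting the whole pair when either mate fails keeps the forward and reverse files in sync. Downstream paired-end steps rely on that.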
Read Counts After Mothur Filtering
Read alignment and taxonomic classification were performed with the Mothur pipeline for 16S data[5]. The paired-end reads were used to generate a FASTA file for 16S analysis. The FASTA sequences were aligned to the reference 16S V4 region obtained from the SILVA database[6–8], and the aligned sequences were then classified using the RDP database[9]. Classified reads were filtered to remove three kinds of sequences: those outside the 16S region, those with a non-bacterial classification, and those of poor quality. The following graph and table report the remaining read counts and the percentage of reads that passed Mothur filtering. The bar graph was generated with the ggplot2 and plotly R packages[2, 3], and the table was created with the kableExtra R package[1].
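The three post-classification filters can be expressed as a simple Python predicate. This is a hypothetical sketch, not Mothur's code, and it rests on several assumptions:

- Each read carries its aligned start/end coordinates in the reference and a Mothur-style lineage string such as `Bacteria(100);Firmicutes(99);...`.
- "Outside the 16S region" means the read does not span the reference window.
- "Poor quality" is approximated by a maximum homopolymer length. The value 8 is an assumed default, not one stated in the report.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedRead:
    seq: str        # unaligned sequence
    start: int      # first aligned position in the reference
    end: int        # last aligned position in the reference
    taxonomy: str   # Mothur-style lineage, e.g. "Bacteria(100);Firmicutes(99);"


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def keep_read(read: AlignedRead, region_start: int, region_end: int,
              max_homopolymer: int = 8) -> bool:
    """Apply the region, classification and (assumed) quality filters."""
    # 1. The read must cover the whole reference window.
    if read.start > region_start or read.end < region_end:
        return False
    # 2. The top-level rank must be Bacteria (confidence score stripped).
    kingdom = read.taxonomy.split(";")[0].split("(")[0]
    if kingdom != "Bacteria":
        return False
    # 3. Crude quality check: reject long homopolymer runs.
    return longest_homopolymer(read.seq) <= max_homopolymer
```

The region coordinates passed to `keep_read` are placeholders. Real values depend on the alignment reference and the amplified region.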
Group Diversity
The sequences were divided into operational taxonomic units (OTUs) based on their taxonomic classifications. Group diversity (beta diversity) metrics assess the similarity and dissimilarity of OTU composition between groups of samples. Because read counts vary between samples, all samples were normalized to the read count of the sample with the fewest reads before statistical analysis. This normalization used subsampling without replacement (rarefaction). Diversity heatmaps and bar charts, by contrast, were generated using the total read count per sample.
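Normalizing by subsampling without replacement can be sketched in Python as follows. It assumes each sample is a mapping from OTU name to read count. This illustrates the technique and is not the pipeline's actual code. Because the draw is random, results vary unless a seed is fixed.

```python
import random
from collections import Counter
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int, seed: Optional[int] = None) -> Dict[str, int]:
    """Draw `depth` reads without replacement from one sample's OTU counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"cannot subsample {depth} reads from {total}")
    # One entry per read, labelled with the OTU it belongs to.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)   # random.sample never reuses an element
    # OTUs that receive no reads simply drop out of the result.
    return dict(Counter(picked))


def normalize_to_min(samples: Dict[str, Dict[str, int]], seed: Optional[int] = None):
    """Rarefy every sample to the read count of the smallest sample."""
    depth = min(sum(c.values()) for c in samples.values())
    return {name: rarefy(c, depth, seed) for name, c in samples.items()}
```

The smallest sample is returned unchanged, because drawing all of its reads without replacement reproduces it exactly.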
Principal Component Analysis
Principal Component Analysis (PCA) was performed on the normalized read counts described above to reduce the dimensionality of the final OTU table. Samples are colored by the groups predefined on the sample submission form. Mock samples are included as a separate group. The first two principal components (PC1 and PC2) are plotted below using the plotly R package[3].
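For reference, the PC1/PC2 coordinates can be computed from a samples × OTUs table with a singular value decomposition. This is a NumPy sketch under an assumption: the counts are only mean-centered, not scaled or log-transformed, because the report does not say which preprocessing the plotted PCA used. The signs of the components are arbitrary.

```python
import numpy as np


def pca_first_two(otu_table):
    """Return PC1/PC2 sample scores and their fraction of total variance.

    otu_table: samples x OTUs array of normalized read counts.
    """
    X = np.asarray(otu_table, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each OTU column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * S[:2]                      # sample coordinates on PC1, PC2
    explained = (S ** 2) / np.sum(S ** 2)          # variance share per component
    return scores, explained[:2]
```

Each row of `scores` is one point on the PCA plot. The two `explained` values are the percentages typically shown on the axis labels.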
Phylum Level OTU Heatmap
Samples were clustered based on phylum-level OTUs, and the resulting heatmap is shown below. OTU read counts were log10-transformed. The infinite values produced by zero counts (log10(0) is negative infinity) were set to zero, so values are shown as log10(total reads). The sample and OTU dendrograms were ordered using optimal leaf ordering (OLO) from the heatmaply R package[10], and the heatmap was visualized using the plotly R package[3].
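The transformation applied before plotting is shown below as a small NumPy sketch. It covers only the log10 step and the replacement of infinite values. The clustering and optimal leaf ordering are done by heatmaply and are not reproduced here.

```python
import numpy as np


def log10_for_heatmap(counts):
    """log10-transform read counts; log10(0) = -inf is replaced with 0."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):          # silence the log10(0) warning
        logged = np.log10(counts)
    logged[np.isinf(logged)] = 0.0              # infinite values set to zero
    return logged
```

One consequence of this choice is that an OTU with 0 reads and an OTU with exactly 1 read both map to 0 on the heatmap.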
AMOVA
This statistics section is included only when the project has more than one group.
The Analysis of Molecular Variance (AMOVA) is a nonparametric analog of analysis of variance. It tests whether the variation in community composition between groups is greater than would be expected from the variation within groups. The groups used for these analyses were predefined on the sample submission form. Normalized read counts from genus-level OTUs were used for this section. Mothur marks significant results with an asterisk (p-value < 0.01). All tables in this section were generated using the kableExtra R package[1].
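The idea behind AMOVA can be shown with a stand-alone Python sketch. It computes an Excoffier-style ratio of among-group to within-group mean squares and assesses significance by shuffling group labels. This is not Mothur's implementation. It assumes a symmetric sample-by-sample distance matrix as input. The distance metric behind the report's AMOVA tables is not stated, and the example uses simple absolute differences purely for illustration.

```python
import itertools
import random


def _sum_sq(dist, idx):
    """(1/n) * sum of squared pairwise distances within an index set."""
    idx = list(idx)
    total = sum(dist[i][j] ** 2 for i, j in itertools.combinations(idx, 2))
    return total / len(idx)


def amova_f(dist, labels):
    """Ratio of among-group to within-group mean squares."""
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    n, a = len(labels), len(groups)
    ss_total = _sum_sq(dist, range(n))
    ss_within = sum(_sum_sq(dist, idx) for idx in groups.values())
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def amova(dist, labels, permutations=999, seed=None):
    """Observed statistic and a permutation p-value from shuffled labels."""
    observed = amova_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if amova_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


pos = [0, 1, 2, 3, 100, 101, 102, 103]
dist = [[abs(a - b) for b in pos] for a in pos]
f_stat, p = amova(dist, ["ctrl"] * 4 + ["trt"] * 4, seed=0)
```

A p-value below 0.01 from such a test would correspond to the asterisk threshold described above. The `+ 1` in the p-value keeps it from ever being reported as exactly zero.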
Bar Charts
Bar charts visualize group diversity and taxonomic classification. The mock communities were assessed for classification accuracy by comparing them with the expected read counts from the BEI data sheet. Phylum-level and genus-level compositions are provided for all submitted samples. All bar graphs were generated using the ggplot2 and plotly R packages[2, 3].
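The comparison between observed and expected mock-community composition amounts to converting counts to percentages and subtracting. The Python sketch below uses placeholder taxon names and made-up expected values, not figures from the BEI data sheet. Taxa that appear only in the observed data, such as misclassifications, are kept so they show up as positive deviations.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into percentages of the sample."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def mock_deviation(observed_counts, expected_percent):
    """Observed minus expected percentage for every taxon seen in either set."""
    observed = relative_abundance(observed_counts)
    taxa = set(observed) | set(expected_percent)
    return {t: observed.get(t, 0.0) - expected_percent.get(t, 0.0) for t in sorted(taxa)}


# Placeholder data for illustration only.
deviation = mock_deviation({"taxon_A": 60, "taxon_B": 40},
                           {"taxon_A": 50.0, "taxon_B": 50.0})
```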
V3 Composition
V4 Composition
V6 Composition
Sample Diversity
Sample Diversity (alpha diversity) summarizes the composition of the microbial community within a sample using measurements for its richness (number of taxonomic groups) and/or evenness (distribution of abundances of the groups). The calculations for Sample Diversity were done using normalized read counts as described in the Group Diversity section.
Rarefaction Curves
Rarefaction curves are used to assess OTU richness within a sample. The number of observed OTUs is plotted against the number of sequences sampled, and the shape of each curve indicates the richness of that sample. Sequences were subsampled 100 times to generate the rarefaction curves below. For samples that have been sequenced deeply enough, the curves typically rise steeply and then flatten. Low-diversity samples plateau much earlier. Very high-diversity samples, or samples that have not been sequenced deeply enough, continue to rise without flattening.
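Each point on a rarefaction curve is the mean number of distinct OTUs observed when a given number of reads is drawn without replacement, averaged over repeated draws. The Python sketch below illustrates this, using 100 repetitions to match the text. The choice of depths and the input format (OTU name to read count) are assumptions, not the report's settings.

```python
import random


def rarefaction_curve(counts, depths, iterations=100, seed=None):
    """Mean number of observed OTUs at each subsampling depth."""
    # One entry per read, labelled with its OTU.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        observed = [len(set(rng.sample(pool, depth))) for _ in range(iterations)]
        curve.append(sum(observed) / iterations)
    return curve
```

At the full sequencing depth the curve always equals the sample's total OTU count. The informative part is how quickly it gets there.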
Diversity Index
Sample diversity can also be assessed using Simpson’s Diversity Index and Shannon’s Diversity Index. Simpson’s Diversity Index is the probability that two individuals drawn at random from a community belong to different species/OTUs, so it reflects both dominance and richness. Shannon’s Diversity Index combines species/OTU richness and evenness into a single metric. All tables in this section were generated using the kableExtra R package[1].
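Both indices can be computed directly from a sample's OTU counts. In this Python sketch, Shannon's index uses the natural logarithm. Simpson's index is the "two reads from different OTUs" probability described above, computed for draws without replacement. The log base and estimator used in the report's tables are not stated, so both are assumptions. Some tools report the complementary probability (two reads from the same OTU), so check which form a given table uses.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def simpson(counts):
    """Probability that two reads drawn without replacement are different OTUs.

    Requires at least two reads in the sample.
    """
    total = sum(counts)
    same = sum(n * (n - 1) for n in counts) / (total * (total - 1))
    return 1.0 - same
```

For example, a sample with two equally abundant OTUs of 5 reads each has a Shannon index of ln 2. Its Simpson index is 1 - 40/90, about 0.56.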
Taxonomic Classification
Sunburst Plots
The sunburst plots for the groups and individual samples were generated using the plotly R package[3].
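A sunburst chart needs the taxonomy flattened into parallel lists of node IDs, labels, parent IDs and values. This Python sketch derives them from lineage tuples with read counts. The input format is an assumption. Path-based IDs joined with `;` keep nodes distinct when the same name appears at different ranks, and taxon names that contain `/` (such as `Escherichia/Shigella`) stay intact.

```python
from collections import defaultdict


def sunburst_nodes(lineage_counts):
    """Build ids/labels/parents/values for a sunburst from taxonomy paths.

    lineage_counts maps a tuple such as ("Bacteria", "Firmicutes", "Bacilli")
    to a read count. Each node's value is the total of all reads beneath it.
    """
    values = defaultdict(int)
    parents, labels = {}, {}
    for lineage, n in lineage_counts.items():
        for depth in range(1, len(lineage) + 1):
            node_id = ";".join(lineage[:depth])                # unique path id
            values[node_id] += n
            parents[node_id] = ";".join(lineage[:depth - 1])   # "" for the top rank
            labels[node_id] = lineage[depth - 1]
    ids = list(values)
    return ids, [labels[i] for i in ids], [parents[i] for i in ids], [values[i] for i in ids]


ids, labels, parents, values = sunburst_nodes(
    {("Bacteria", "Firmicutes"): 3, ("Bacteria", "Proteobacteria"): 2})
```

These four lists map onto plotly's sunburst `ids`, `labels`, `parents` and `values` attributes. Because parents hold the sum of their children, `branchvalues="total"` is the matching setting.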
V3 Classification
V4 Classification
V6 Classification
Phylum Level Pie Charts
Pie charts show the phylum-level composition of the groups defined on the submission form and of the individual samples. These plots were generated using the plotly R package[3].
V3 Pie Charts
V4 Pie Charts
V6 Pie Charts
Differential OTUs
Differential OTU tables were generated using LEfSe. LDA Effect Size (LEfSe) is an algorithm for high-dimensional microbiome biomarker discovery. It uses the Kruskal-Wallis test, the Wilcoxon rank-sum test, and Linear Discriminant Analysis (LDA) to identify biomarkers that distinguish groups[11].
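The first LEfSe step screens each feature with a Kruskal-Wallis test across groups. That step alone can be sketched in Python with `scipy.stats.kruskal`. The sketch omits the Wilcoxon subclass check and the LDA effect-size step, so it is not a substitute for LEfSe. The 0.05 cutoff and the input layout (feature name to per-sample abundances, plus one group label per sample) are assumptions.

```python
from scipy.stats import kruskal


def kruskal_screen(features, groups, alpha=0.05):
    """Return {feature: p} for features that differ across groups (p < alpha).

    features: dict of feature name -> list of per-sample abundances
    groups:   list of group labels, one per sample, in the same order
    """
    hits = {}
    for name, values in features.items():
        by_group = {}
        for value, group in zip(values, groups):
            by_group.setdefault(group, []).append(value)
        try:
            _, p = kruskal(*by_group.values())
        except ValueError:      # e.g. all values identical: nothing to test
            continue
        if p < alpha:           # a NaN p-value also fails this check
            hits[name] = p
    return hits


feats = {"otuA": [1, 2, 3, 4, 5, 11, 12, 13, 14, 15], "otuB": [5] * 10}
grp = ["ctrl"] * 5 + ["trt"] * 5
significant = kruskal_screen(feats, grp)
```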
Classification Table
These tables list read counts with full taxonomic breakdowns. Each region is shown in a separate tab, and a link to download the table as a CSV file is at the bottom of each table. The tables were created using the reactable R package[12].
References
- Zhu H. kableExtra: Construct complex table with ’kable’ and pipe syntax. 2021. https://CRAN.R-project.org/package=kableExtra.
- Wickham H. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York; 2016. https://ggplot2.tidyverse.org.
- Sievert C. Interactive web-based data visualization with R, plotly, and shiny. Chapman and Hall/CRC; 2020. https://plotly-r.com.
- Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–90. doi:10.1093/bioinformatics/bty560.
- Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology. 2009;75:7537–41. doi:10.1128/AEM.01541-09.
- Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research. 2012;41:D590–6. doi:10.1093/nar/gks1219.
- Yilmaz P, Parfrey LW, Yarza P, Gerken J, Pruesse E, Quast C, et al. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Research. 2013;42:D643–8. doi:10.1093/nar/gkt1209.
- Glöckner FO, Yilmaz P, Quast C, Gerken J, Beccati A, Ciuprina A, et al. 25 years of serving the community with ribosomal RNA gene reference databases and tools. Journal of Biotechnology. 2017;261:169–76. doi:10.1016/j.jbiotec.2017.06.1198.
- Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, et al. Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Research. 2013;42:D633–42. doi:10.1093/nar/gkt1244.
- Galili T, O’Callaghan A, Sidi J, et al. heatmaply: an R package for creating interactive cluster heatmaps for online publishing. Bioinformatics. 2017. doi:10.1093/bioinformatics/btx657.
- Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, et al. Metagenomic biomarker discovery and explanation. Genome Biol. 2011;12:R60.
- Lin G. reactable: Interactive data tables based on ’React Table’. 2020. https://CRAN.R-project.org/package=reactable.
Sample Submission Guidelines
DNA for 16S Sequencing
- Extraction via a kit designed for microbiome analysis (e.g. ZymoBIOMICS DNA Kit)
- A260/A280 ratio: 1.8-2.0
- No RNA contamination
- Suspended in DNase-free water or a buffer such as 10 mM Tris, Qiagen EB, or TE
- At least 10 µl of DNA at a concentration above 5 ng/µl
- Shipped on dry ice
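The numeric DNA requirements can be checked before shipping. Together, the volume and concentration limits imply more than 50 ng of DNA (10 µl × 5 ng/µl). The hypothetical Python helper below tests only the measurable criteria listed above. Extraction method, RNA contamination and buffer choice still need to be confirmed separately.

```python
def check_dna_submission(a260_280, volume_ul, conc_ng_per_ul):
    """Return a list of problems with a DNA sample; an empty list means it passes."""
    problems = []
    if not 1.8 <= a260_280 <= 2.0:
        problems.append("A260/A280 ratio outside 1.8-2.0")
    if volume_ul < 10:
        problems.append("less than 10 ul of DNA")
    if conc_ng_per_ul <= 5:
        problems.append("concentration not above 5 ng/ul")
    return problems
```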
Stool and Soil Samples for 16S Sequencing
- Flash-frozen immediately after collection
- A minimum of 0.5 g of stool or soil
- Stored at -80 °C prior to shipping
- Shipped on dry ice
- Alternative: collect samples in Zymo DNA/RNA Shield and ship at ambient temperature