Data Import and Exploration
There are a number of packages developed in R that make microbiome analysis easy and produce great figures. There is at times a lot of overlap between this analysis and QIIME - and the choice is yours.
Personally I think that R allows alot more freedom and is more aesthetically pleasing, plus there are lots of things that cannot be done in QIIME that can be done in R. Having said that it does require some knowledge of R, and however this notebook tries to step you through what you need to know.
Two packages in particular are useful for microbiome analysis - the microbiome package builds on phyloseq, and you may find you don’t need the microbiome package.
As always there is more than one way to do things, this phyloseq import will follow the basic import tutorial, in other words importing files individually to create a phyloseq object. However, you may like to follow their tutorials on the QIIME, or mothur input. I do find that as these programs change over time there can be a delay in updating the latest data import workflow, hence why I have gone with the metheod of phyloseq-ize Data already in R that will work no matter what updates are made to QIIME, usearch or other pipelines you may use.
Libraries
Note: You will need to install the library first if you haven’t already using install.packages
Load the libraries
library("phyloseq")
library("tidyverse")
library("ape")
library("plyr")
library("readr")
library("microbiome")
library("lattice")
library("colorspace")
library("RColorBrewer")
library("vegan")
library("microbiome")
library("jsonlite")
Remove variables in current environment
rm(list=ls())
Set working directory
setwd("~/tick_meta_analysis/data")
Remember the useful package for data management in R ProjectTemplate
Next you can set your working directory and import your OTU table (from the usearch pipeline) and your taxonomy table (from QIIME). I recommend you copy and data the files from your Usearch/QIIME directory output to the data file in the R management directory.
Import OTU table taking note of the following:
uparse_otus <- read_csv("data/otu_table.csv")
otumat <- as.matrix(uparse_otus)
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))
otumat
Import the taxonomy table taking note of the following:
taxonomy <- read_csv("data/taxonomy.csv")
taxmat <- as.matrix(taxonomy)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
taxmat
p__Taxa
where the letter prefix refers to the heirachy of the taxonomy assignment. You may want to remove this formatting in excel before inputting for aesthetic purposes of the graph outputs. You can do this by using the Control-H
command in excel.
Next we can check the class of the otumat and taxmat objects, they MUST be in matrix format. Then we can great a phyloseq object called physeq from the otu and taxonomy tables and check the sample names.
class(otumat)
class(taxmat)
OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
OTU
TAX
physeq = phyloseq(OTU, TAX)
physeq
sample_names(physeq)
Metadata Import
The order of the rows in the metadata must match the order of the columns in the OTU table, the number of rows(taxa) and sample names must also be a match. Ensure the metadata file is in csv
format.
meta_data <- read_csv("data/metadata.csv")
sampledata = sample_data(data.frame(
meta_data, row.names=sample_names(physeq), stringsAsFactors=FALSE))
sampledata
Tree Import
There are a few options for the tree import- you can create a random tree using the ape package, however as this only using taxonomy information it is not as robust as other methods that use the sequence information, such as that generated by QIIME or USEARCH.
random_tree = rtree(ntaxa(physeq), rooted=TRUE, tip.label=taxa_names(physeq))
plot(random_tree)
To import your USEARCH tree
MyTree <- read.tree("data/otus.tree")
To import your QIIME2 tree
MyTree2 <- read.tree("rooted-tree.nwk")
Create final phyloseq object
Now you can merge your data to create a final phyloseq object
physeq1 = merge_phyloseq(physeq, sampledata, MyTree)
physeq1
You should end up with the following output after physeq1
phyloseq-class experiment-level object
otu_table() OTU Table: [ 15985 taxa and 1390 samples ]
sample_data() Sample Data: [ 1390 samples by 13 sample variables ]
tax_table() Taxonomy Table: [ 15985 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 15985 tips and 15984 internal nodes ]
nsamples(physeq1) # number of samples
ntaxa(physeq1) # number of taxa (OTUs)
sample_names(physeq1)[1:5] # sample names - first five
rank_names(physeq1) # taxonomic ranks e.g. Phylum, Kingdom
sample_variables(physeq1) # metadata variables
otu_table(physeq1)[1:5, 1:5] # first 5 otus (no. of reads) and first 5 samples
tax_table(physeq1)[1:5, 1:4] # first 5 samples and taxonomy first 4 columns
myTaxa = names(sort(taxa_sums(physeq1), decreasing = TRUE)[1:10])
Vector and Waterborne Pathogens Research Group. 2018. S Egan.