Phyloseq and Microbiome analysis in R

Data Import and Exploration

There are a number of packages developed in R that make microbiome analysis easy and produce great figures. There is at times a lot of overlap between this analysis and QIIME - and the choice is yours.

Personally I think that R allows alot more freedom and is more aesthetically pleasing, plus there are lots of things that cannot be done in QIIME that can be done in R. Having said that it does require some knowledge of R, and however this notebook tries to step you through what you need to know.

Two packages in particular are useful for microbiome analysis - the microbiome package builds on phyloseq, and you may find you don’t need the microbiome package.

As always there is more than one way to do things, this phyloseq import will follow the basic import tutorial, in other words importing files individually to create a phyloseq object. However, you may like to follow their tutorials on the QIIME, or mothur input. I do find that as these programs change over time there can be a delay in updating the latest data import workflow, hence why I have gone with the metheod of phyloseq-ize Data already in R that will work no matter what updates are made to QIIME, usearch or other pipelines you may use.

Libraries

Note: You will need to install the library first if you haven’t already using install.packages

Load the libraries

library("phyloseq")
library("tidyverse")
library("ape")
library("plyr")
library("readr")
library("microbiome")
library("lattice")
library("colorspace")
library("RColorBrewer")
library("vegan")
library("microbiome")
library("jsonlite")

Remove variables in current environment

rm(list=ls())

Set working directory

setwd("~/tick_meta_analysis/data")

Remember the useful package for data management in R ProjectTemplate

Importing data and creating a physeq object

Note: The phyloseq package is very sensitive to file formats and way in which your input files are laid out, so please take the time to read this section carefully.

Next you can set your working directory and import your OTU table (from the usearch pipeline) and your taxonomy table (from QIIME). I recommend you copy and data the files from your Usearch/QIIME directory output to the data file in the R management directory.

Import OTU table taking note of the following:

  • Rows = taxa (otus) and columns = samples. The sample names should already be matched to your metadata if you following the QIIME analysis to generate a feature table correctly.
  • Order the OTU column in ascending order to do this you may need to delete the letters ‘OTU’, which can be done in excel with a control-H command.
  • The first column with the OTU number must then be deleted, this can get confusing so it is best to take this document as a copy so that you do not save over the table with the OTU numbers.
  • Ensure that the column headings (sample names) are in the same order as what appears in the metadata sheet. In excel you may like to copy and paste the data using the transpose option to order the sample column before reverting it back.
  • Save as csv file
uparse_otus <- read_csv("data/otu_table.csv")
otumat <- as.matrix(uparse_otus)
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))
otumat

Import the taxonomy table taking note of the following:

  • Rows = taxa and columns = taxa heirachy
  • The first column with the OTU number must then be deleted, this can get confusing so it is best to take this document as a copy so that you do not save over the table with the OTU numbers.
  • Save as csv file
taxonomy <- read_csv("data/taxonomy.csv")
taxmat <- as.matrix(taxonomy)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
taxmat
Note format of the taxonomy table varies depending on which reference database you used. For example in the Greengenes classifer the output taxonomy table formats each taxa as followed: p__Taxa where the letter prefix refers to the heirachy of the taxonomy assignment. You may want to remove this formatting in excel before inputting for aesthetic purposes of the graph outputs. You can do this by using the Control-H command in excel.

Next we can check the class of the otumat and taxmat objects, they MUST be in matrix format. Then we can great a phyloseq object called physeq from the otu and taxonomy tables and check the sample names.

class(otumat)
class(taxmat)
OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
OTU
TAX
physeq = phyloseq(OTU, TAX)
physeq
sample_names(physeq)

Metadata Import

The order of the rows in the metadata must match the order of the columns in the OTU table, the number of rows(taxa) and sample names must also be a match. Ensure the metadata file is in csv format.

meta_data <- read_csv("data/metadata.csv")
sampledata = sample_data(data.frame(
 meta_data, row.names=sample_names(physeq), stringsAsFactors=FALSE))
sampledata

Tree Import

There are a few options for the tree import- you can create a random tree using the ape package, however as this only using taxonomy information it is not as robust as other methods that use the sequence information, such as that generated by QIIME or USEARCH.

random_tree = rtree(ntaxa(physeq), rooted=TRUE, tip.label=taxa_names(physeq))
plot(random_tree)

To import your USEARCH tree

MyTree <- read.tree("data/otus.tree")

To import your QIIME2 tree

MyTree2 <- read.tree("rooted-tree.nwk")

Create final phyloseq object

Now you can merge your data to create a final phyloseq object

physeq1 = merge_phyloseq(physeq, sampledata, MyTree)
physeq1

You should end up with the following output after physeq1

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 15985 taxa and 1390 samples ]
sample_data() Sample Data:       [ 1390 samples by 13 sample variables ]
tax_table()   Taxonomy Table:    [ 15985 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 15985 tips and 15984 internal nodes ]

Basic commands to check your physeq object

nsamples(physeq1)       # number of samples
ntaxa(physeq1)          # number of taxa (OTUs)
sample_names(physeq1)[1:5]  # sample names - first five
rank_names(physeq1)   # taxonomic ranks e.g. Phylum, Kingdom
sample_variables(physeq1) # metadata variables
otu_table(physeq1)[1:5, 1:5] # first 5 otus (no. of reads) and first 5 samples
tax_table(physeq1)[1:5, 1:4] # first 5 samples and taxonomy first 4 columns 
myTaxa = names(sort(taxa_sums(physeq1), decreasing = TRUE)[1:10])



Vector and Waterborne Pathogens Research Group. 2018. S Egan.