The internet has a seemingly endless supply of resources - if you can google, then you can code!

Don’t forget about YouTube - an excellent resource for visual learners.

Below are just some of the resources available. They are mainly tailored around genomics, but the Data and Software Carpentry resources are an excellent starting point for general information.

Where to start?

It can be a bit daunting knowing where to start. There are so many programs, packages, applications, tutorials - it’s important to remember you cannot do them all!

My personal advice (remember I am a mac/linux user) is to start by learning how to use the command line (i.e. the terminal) and some basics of the bash language. Once you can perform some basic commands, move on to learning R/RStudio.
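To give a feel for what "basic commands" means here, the following is a minimal sketch of a first terminal session. The file name and its contents are made up for illustration, and everything happens in a temporary directory so nothing on your machine is changed.

```shell
# Work in a throwaway directory so nothing on your machine is touched
workdir=$(mktemp -d)
cd "$workdir"

# Create a small example file with three lines (hypothetical sample names)
printf 'sample_A\nsample_B\nsample_C\n' > samples.txt

pwd                      # print the directory you are currently in
ls                       # list the files in that directory
head -n 2 samples.txt    # show the first two lines of the file

# Count the lines in the file (tr strips the padding some systems add)
n_samples=$(wc -l < samples.txt | tr -d ' ')
echo "Number of samples: $n_samples"

# Search the file for lines containing a pattern
grep 'sample_B' samples.txt
```

Commands like these (moving around, listing, peeking into files, counting and searching) are covered in much more depth in The Unix Shell lesson recommended below.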

I recommend RStudio because it will save you a lot of time in the long term. The RStudio community is huge and everything is free, so you never have to worry about losing access. There is a wealth of packages, so you can do the whole workflow in RStudio: data entry, manipulating and summarising spreadsheets, statistical analysis, data visualisation, and then writing your report/thesis/publication. You also won’t have to learn as many additional programs (e.g. mapping programs, photoshop/figure editing software, statistics packages, the list goes on…). Most people pick and choose certain elements to do in RStudio and use programs they are familiar with for the other bits. The good news is you can take it at your own pace.

Siobhon’s recommendations for beginners

Software Carpentry
- The Unix Shell - REQUIRED (this is an essential start for beginners!)
- Version Control with Git - this provides a platform for you to keep track of and share your code. I suggest you come back to this later (there is also a desktop version)
- Programming with Python - come back to this later. Python is another scripting language and a popular one among many coders, but most bioinformatics programs are run from bash (the language you learnt in ‘The Unix Shell’ tutorial)
- Programming with R - HIGHLY RECOMMENDED
- R for reproducible scientific analysis - recommended extra, although you can probably skip some lessons

Even if you think your field is not ecology, trust me, this is an excellent place to start learning RStudio.

Data Carpentry
- Ecology Lessons
  - Data organisation in spreadsheets - HIGHLY RECOMMENDED
  - Data analysis and visualization in R - REQUIRED

While you may be tempted to look at the Genomics lessons (of course go ahead if you have a burning desire!), they are still very much a work in progress, and I would not recommend them at this stage.

R/RStudio

In addition to the above Software Carpentry and Data Carpentry resources, some additional places to start include:

Swirl offers great interactive tutorials that run directly in R. Useful for statistical analysis and basic functions.
R Tutorial offers a couple of introductory tutorials on basic R concepts.
Code School provides some more tutorials on basic R syntax.
Quick-R is great for tutorials on statistical analysis.
STAT 545 - course website for the 2019-2020 edition of STAT 545A and STAT 547M, colloquially known as just “STAT 545”, delivered at The University of British Columbia in Vancouver, BC.

R Cheat Sheets - there are a number of cheatsheets and other reference documentation available for R.

A link to the official R cheat sheets is here.
Other useful cheatsheets by DataCamp are available here.
Links to other useful resources are available here, here, here and here.

Sequence & Phylogenetics

More coming soon - see my personal page here for some inspiration in the meantime.

NGS Sequence analysis

This pipeline makes use of USEARCH for preprocessing and QIIME2 for taxonomy assignment and analysis. Both come with a number of tutorials with example data for you to work through. I strongly suggest you work through these critically. Don’t just copy, paste and press enter (even though it is tempting!) - make sure you understand what you are doing and why.
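One way to work through a tutorial critically is to look inside the data files yourself before and after each step. The sketch below is only an illustration, not part of the USEARCH or QIIME2 tutorials: it builds a tiny made-up FASTQ file (the file name and reads are hypothetical) and uses plain shell commands to check how many reads it holds and how long they are.

```shell
# Build a tiny two-read FASTQ file (made-up reads, for illustration only)
workdir=$(mktemp -d)
fastq="$workdir/example.fastq"
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCA\n+\nIIIII\n' > "$fastq"

# Each FASTQ record spans four lines: header, sequence, '+', quality scores
head -n 4 "$fastq"

# Count reads: total number of lines divided by four
n_lines=$(wc -l < "$fastq" | tr -d ' ')
n_reads=$((n_lines / 4))
echo "Reads: $n_reads"

# Length of each sequence (the 2nd line of every 4-line record)
read_lengths=$(awk 'NR % 4 == 2 { print length($0) }' "$fastq" | tr '\n' ' ')
echo "Read lengths: $read_lengths"
```

Running the same checks on a file before and after a filtering or trimming step shows exactly what that step did to your data.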

Other comparable alternatives

QIIME2 - although this webpage uses QIIME2 only from the taxonomy step onwards, pre-processing steps can also be done in QIIME2 by making use of various plugins
Mothur - a lot of information and resources available
VSEARCH - made as an alternative to USEARCH
FastQC - a common tool for quality control checks on raw sequence (fastq) data

Some other useful references and analysis tools for genomics

RNAseq analysis - similar RNAseq content by COMBINE Australia is also available here
Next Gen Seek
Bits of DNA
RNA-Seq Blog
Journal of Next Generation Sequencing & Applications
CoreGenomics
Next-Gen Sequencing
Omics! Omics!
In Between Lines of Code
Kevin’s GATTACA World
Blog @ Illumina
Next Generation Technologist
Applied computational genomics
OSSU bioinformatics

Microbial analysis

Microbial analysis has received the most attention with respect to amplicon-based NGS approaches; however, with some refining these pipelines can also be used for other amplicon-based NGS studies (e.g. COI (metazoa), ITS (fungi), trnL (plant), 18S (protozoa), just to name a few). There is a lot of overlap among the tools below, and with other genomics tools, so it’s up to you to see what works for you. Generally, those in R allow more customisation but may be more difficult to grasp initially. GUIs are good for getting a quick handle on your data, but you are usually limited in how far you can customise their graphical outputs. Some of these are stand-alone packages; however, the majority use a common format (e.g. phyloseq objects are common throughout many of these).

phyloseq - Analyze microbiome census data using R
microbiomeSeq - An R package for microbial community analysis in an environmental context
metacoder - An R package for metabarcoding research planning and analysis
microbiome R - Microbiome R package (extending phyloseq)
microbiomeutilities - Extending and supporting package based on microbiome and phyloseq (currently in developmental stage)
R microbiota - Microbiota analysis in R
MicrobeMiseq - Analyses of microbial community composition and diversity in R using phyloseq
Bioconductor Microbiome - Microbiome data analysis: from raw reads to community analysis
Ampvis2 - Tools for visualising amplicon data
dada2 - Fast and accurate sample inference from amplicon data with single-nucleotide resolution
mare - Pipeline for microbiota analysis based on 16S-amplicon reads
metagenomeSeq - Statistical analysis for sparse high-throughput sequencing
Rhea - A set of R scripts for the analysis of microbial profiles
taxize - A taxonomic toolbelt for R
LabDSV - Ordination and multivariate analysis for ecology
phylogeo - An R package for geographic analysis and visualization of microbiome data
qiimer - R functions to read QIIME output files and create figures
RAM - R for amplicon-sequencing-based microbial-ecology

RNASeq

…

Databases

TriTrypDB - kinetoplastid genomics resource
VectorBase - bioinformatics resource for invertebrate vectors of human pathogens
CryptoDB - Cryptosporidium genomics resource
Silva - high quality ribosomal RNA databases. SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
Greengenes - 16S rRNA gene database of Bacteria and Archaea.
EuPathDB - Eukaryotic pathogen database resource
UNITE - communication and identification of DNA-based fungal species (based on the internal transcribed spacer region, ITS)
BoldSystems - The Barcode of Life Data System is designed to support the generation and application of DNA barcode data. Includes the following: animal identification using mitochondrial cytochrome oxidase subunit 1 (COI); fungal identification using the internal transcribed spacer (ITS); and plant identification using chloroplast ribulose-bisphosphate carboxylase (rbcL) and Maturase K (matK)
VEuPathDB - This NIH Bioinformatics Resource Center (BRC) will support the integration of parasite resources currently provided by EuPathDB.org, fungal resources provided by FungiDB.org, and vector resources provided by VectorBase.org.

Epidemiology and Statistics

Web/online resources

ClinEpiDB - Advancing global public health by facilitating the exploration and analysis of epidemiological studies. ClinEpiDB, launched in February 2018, is an open-access online resource enabling investigators to maximize the utility and reach of their data and to make optimal use of data released by others. More coming soon
Epitools - This site is developed and maintained by Ausvet. The site is intended for use by epidemiologists and researchers involved in estimating disease prevalence or demonstrating freedom from disease through structured surveys, or in other epidemiological applications.
VassarStats - A useful and user-friendly tool for performing statistical computation (probabilities, regression, t-test, ANOVA and more)

RMarkdown

More coming soon

Writing your thesis

…

Making a webpage

…

Mapping