As outlined on the "Challenges when dealing with metagenomics data" page, many environmental researchers currently face computational challenges in interacting with, and gleaning meaningful information from, the BASE or MM datasets.

This project aims to address the problem by establishing a cloud-based research data analysis system that removes the need for researchers to download the full datasets from the BPA Data Portal and analyse them offline.

The components of this system are shown in purple in the diagram below, and will be delivered by:
  • Extending the BPA Data Portal to include search interfaces that allow researchers to select data subsets of interest (e.g. all samples in a specific geographical area, or OTUs that either correspond to a named taxon or have significant DNA sequence similarity to a query sequence) ("1" in the figure).
  • Extending the BPA Data Portal to include interfaces that allow researchers to specify analysis types (e.g. multivariate analyses) and their associated parameters ("2" in the figure).
  • Systems and methods to connect the BPA Data Portal to an established cloud-based national bioinformatics service (i.e. Galaxy Australia, part of the Genomics Virtual Laboratory (GVL) project) ("API" in the figure).
  • Extending the Galaxy Australia service to undertake the required analyses and provide outputs (e.g. data or graphs), together with provenance information about each analysis (e.g. the parameters and software used).
  • Providing methods that allow researchers to access the analysed data and reports ("3" in the figure).
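To make steps "1" and "2" more concrete, the sketch below shows one way a geographic subset selection could be combined with an analysis choice into a single request for the analysis service. It is purely illustrative: the `Sample` record, the `select_in_bbox` and `build_analysis_request` helpers, the payload fields and the `"bray-curtis"` parameter are all hypothetical and do not describe the actual BPA Data Portal or Galaxy Australia interfaces.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical minimal record; real BASE/MM sample metadata has many more fields.
    sample_id: str
    latitude: float
    longitude: float


def select_in_bbox(samples, min_lat, max_lat, min_lon, max_lon):
    """Step "1" (hypothetical): keep samples whose coordinates fall inside a bounding box."""
    return [
        s for s in samples
        if min_lat <= s.latitude <= max_lat and min_lon <= s.longitude <= max_lon
    ]


def build_analysis_request(samples, analysis_type, parameters):
    """Step "2" (hypothetical): bundle the selected subset with an analysis type and parameters.

    The payload shape is an assumption standing in for whatever the portal
    would pass to the analysis service through the "API" box in the figure.
    """
    return {
        "sample_ids": sorted(s.sample_id for s in samples),
        "analysis": analysis_type,
        "parameters": dict(parameters),
    }


if __name__ == "__main__":
    samples = [
        Sample("S1", -27.5, 153.0),
        Sample("S2", -33.9, 151.2),
        Sample("S3", -27.4, 152.9),
    ]
    subset = select_in_bbox(samples, -28.0, -27.0, 152.5, 153.5)
    request = build_analysis_request(subset, "multivariate", {"distance": "bray-curtis"})
    print(request)
```

Keeping selection and request-building separate mirrors the split between the search interfaces ("1") and the analysis-specification interfaces ("2"), so either could change without affecting the other.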

Note that developing new tools or workflows is not in the project scope; however, deploying and/or connecting established tools and workflows is.

We will also:

  • Develop introductory and practical metagenomics training material, use it with the newly developed system to deliver one national training session via the EMBL-ABR hybrid virtual + physical delivery method, and distribute the material via EcoEd and other relevant national training portals.

The high-level project scope was developed by Dr Andrew Bissett (CSIRO) and Anna Fitzgerald (BPA), respectively the Data Coordinator and Program Manager of the BASE and MM framework dataset initiatives, in conjunction with Dr Jeff Christiansen (QCIF Life Sciences Program Manager).