As outlined on the Challenges when dealing with metagenomics data page, many environmental researchers currently face computational challenges in interacting with, and gleaning meaningful information from, the BASE or MM datasets. To help address these challenges, this project will focus on:
- Extending the BPA Data Portal to include search interfaces that will allow researchers to select data subsets of interest (e.g. all samples in a specific geographical area, or OTUs that either correspond to a named taxon or have significant DNA sequence similarity to a query sequence) ("1" in the figure).
- Extending the BPA Data Portal to include interfaces that will allow researchers to specify analysis types (e.g. Multivariate Analyses) and associated parameters ("2" in the figure).
- Systems/methods to connect the BPA Data Portal to an established cloud-based national bioinformatics service (i.e. Galaxy Australia, part of the Genomics Virtual Laboratory (GVL) project) ("API" in the figure).
- Extending the Galaxy Australia service to undertake the required analyses and provide outputs (e.g. data or graphs), together with provenance information about each analysis (e.g. parameters and software used).
- Providing methods that allow researchers to access the analysed data and reports ("3" in the figure).
Note that developing new tools or workflows is out of scope for this project; however, deploying and/or connecting established tools and workflows is in scope.
We will also:
- Improve aspects of both the BPA Data Portal and Galaxy Australia/GVL to better align with the FAIR data principles (making data and data-related tools/services more Findable, Accessible, Interoperable and Reusable); and
- Develop metagenomics training material (introductory and practical), deliver one national training session using the developed system via the EMBL-ABR hybrid virtual/physical delivery method, and distribute the training material via EcoEd and other relevant national training portals.
The high-level project scope was developed by Dr Andrew Bissett (CSIRO) and Anna Fitzgerald (BPA), respectively the Data Coordinator and Program Manager of the BASE and MM framework dataset initiatives, in conjunction with Dr Jeff Christiansen (QCIF Life Sciences Program Manager).