
[dataset]: Bodega Bay 2018 Benthic eDNA

hollybik opened this issue · 7 comments

Contact details

[email protected]

Dataset Title

Bodega Bay 2018 Benthic eDNA

Describe your dataset and any specific challenges or blockers you have or anticipate.

We have a 2018 eDNA study carried out in Bodega Bay, CA that is ready for publication. We are looking for training and guidance on how to deposit our eDNA / species observations into OBIS in Darwin Core format. We are especially interested in how to do this kind of data deposition for future studies (e.g. developing lab protocols and automated bioinformatics scripts for formatting), as well as how to link eDNA occurrences with taxonomically validated species IDs (e.g. those that derive from the parallel microscopy and DNA barcoding that we often carry out alongside eDNA sequencing in the lab).

Info about "raw" Data Files.

We have eDNA ASV tables with associated taxonomic assignments derived from BLASTing eDNA representative sequences against reference databases (e.g. SILVA). We also have station and site metadata in a separate mapping file.
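
For concreteness, here is a minimal sketch of how those three files might be joined into one long-format table (pandas; the file names and column headers are hypothetical stand-ins for our actual outputs):

```python
# Minimal sketch only: file names and column headers below are hypothetical
# stand-ins for typical metabarcoding outputs, not our exact files.
import pandas as pd

# ASV table: one row per ASV, one column per sample, values are read counts
asv = pd.read_csv("asv_table.tsv", sep="\t", index_col="asv_id")

# Taxonomic assignments from BLASTing representative sequences (e.g. vs SILVA)
tax = pd.read_csv("taxonomy.tsv", sep="\t", index_col="asv_id")

# Station/site metadata keyed by sample ID (the separate mapping file)
meta = pd.read_csv("sample_metadata.tsv", sep="\t", index_col="sample_id")

# Melt the wide ASV table into long form: one row per ASV-per-sample detection
long_form = (
    asv.reset_index()
       .melt(id_vars="asv_id", var_name="sample_id", value_name="reads")
       .query("reads > 0")  # keep detections only
)

# Attach taxonomy and station metadata to each detection
long_form = long_form.join(tax, on="asv_id").join(meta, on="sample_id")
print(long_form.head())
```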

hollybik · Aug 24 '25

@lukenoaa 👀

MathewBiddle · Aug 25 '25

Hi @hollybik, we should be able to help. At AOML, we've been developing a workflow for publishing eDNA data. Essentially it boils down to three steps:

  1. (optional) Put your protocols (sampling, extraction, PCR, sequencing) in BeBOP format. This step is optional, but it gives you values for the FAIR eDNA terms that you'll use in the next step.
  2. Put your metadata and ASV/taxonomy tables in FAIR eDNA format using FAIRe2ODE. This code will generate a Google Sheet that puts all your data in a format that is used by edna2obis (next step) and the Ocean DNA Explorer (ODE). Both tools are under development; edna2obis is more mature and is the one you'll use to publish your data.
  3. Process your data through edna2obis. This will generate the files you need to submit your data to OBIS and GBIF.

I'm copying @ksil-noaa and @baydenwillms who have done a lot of the development of these tools. We are planning a walk-through of these steps in a single document, but most of the information you need is in the respective GitHub repositories.
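
To make the end point concrete, here's a rough sketch of the kind of record pair the workflow ultimately produces: a Darwin Core Occurrence row plus a DNA-derived data extension row keyed by occurrenceID. The term names follow the published DwC and DNA-derived data vocabularies, but every value below is an invented placeholder:

```python
# Illustrative only: term names come from the DwC Occurrence core and the
# GBIF DNA-derived data extension; all values are invented placeholders.
occurrence = {
    "occurrenceID": "bodega2018_sample01_asv0001",  # hypothetical ID scheme
    "eventID": "bodega2018_sample01",
    "basisOfRecord": "MaterialSample",
    "scientificName": "Desmodora",                  # placeholder taxon
    "organismQuantity": 1250,                       # read count for this ASV/sample
    "organismQuantityType": "DNA sequence reads",
}

dna_derived = {
    "occurrenceID": occurrence["occurrenceID"],     # join key back to the core
    "target_gene": "18S rRNA",                      # placeholder marker
    "DNA_sequence": "TTGATCCTGCC...",               # truncated for the example
}
print(occurrence, dna_derived, sep="\n")
```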

lukenoaa · Aug 25 '25

Hi @hollybik! Luke did a good job summarizing the steps to use our workflow. The only thing I'd add is that the expected format for the ASV tables and taxonomy info can be found in the edna2obis README. It is probably pretty similar to how your files are set up. A benefit of using edna2obis is that it allows multiple markers in the same dataset and helps match your taxonomic names to either the GBIF or OBIS backbone.
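
As a rough illustration of what that matching step does under the hood (edna2obis handles this for you), here is one name looked up against the GBIF backbone via the public species match API; for OBIS, the equivalent lookup is against WoRMS. This is not edna2obis code, just a sketch of the concept:

```python
# Illustration only: not edna2obis internals, just the public GBIF species
# match API that backbone matching conceptually relies on.
import requests

def match_gbif(name: str) -> dict:
    """Return GBIF's best backbone match for a scientific name."""
    resp = requests.get(
        "https://api.gbif.org/v1/species/match",
        params={"name": name},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

hit = match_gbif("Desmodora")  # placeholder name from an ASV assignment
print(hit.get("matchType"), hit.get("scientificName"), hit.get("usageKey"))
```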

I also wanted to point out that a more user-friendly, GUI-centric option is the GBIF Metabarcoding Data Toolkit (MDT). It is great for single-marker studies where you have already cleaned up the taxonomic names to align with the GBIF or WoRMS taxonomy backbones.

As for this point ("as well as how to link eDNA occurrences with taxonomically validated species IDs (e.g. that derive from parallel microscopy and DNA barcoding that we often carry out alongside eDNA sequencing in the lab)"), that is going to be a lot easier with the new DwC data package, which is being evaluated by the community now. So stay tuned there!
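
In the meantime, one way to express that linkage with today's standard is a Darwin Core ResourceRelationship row tying the eDNA occurrence to the microscopy/barcoding occurrence that validated the ID. A minimal sketch, with invented IDs and a free-text relationship value:

```python
# Sketch only: resourceID / relatedResourceID values are invented; the term
# names come from the Darwin Core ResourceRelationship class.
relationship = {
    "resourceID": "bodega2018_sample01_asv0001",      # the eDNA occurrence
    "relatedResourceID": "bodega2018_micro_spec042",  # the validated voucher occurrence
    "relationshipOfResource": "same taxon as identified by microscopy",  # free text
}
print(relationship)
```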

ksil-NOAA · Aug 26 '25

Thank you @lukenoaa @ksil-NOAA for the detailed advice and links! I'm so glad this community is proactive in moving these toolkits forward; it is very much needed.

Since I signed up for the November Marine Biodiversity Data Mobilization workshop, should our lab carry out these steps in advance of the workshop, or is this something you'll walk us through on Zoom? I just want to be clear on how much advance data wrangling would be useful before the workshop, since it seems like there are many existing tools for (re)formatting the eDNA datasets that come out of our typical metabarcoding analysis pipeline.

hollybik · Aug 26 '25

Hi Holly - I'm not an organizer for this event, but I have been a past organizer, so I thought I would respond to your question based on that experience :-) I think it's helpful to try to carry out the steps in advance of the workshop. That way you already know where you get stuck and can ask for specific help resolving those blockers during the workshop.

When I was working on standardizing data, I would frequently run into blockers that I wasn't sure how to solve. You can email someone or post in a Slack, and maybe someone gets back to you quickly, but maybe it takes them a week or two, and by then you've moved on to something else and don't quite remember where you left off. The workshop is an opportunity to get direct assistance right then and there with your blockers, so you can reach the end point of published data more quickly. I hope that helps!

AbbyBenson · Aug 29 '25

As an organizer of this year's (and previous years') workshop, I 100% agree with @AbbyBenson's comments above. Try to work through as much as you can pre-workshop and make a note of the blockers you run into. Then, during the workshop, we can help you address any remaining questions or blockers. That time will be much more effective if you come with questions and prepped data.

This is also a good space to document your progress (this GitHub issue, that is). It looks like all of the right people are commenting here, so you seem to be in great hands!

MathewBiddle · Sep 02 '25

@hollybik I echo what Matt and Abby said. If you come up with questions/new blockers on your dataset before the workshop feel free to share them here. It's possible we can help you get through some of them via GitHub and tackle the more complicated ones during the workshop.

sformel · Sep 08 '25