New term - readCount
- Submitter: Darwin Core Data Package Coordination Team
- Efficacy Justification (why is this term necessary?): see Darwin Core Data Package (DwC-DP) Implementation Experience and Feature Report
- Demand Justification (name at least two organizations that independently need this term): see Darwin Core Data Package (DwC-DP) Implementation Experience and Feature Report
- Stability Justification (what concerns are there that this might affect existing implementations?): None
- Implications for dwciri: namespace (does this change affect a dwciri term version)?: To be determined by DwC-MG
Proposed attributes of the new term:
- Term name (in lowerCamelCase for properties, UpperCamelCase for classes): readCount
- Term label (English, not normative): Read Count
- Organized in Class (e.g., Occurrence, Event, Location, Taxon): NucleotideAnalysis
- Definition of the term (normative): The number of reads for a dwc:NucleotideSequence in a dwc:NucleotideAnalysis.
- Usage comments (recommendations regarding content, etc., not normative):
- Examples (not normative):
- Refines (identifier of the broader term this term refines; normative):
- Replaces (identifier of the existing term that would be deprecated and replaced by this term; normative):
- ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG; not normative):
Suggestion from the TDWG/GSC eDNA working group: the term name and definition should make clear that this is a processed read count (not raw). Should it therefore be renamed to processedReadCount? (A similar suggestion applies to totalReadCount (#649); I will now add a comment to that ticket.)
However, as a counter-note: if I understand correctly, these read counts are (always?) tied to a molecularProtocolID, an eventID and, crucially, a nucleotideSequenceID, and raw sequences will not be deposited. To me, this link makes it clear that the objective is to record the readCount of the sequence in question (i.e., asv1, asv2, etc.) as processed with a specific workflow (molecularProtocolID). Also, for raw reads, you generally don't have a read count until they have been processed and clustered, right? (Apologies for the delay in joining the conversation!)
For totalReadCount, I think it does make sense to distinguish raw and processed counts, but this is then a property of the sample, not of the sequence (so would it make sense to record it in the event table instead, to avoid the repetition again?). That said, I also understand the purpose of having it alongside readCount.
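To illustrate the repetition point, here is a minimal sketch (not part of DwC-DP) that treats NucleotideAnalysis rows as plain Python dicts. The rows, values and the helper `split_sample_totals` are hypothetical. It moves totalReadCount out of the per-sequence rows into a per-sample lookup keyed by eventID. It assumes the total is constant within an eventID; if a sample was processed with more than one molecularProtocolID, the processed totals may differ, and the check raises an error instead of silently keeping one value.

```python
# Hypothetical NucleotideAnalysis rows; identifiers and counts are illustrative only.
rows = [
    {"eventID": "E1", "molecularProtocolID": "P1", "nucleotideSequenceID": "asv1",
     "readCount": 120, "totalReadCount": 5000},
    {"eventID": "E1", "molecularProtocolID": "P1", "nucleotideSequenceID": "asv2",
     "readCount": 80, "totalReadCount": 5000},
    {"eventID": "E2", "molecularProtocolID": "P1", "nucleotideSequenceID": "asv1",
     "readCount": 300, "totalReadCount": 7000},
]


def split_sample_totals(rows):
    """Hypothetical helper: move totalReadCount from per-sequence rows to a per-event table.

    Returns (sequence_rows, totals_by_event). Raises ValueError if one eventID
    carries conflicting totals (e.g. processed with different protocols), since
    the value would then not be a pure property of the sample.
    """
    totals_by_event = {}
    sequence_rows = []
    for row in rows:
        event_id = row["eventID"]
        total = row["totalReadCount"]
        # setdefault stores the first total seen for this event and returns it afterwards.
        if totals_by_event.setdefault(event_id, total) != total:
            raise ValueError(f"conflicting totalReadCount for eventID {event_id}")
        # Keep readCount (a property of the sequence) and drop the repeated sample-level total.
        sequence_rows.append({k: v for k, v in row.items() if k != "totalReadCount"})
    return sequence_rows, totals_by_event
```

Calling `split_sample_totals(rows)` on the sample rows yields three sequence rows without totalReadCount and the lookup `{"E1": 5000, "E2": 7000}`, which is the information an event-level field would carry once instead of once per sequence.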