dwc New term - readCount

Submitter: Darwin Core Data Package Coordination Team
Efficacy Justification (why is this term necessary?): see Darwin Core Data Package (DwC-DP) Implementation Experience and Feature Report
Demand Justification (name at least two organizations that independently need this term): see Darwin Core Data Package (DwC-DP) Implementation Experience and Feature Report
Stability Justification (what concerns are there that this might affect existing implementations?): None
Implications for dwciri: namespace (does this change affect a dwciri term version)?: To be determined by DwC-MG

Proposed attributes of the new term:

Term name (in lowerCamelCase for properties, UpperCamelCase for classes): readCount
Term label (English, not normative): Read Count
Organized in Class (e.g., Occurrence, Event, Location, Taxon): NucleotideAnalysis
Definition of the term (normative): A number of reads for a dwc:NucleotideSequence in a dwc:NucleotideAnalysis.
Usage comments (recommendations regarding content, etc., not normative):
Examples (not normative):
Refines (identifier of the broader term this term refines; normative):
Replaces (identifier of the existing term that would be deprecated and replaced by this term; normative):
ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG; not normative):

Sep 11 '25 12:09 tucotuco

Suggestion from the TDWG/GSC eDNA working group: the term name and description should make clear that this is a processed read count, not a raw one. Should it therefore be renamed processedReadCount? A similar suggestion applies to totalReadCount (#649); I will add a comment to that ticket now.

Nov 25 '25 14:11 miwa582

However, as a counter-note: if I understand correctly, these read counts are (always?) tied to a molecularProtocolID, an eventID and, crucially, a nucleotideSequenceID, and raw sequences will not be deposited. To me, that link makes it clear that the aim is to record the readCount of the sequence in question (i.e. asv1, asv2, etc.) as processed with a specific workflow (molecularProtocolID). For raw reads, you also generally don't have a read count until they are processed and clustered? (Apologies for the delay in joining the conversation!)
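To make the linkage concrete, here is a minimal sketch of what such rows might look like. Only the field names (eventID, molecularProtocolID, nucleotideSequenceID, readCount) come from this thread; the class name, the identifier values and the counts are invented for illustration and are not taken from the DwC-DP schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleotideAnalysisRow:
    # Hypothetical row shape; field names follow the terms discussed in this thread.
    eventID: str
    molecularProtocolID: str
    nucleotideSequenceID: str
    readCount: int


# Made-up example values: each row is one sequence (e.g. an ASV) observed
# in one event and processed with one workflow.
rows = [
    NucleotideAnalysisRow("event-1", "protocol-A", "asv1", 120),
    NucleotideAnalysisRow("event-1", "protocol-A", "asv2", 30),
    NucleotideAnalysisRow("event-2", "protocol-A", "asv1", 45),
]


def link_key(row: NucleotideAnalysisRow) -> tuple:
    """Return the combination of identifiers that a readCount is tied to."""
    return (row.eventID, row.molecularProtocolID, row.nucleotideSequenceID)


# If the three identifiers together identify a row, no key should repeat.
keys = [link_key(r) for r in rows]
has_unique_keys = len(keys) == len(set(keys))
```

Under this reading, a readCount only exists for a sequence that has already come out of a given workflow, which is the point made above.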

For totalReadCount, I think it does make sense to indicate both raw and processed counts, but that is then a property of the sample, not the sequence. Would it make sense to record it in the event table instead, to avoid the repetition? That said, I understand the purpose of having it alongside readCount.
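As a rough illustration of the event-table option, the sketch below stores totalReadCount once per event and looks it up when it is needed next to readCount. The table layout, identifier values and counts are assumptions made for this example, and `relative_abundance` is a hypothetical helper, not part of any proposal.

```python
# Hypothetical event table: totalReadCount is stored once per event (sample).
events = {
    "event-1": {"totalReadCount": 1000},
    "event-2": {"totalReadCount": 500},
}

# Hypothetical analysis rows: one readCount per sequence per event.
analyses = [
    {"eventID": "event-1", "nucleotideSequenceID": "asv1", "readCount": 120},
    {"eventID": "event-1", "nucleotideSequenceID": "asv2", "readCount": 30},
    {"eventID": "event-2", "nucleotideSequenceID": "asv1", "readCount": 45},
]


def relative_abundance(analysis: dict, event_table: dict) -> float:
    """Divide a sequence's readCount by the totalReadCount of its event."""
    total = event_table[analysis["eventID"]]["totalReadCount"]
    return analysis["readCount"] / total


# The total is joined in via eventID rather than repeated on every row.
abundances = [relative_abundance(a, events) for a in analyses]
```

The trade-off is the usual one: the event-level layout avoids repeating the total on every sequence row, at the cost of a lookup whenever the two counts are used together.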

Dec 03 '25 10:12 SSuominen1