cov-spectrum-website

ENH: Submission date filter, submission delay filter and clock-filter metric

Open · corneliusroemer opened this issue 1 year ago · 2 comments

Date errors are a big problem when trying to determine when a variant was first sequenced. When there are millions of Omicron sequences, the earliest ones are inevitably data entry errors. See this: there are 700 Omicron sequences dated before there could possibly have been one.


It would be great if we could filter these erroneous date entries out somehow.

A few possibilities exist:

  • Allow filtering by submission date. This would be useful for other purposes too, e.g. putting yourself in the shoes of someone asking: what did covSpectrum show about Omicron on November 25?
  • Allow filtering by submission delay (submission date minus collection date) as a QC metric. The problem above would disappear entirely if we simply restricted sequences to those uploaded within 3 months of collection.
  • Add a clock-filter metric. This is a bit tricky for you to add: we have these in the Nextstrain metadata, but unfortunately they don't come out of Nextclade, as Nextclade doesn't use dates. It's not too hard, though: all one needs is a "date of founder" for each lineage; from that, estimate the expected sampling date with a standard molecular clock and compare it to the stated collection date. Maybe something to ponder for the Nextclade dataset.
  • ... Maybe others have more ideas
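The clock-filter idea from the list can be sketched roughly as follows. This is a minimal illustration, not Nextstrain's or Nextclade's implementation. It assumes you already know each lineage's founder date and how many mutations each sequence carries relative to that founder. The clock rate of ~24 substitutions per genome per year (about 8e-4 substitutions/site/year over a ~30 kb genome) and the 60-day tolerance are assumptions to tune. The founder date in the usage example is a hypothetical placeholder.

```python
from datetime import date, timedelta

# Assumed clock rate (tunable): ~24 substitutions per genome per year.
SUBS_PER_YEAR = 24.0


def expected_date(founder_date: date, mutations_since_founder: int,
                  subs_per_year: float = SUBS_PER_YEAR) -> date:
    """Clock-based estimate of the sampling date: founder date plus the time
    needed to accumulate the observed number of mutations."""
    years = mutations_since_founder / subs_per_year
    return founder_date + timedelta(days=round(years * 365.25))


def clock_deviation_days(stated: date, founder_date: date,
                         mutations_since_founder: int,
                         subs_per_year: float = SUBS_PER_YEAR) -> int:
    """Stated collection date minus the clock estimate, in days.
    Negative values mean the sequence is dated earlier than the clock suggests."""
    estimate = expected_date(founder_date, mutations_since_founder, subs_per_year)
    return (stated - estimate).days


def is_clock_outlier(stated: date, founder_date: date,
                     mutations_since_founder: int,
                     tolerance_days: int = 60,
                     subs_per_year: float = SUBS_PER_YEAR) -> bool:
    """Flag sequences whose stated date disagrees with the clock estimate."""
    # A collection date before the lineage founder is impossible regardless of the clock.
    if stated < founder_date:
        return True
    deviation = clock_deviation_days(stated, founder_date,
                                     mutations_since_founder, subs_per_year)
    return abs(deviation) > tolerance_days


if __name__ == "__main__":
    founder = date(2021, 10, 1)  # hypothetical founder date for some lineage
    print(is_clock_outlier(date(2021, 3, 1), founder, 0))    # dated before the founder
    print(is_clock_outlier(date(2021, 11, 15), founder, 2))  # consistent with the clock
```

Mutation accumulation is stochastic, and one mutation corresponds to roughly two weeks at this rate, so the tolerance should be generous for sequences with few mutations. The "dated before the founder" check alone would already catch the early Omicron sequences described above.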

Submission date and delay filtering should be fairly easy to do with LAPIS as it is. It would be great to have and would make covSpectrum even more useful ❤️
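As a rough illustration of the two proposed filters, here is a client-side sketch over already-downloaded metadata. It does not use LAPIS's actual query parameters. The `Sequence` record and its field names are hypothetical stand-ins for whatever metadata is available. "Within 3 months" is approximated as 90 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional


@dataclass
class Sequence:
    # Hypothetical metadata record; field names are illustrative only.
    strain: str
    lineage: str
    date_collected: date
    date_submitted: date


def submission_delay(seq: Sequence) -> timedelta:
    """Time between sample collection and upload."""
    return seq.date_submitted - seq.date_collected


def filter_sequences(seqs: Iterable[Sequence],
                     submitted_by: Optional[date] = None,
                     max_delay: Optional[timedelta] = None) -> List[Sequence]:
    """Keep sequences submitted on or before `submitted_by` (a snapshot of what
    was public on that day) and, if `max_delay` is given, uploaded within
    `max_delay` of collection."""
    kept = []
    for s in seqs:
        if submitted_by is not None and s.date_submitted > submitted_by:
            continue
        if max_delay is not None:
            delay = submission_delay(s)
            # A negative delay (collected after submission) is impossible, so drop it too.
            if delay < timedelta(0) or delay > max_delay:
                continue
        kept.append(s)
    return kept


if __name__ == "__main__":
    seqs = [
        Sequence("A", "BA.1", date(2021, 3, 1), date(2021, 12, 1)),    # implausibly early
        Sequence("B", "BA.1", date(2021, 11, 20), date(2021, 11, 24)),
        Sequence("C", "BA.1", date(2021, 11, 22), date(2021, 11, 30)),
    ]
    print([s.strain for s in filter_sequences(seqs, max_delay=timedelta(days=90))])
    print([s.strain for s in filter_sequences(seqs, submitted_by=date(2021, 11, 25))])
```

The delay filter removes sequence "A", whose stated collection date is months before its upload. The `submitted_by` filter reproduces the "what did covSpectrum look like on November 25" view.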

corneliusroemer · Oct 03 '22 11:10