S2S aligned metadata "extension" is a subset of prior metadata release?

Open arlofaria-cartesia opened this issue 8 months ago • 0 comments

The metadata files in docs/m4t/seamless_align_README.md come in several dated revisions. From what I've checked of enA-ptA and enA-esA at least, it seems like the "extension" from Nov 30 is a pure subset of the earlier metadata published on Sep 25. Is it possible to double-check if that's the case, and whether maybe some other extension dataset was intended to be published instead?

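To verify, note that the first file alone has the same number of unique lines as the two files combined, so the second file adds no new rows: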

> zcat seamless.dataset.metadata.public.enA-ptA.withduration.tsv.gz | sort -u | wc
5257334
> zcat seamless.dataset.metadata.public.enA-ptA.withduration.tsv.gz seamless.dataset.metadata.public.enA-ptA.tsv.gz | sort -u | wc
5257334
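
A matching count shows that no rows are new, but it does not list the offending rows if some are. A minimal sketch of a more direct check follows, assuming rows in both files are byte-identical when they refer to the same entry (which the matching counts above suggest). The function name `rows_only_in_second` is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not from the repository): print rows that appear
# in the second gzipped TSV but not in the first.
# Empty output means the second file is a pure subset of the first.
rows_only_in_second() {
  # $1 = first metadata file (.tsv.gz), $2 = second metadata file (.tsv.gz)
  tmpdir=$(mktemp -d) || return 1
  # comm needs sorted input; LC_ALL=C keeps sort and comm ordering consistent.
  gzip -dc "$1" | LC_ALL=C sort -u > "$tmpdir/first"
  gzip -dc "$2" | LC_ALL=C sort -u > "$tmpdir/second"
  # -1 drops rows unique to the first file, -3 drops rows common to both,
  # leaving only rows unique to the second file.
  LC_ALL=C comm -13 "$tmpdir/first" "$tmpdir/second"
  rm -rf "$tmpdir"
}

# Example call with the two files from the commands above:
#   rows_only_in_second \
#     seamless.dataset.metadata.public.enA-ptA.withduration.tsv.gz \
#     seamless.dataset.metadata.public.enA-ptA.tsv.gz | wc -l
```

If the call prints nothing (or `wc -l` reports 0), every row of the second file already exists in the first.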

arlofaria-cartesia, Jun 07 '24 23:06