seamless_communication
S2S aligned metadata "extension" is a subset of prior metadata release?
The metadata files in docs/m4t/seamless_align_README.md come in several dated revisions. From what I've checked for enA-ptA and enA-esA at least, the "extension" from Nov 30 seems to be a pure subset of the earlier metadata published on Sep 25. Could someone double-check whether that's the case, and whether a different extension dataset was perhaps intended to be published instead?
To verify:
> zcat seamless.dataset.metadata.public.enA-ptA.withduration.tsv.gz | sort -u | wc -l
5257334
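> zcat seamless.dataset.metadata.public.enA-ptA.withduration.tsv.gz seamless.dataset.metadata.public.enA-ptA.tsv.gz | sort -u | wc -l
5257334
Both counts are identical, so adding seamless.dataset.metadata.public.enA-ptA.tsv.gz contributes no new unique lines: every one of its rows already appears in the withduration file.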
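A slightly more direct check, sketched here as a hypothetical POSIX shell function (compare_tsv_gz is an illustrative name, not part of seamless_communication), uses comm to count the unique rows found only in the first file, only in the second, and in both. It assumes rows must match byte for byte, so a file that differs only by an extra column would count as entirely disjoint. A result of only_second=0 would confirm the subset relationship without relying on two totals happening to agree.

```sh
# Hypothetical helper: compare the unique lines of two gzipped TSV files.
# Prints: only_first=N only_second=N both=N
compare_tsv_gz() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    # gzip -dc is the portable equivalent of zcat for .gz files.
    # LC_ALL=C gives byte-wise ordering so sort and comm agree.
    gzip -dc "$1" | LC_ALL=C sort -u > "$tmp_a"
    gzip -dc "$2" | LC_ALL=C sort -u > "$tmp_b"
    # comm -23: lines only in the first file; -13: only in the second; -12: in both.
    # tr strips the padding some wc implementations add.
    only_a=$(LC_ALL=C comm -23 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')
    only_b=$(LC_ALL=C comm -13 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')
    both=$(LC_ALL=C comm -12 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')
    rm -f "$tmp_a" "$tmp_b"
    echo "only_first=$only_a only_second=$only_b both=$both"
}
```

For example, compare_tsv_gz seamless.dataset.metadata.public.enA-ptA.withduration.tsv.gz seamless.dataset.metadata.public.enA-ptA.tsv.gz, and likewise for the enA-esA pair. Sorting both files to disk first keeps memory bounded, which matters for files with millions of rows.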