preprocess
Failed to download Seamless Align data
When I follow https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/seamless_align_README.md and try to download the dataset with
zcat seamless.dataset.metadata.public.arb-enA.tsv.gz | egrep ^crawl-data | tr '\t' ' ' | build/bin/wet_lines
I get an error and no wav is saved. By the way, this script takes a very long time to process, but I can't find anything downloaded in my workspace. Is there any way to save each wav or text as it is produced during the whole processing stage? Thanks a lot.
I tried again but still get the same error, and nothing is saved after almost 2 days:
what(): /home/ubuntu/preprocess/preprocess/wet_lines_main.cc:71 in void Retrieve::Add(util::StringPiece, const Extract&) threw util::Exception because `!extracts.empty() && extracts.back().paragraph_number > extract.paragraph_number'. Metadata should be sorted by paragraph number in each document
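Since the exception complains that paragraph numbers go backwards within a document, it may be quicker to check the metadata for that before starting a multi-day run. Below is a rough Python sketch of such a check, not part of the preprocess tool. The column layout is an assumption on my side: I assume columns 0 and 1 (WET path and document digest) identify a document and column 3 holds the paragraph number. `find_unsorted`, `doc_cols`, and `para_col` are hypothetical names, so adjust the indices to the actual file.

```python
import sys
from itertools import islice


def find_unsorted(lines, doc_cols=(0, 1), para_col=3):
    """Yield (line_no, doc_key, previous, current) for each line whose
    paragraph number is lower than the last one seen for the same document.

    ASSUMPTION: doc_cols and para_col describe the metadata layout; they are
    guesses and must be checked against the real file.
    """
    last_seen = {}  # document key -> last paragraph number seen
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # works for tab- or space-separated lines
        key = tuple(fields[c] for c in doc_cols)
        # int() raises ValueError if para_col points at the wrong column
        para = int(fields[para_col])
        previous = last_seen.get(key)
        # Same condition as in the error message: previous > current is invalid
        if previous is not None and previous > para:
            yield (line_no, key, previous, para)
        last_seen[key] = para


if __name__ == "__main__":
    # Print at most the first 10 violations found on standard input
    for line_no, key, previous, current in islice(find_unsorted(sys.stdin), 10):
        print(f"line {line_no}: {key} goes from paragraph {previous} to {current}")
```

Saved as, say, check_sorted.py (hypothetical name), it could be fed the same filtered lines that go into wet_lines: zcat seamless.dataset.metadata.public.arb-enA.tsv.gz | egrep ^crawl-data | python check_sorted.py. If it prints nothing under the right column settings, the ordering is probably not the cause.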