OpusCleaner
ValueError: large.bin has wrong file format!
Downloading the FastText model fails quite often, especially for the "large" model.
A workaround is to pre-download the model with wget:
filters_dir="/builds/worker/.local/lib/python3.10/site-packages/opuscleaner/filters"
wget -O "${filters_dir}/large.bin" https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
I think requests.get is not robust enough without retries, so it fails periodically, whereas wget does a lot more to ensure a reliable download.
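To illustrate the retry idea, here is a minimal Python sketch that uses only the standard library. The helper name `download_with_retries`, its parameters, and the `.part` temporary suffix are all hypothetical and not part of OpusCleaner; this is one possible approach under those assumptions, not how the filter currently fetches the model.

```python
import os
import time
import urllib.request


def download_with_retries(url, dest, attempts=5, backoff=2.0,
                          fetch=urllib.request.urlretrieve):
    """Download url to dest, retrying on network errors.

    Hypothetical helper for illustration; not an OpusCleaner API.
    `fetch` is any callable taking (url, path), injectable for testing.
    """
    tmp = dest + ".part"  # assumed suffix for the in-progress file
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, tmp)
            # Rename only after a complete download, so a truncated
            # file never ends up under the final name.
            os.replace(tmp, dest)
            return dest
        except OSError:
            # urllib's URLError and ContentTooShortError both derive
            # from OSError, so failed and truncated downloads land here.
            if os.path.exists(tmp):
                os.remove(tmp)
            if attempt == attempts:
                raise
            time.sleep(backoff * attempt)  # wait longer after each failure
```

It could be called with the same URL and target path as the wget workaround, for example `download_with_retries(url, os.path.join(filters_dir, "large.bin"))`. Writing to a temporary file first is a deliberate choice: an interrupted download then cannot leave a partial large.bin behind.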