firefox-translations-training
We need filter debugging for OpusCleaner
Especially when running complicated language pairs that may not be well supported and that lose a lot of data to filtering (like Chinese Traditional), we need a detailed breakdown of how much data each filter discards.
Maybe something like this tee that counts lines at the beginning and at the end of each step. Or we could enable that tee option and then count the size of each step's output.
We should save a JSON file with the stats, similar to how it's done in the HPLT importer: https://firefoxci.taskcluster-artifacts.net/YyrMdgH-QHG75qptWf5xSQ/0/public/build/mono_v3_0.en.stats.json
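A rough sketch of the counting idea, to make the proposal concrete. This is not OpusCleaner's API: `CountingTee`, `run_with_stats`, the two toy filters, and the JSON field names are all hypothetical, and the real stats schema should probably follow whatever the HPLT importer writes. It assumes each filter step can be treated as a function from an iterable of lines to an iterable of lines.

```python
import json
from typing import Callable, Iterable, Iterator, List, Tuple

# Assumption: a filter step maps a stream of lines to a (possibly smaller) stream.
Filter = Callable[[Iterable[str]], Iterable[str]]


class CountingTee:
    """Pass lines through unchanged while counting how many went by."""

    def __init__(self, lines: Iterable[str]) -> None:
        self.lines = lines
        self.count = 0

    def __iter__(self) -> Iterator[str]:
        for line in self.lines:
            self.count += 1
            yield line


def run_with_stats(lines: Iterable[str], steps: List[Tuple[str, Filter]]):
    """Run the steps in order and report line counts before and after each one."""
    stream = CountingTee(lines)
    counters = [stream]  # counters[i] sits before step i, counters[i + 1] after it
    for _, step in steps:
        stream = CountingTee(step(stream))
        counters.append(stream)

    output = list(stream)  # draining the last tee pulls data through every step

    stats = []
    for i, (name, _) in enumerate(steps):
        lines_in = counters[i].count
        lines_out = counters[i + 1].count
        stats.append({
            "filter": name,
            "lines_in": lines_in,
            "lines_out": lines_out,
            "discarded": lines_in - lines_out,
        })
    return output, stats


# Toy filters standing in for real ones (hypothetical).
def drop_empty(lines: Iterable[str]) -> Iterator[str]:
    return (line for line in lines if line.strip())


def max_length(lines: Iterable[str]) -> Iterator[str]:
    return (line for line in lines if len(line) <= 20)


corpus = ["hello\tbonjour", "", "a" * 30, "cat\tchat"]
output, stats = run_with_stats(
    corpus, [("drop_empty", drop_empty), ("max_length", max_length)]
)
stats_json = json.dumps(stats, indent=2)
print(stats_json)
```

The counters sit between the steps, so the data is streamed once and nothing is buffered apart from the final output. Writing `stats_json` to a file next to the cleaned corpus would give a per-filter report for each dataset.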