We need filter debugging for OpusCleaner

Open ZJaume opened this issue 5 months ago • 2 comments

Especially when running complicated language pairs that may not be well supported and that lose a lot of data to filtering (like Traditional Chinese), we need a detailed report of how much data each filter discards.

ZJaume, Nov 27 '25 11:11

Maybe something like this tee, which counts lines at the beginning and at the end of each step. Or enable that tee option and then count the size of each step's output.

ZJaume, Nov 27 '25 11:11
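
A minimal sketch of the counting idea above, in plain Python rather than against OpusCleaner's actual API. Everything here is hypothetical (`LineCounter`, `run_with_counts`, and the two toy filters); it only illustrates wrapping each step's output in a pass-through counter, the way a counting tee would, so the lines stay streamed instead of being loaded into memory per step.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical filter type: takes a stream of lines, yields the lines it keeps.
Filter = Callable[[Iterable[str]], Iterator[str]]


class LineCounter:
    """Pass-through wrapper that counts the lines flowing through it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.count = 0

    def wrap(self, lines: Iterable[str]) -> Iterator[str]:
        for line in lines:
            self.count += 1
            yield line


def run_with_counts(
    lines: Iterable[str], steps: List[Tuple[str, Filter]]
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Chain the filters and record how many lines survive each one."""
    counters = [LineCounter("input")]
    stream = counters[0].wrap(lines)
    for name, step in steps:
        counter = LineCounter(name)
        # Count the output of this step before handing it to the next one.
        stream = counter.wrap(step(stream))
        counters.append(counter)
    kept = list(stream)  # Consuming the stream is what fills in the counts.
    return kept, [(c.name, c.count) for c in counters]


# Toy filters, for illustration only.
def drop_empty(lines: Iterable[str]) -> Iterator[str]:
    return (line for line in lines if line.strip())


def max_length(lines: Iterable[str], limit: int = 20) -> Iterator[str]:
    return (line for line in lines if len(line) <= limit)
```

Calling `run_with_counts(data, [("drop_empty", drop_empty), ("max_length", max_length)])` returns the surviving lines plus one `(name, lines_remaining)` pair for the input and for each step, so the difference between consecutive entries is what that filter discarded.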

We should save a JSON with stats, similar to how it's done in HPLT importer: https://firefoxci.taskcluster-artifacts.net/YyrMdgH-QHG75qptWf5xSQ/0/public/build/mono_v3_0.en.stats.json

evgenyrp, Nov 27 '25 18:11
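
A rough sketch of what saving such stats could look like. The JSON layout below is an assumption made up for illustration; it is not the schema used by the HPLT importer's stats file, and `build_stats` / `write_stats` are hypothetical helpers. The input is an ordered list of `(step_name, lines_remaining)` pairs whose first entry is the unfiltered input.

```python
import json
from typing import Dict, List, Tuple


def build_stats(counts: List[Tuple[str, int]]) -> Dict:
    """Turn per-step line counts into a report of what each filter discarded."""
    total = counts[0][1]
    previous = total
    steps = []
    for name, remaining in counts[1:]:
        discarded = previous - remaining
        steps.append(
            {
                "filter": name,
                "lines_in": previous,
                "lines_out": remaining,
                "discarded": discarded,
                # Percentage relative to what this filter received, not the full input.
                "discarded_pct": round(100 * discarded / previous, 2) if previous else 0.0,
            }
        )
        previous = remaining
    return {"input_lines": total, "output_lines": previous, "steps": steps}


def write_stats(counts: List[Tuple[str, int]], path: str) -> None:
    """Write the report next to the cleaned dataset (the path is up to the caller)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_stats(counts), f, indent=2)
```

Reporting the percentage per step, relative to that step's own input, makes it easier to spot a single filter that is too aggressive for a given language pair, even when it runs late in the pipeline on already reduced data.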