
[meta] Train harder to segment languages, like CJK languages

Open · gregtatum opened this issue 1 year ago · 1 comment

The harder-to-segment languages are Chinese, Japanese, and Korean (CJK). We'll need to implement better tokenization and segmentation support for these languages before we can train them. This work should happen after training a subset of the easier-to-segment languages in #524.

### Perform basic training
- [ ] https://github.com/mozilla/firefox-translations-training/issues/740
- [ ] https://github.com/mozilla/firefox-translations-training/issues/76
- [ ] https://github.com/mozilla/firefox-translations-training/issues/752
- [ ] #424
- [ ] https://github.com/mozilla/firefox-translations-training/issues/745
- [ ] https://github.com/mozilla/firefox-translations-training/issues/747
- [ ] https://github.com/mozilla/firefox-translations-training/issues/746
- [ ] #45
- [x] Train a basic teacher model
### Implement advanced features
- [ ] https://github.com/mozilla/firefox-translations-training/issues/741
- [ ] https://github.com/mozilla/firefox-translations-training/issues/743
- [ ] https://github.com/mozilla/firefox-translations-training/issues/744
- [ ] https://github.com/mozilla/firefox-translations-training/issues/742
- [ ] https://github.com/mozilla/firefox-translations-training/issues/749
- [ ] https://github.com/mozilla/firefox-translations-training/issues/750
- [ ] https://github.com/mozilla/firefox-translations-training/issues/751
- [ ] https://github.com/mozilla/firefox-translations-training/issues/753
- [ ] https://github.com/mozilla/firefox-translations-training/issues/748
- [ ] Train a good quantized model
- [ ] https://github.com/mozilla/firefox-translations-training/issues/860
- [ ] https://github.com/mozilla/firefox-translations-training/issues/896
- [ ] https://github.com/mozilla/firefox-translations-training/issues/899
### Run production training
- [ ] Train Chinese
- [ ] Train Japanese
- [ ] Train Korean

### Native Speakers

If you are a native (L1) speaker of any of these languages and want to help out, feel free to leave a comment on this issue or join us in Firefox Translations on Matrix. We can always use help with qualitative model evaluation and with questions about the languages.

gregtatum · Feb 06 '24 17:02