Greg Tatum

Results 204 issues of Greg Tatum

We're getting into languages that aren't in the original Flores datasets, so we need to update to Flores+ (https://huggingface.co/datasets/openlanguagedata/flores_plus). It's gated behind a user agreement on Hugging Face, so we need...

language-coverage

It's not shippable as-is. We could just distill a `base-memory` and see if the quality is good enough, but the teacher's score is pretty low too. It might be worth auditing...

language-coverage

Teacher: 86.14, -3.87%; Student: 81.42, -9.89% (https://mozilla.github.io/translations/model-registry/?searchString=Bosnian&showModels=true&score=vs-google). I believe this is #681, since Bosnian is digraphic, written in both Cyrillic and Latin scripts. We should be able to train a `bs-en` model just...

Teacher: [87.65](https://firefox-ci-tc.services.mozilla.com/tasks/BZFB3MA4QWuK4oWEafn2vQ) -1.39%; Student: [85.52](https://firefox-ci-tc.services.mozilla.com/tasks/RPJvxpa4SJSi_Ybrh0lHAg) -3.92%. The simplest fix here would be to distill a `base-memory`. I'm wondering if we should use split vocabs as well, but we would probably have...

language-coverage

The distillation gap is quite large, and the teacher is OK. We should distill as `base-memory` to make it shippable. Teacher: [86.01](https://firefox-ci-tc.services.mozilla.com/tasks/c_SPjl3PTqa3OIsm5YnxDg) -2.23%; Student: [82.88](https://firefox-ci-tc.services.mozilla.com/tasks/YWJnykjpSuqNpWjAW9xyqg) -6.09%

language-coverage
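As a rough illustration of what "distillation gap" means in the entries above, here is a minimal sketch that subtracts each student score from its teacher score. The helper name `distillation_gap` is hypothetical and not part of the project's tooling; the score pairs are copied from the summaries above, and no threshold for a "large" gap is assumed.

```python
def distillation_gap(teacher: float, student: float) -> float:
    """Return how many score points the student lost relative to its teacher."""
    return round(teacher - student, 2)


# (teacher, student) score pairs as reported in the issue summaries above.
reported = [(86.14, 81.42), (87.65, 85.52), (86.01, 82.88)]

gaps = [distillation_gap(teacher, student) for teacher, student in reported]

for (teacher, student), gap in zip(reported, gaps):
    print(f"teacher={teacher:.2f} student={student:.2f} gap={gap:.2f}")
```

On these numbers the gaps come out between roughly 2 and 5 points, which matches the description of some of them as large.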

`no` is a macrolanguage tag covering `nn` and `nb`. `no-en` is technically a multilingual model, while I went ahead and trained `en-no`, as I'm guessing it's similar to `en->pt`...

language-coverage

In the Spring 2024 run, [we trained a teacher](https://wandb.ai/moz-translations/en-vi/workspace?nw=nwusergtatum), but the student was never distilled because the COMET score was a bit low: 87.52, -2.59%. The training curves for the...

language-coverage

For some reason when training `en-af` I had some infrastructure issues around getting the downloads. I don't know if it was some bug in my corpus continuation code or if...

language-coverage

This is currently blocked because Tagalog and Filipino have distinct language codes (`tl` and `fil`, respectively). However, for our initial models I don't think we will need to distinguish...

language-coverage
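One way to avoid distinguishing the two codes is to fold them into a single canonical code before any lookup. The sketch below is only an illustration under stated assumptions: the alias table, the choice to fold `fil` into `tl` (rather than the reverse), and the function names are all hypothetical and are not taken from the issue or from the project's code.

```python
# Hypothetical alias table: treat Filipino (`fil`) as Tagalog (`tl`).
# The direction of the fold is an assumption made for illustration.
LANGUAGE_ALIASES = {"fil": "tl"}


def canonical_language(code: str) -> str:
    """Normalize a language code and apply any known alias."""
    code = code.strip().lower()
    return LANGUAGE_ALIASES.get(code, code)


def canonical_pair(src: str, trg: str) -> str:
    """Build a `src-trg` pair name from canonical language codes."""
    return f"{canonical_language(src)}-{canonical_language(trg)}"


print(canonical_pair("fil", "en"))
print(canonical_pair("en", "tl"))
```

Codes without an alias pass through unchanged, so adding or removing an entry in the table is the only change needed if the two languages are later split.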

I just hit this once, and I haven't tested if this is an intermittent issue or not. ``` old_e = 3, n_items = 856819, dynamic = 1 eflomal: eflomal.c:394: text_alignment_sample:...

bug