Greg Tatum comments

Results: 373 comments of Greg Tatum

Improve translation of short sentences

I wonder if we can load in dictionaries where it's literally one word to one word.

Improve translation of short sentences

Or maybe even synthesize it with the alignment data.
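A minimal sketch of what synthesizing a one-word-to-one-word dictionary from alignment data could look like. It assumes whitespace-tokenized sentence pairs and alignments in the common `i-j` (Pharaoh-style) text format; the function name `build_word_dictionary` and the `min_count` threshold are hypothetical and not part of the pipeline.

```python
from collections import Counter, defaultdict


def build_word_dictionary(src_lines, trg_lines, alignment_lines, min_count=2):
    """Map each source word to the target word it is aligned with most often.

    Assumes each alignment line holds space-separated "i-j" pairs, where i is
    a source token index and j is a target token index (hypothetical input).
    """
    counts = defaultdict(Counter)
    for src, trg, align in zip(src_lines, trg_lines, alignment_lines):
        src_tokens = src.split()
        trg_tokens = trg.split()
        for pair in align.split():
            i, j = (int(x) for x in pair.split("-"))
            # Skip alignment points that fall outside the tokenized sentence.
            if i < len(src_tokens) and j < len(trg_tokens):
                counts[src_tokens[i].lower()][trg_tokens[j].lower()] += 1

    dictionary = {}
    for src_word, targets in counts.items():
        trg_word, count = targets.most_common(1)[0]
        # Keep only pairs seen often enough to be somewhat trustworthy.
        if count >= min_count:
            dictionary[src_word] = trg_word
    return dictionary


if __name__ == "__main__":
    src = ["mesto je veliko", "mesto je majhno"]
    trg = ["the city is big", "the city is small"]
    align = ["0-1 1-2 2-3", "0-1 1-2 2-3"]
    print(build_word_dictionary(src, trg, align))
```

The resulting pairs could then be appended to the training data as single-word examples; how strict the frequency cutoff should be is an open question.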

Improve translation of short sentences

This behavior is also visible with numbers. A good example is translating a list of numbers.

Improve translation of short sentences

Verify the fix with: https://bugzilla.mozilla.org/show_bug.cgi?id=1888972

Improve translation of short sentences

Here is a word count distribution for the merged corpus sl-en: https://firefox-ci-tc.services.mozilla.com/tasks/groups/PPCzZRHaTT6Ys4BIhPGT5w

![word count distribution "en"](https://github.com/user-attachments/assets/12c42a05-5fdc-4963-9b36-c887984edeae)

![word count distribution "sl"](https://github.com/user-attachments/assets/d0eb3c3b-daf3-4ed5-8e67-fd1052a06a9e)

Generated via:

```
python3 pipeline/data/analyze.py --file_location https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VK5zmxJRTLy0y0WBQ0DRJg/artifacts/public/build/corpus.en.zst --output data --dataset...
```

Improve translation of short sentences

I filed #878 which suggests augmenting with statistically synthesized single word translations.

Improve translation of short sentences

I filed #879 which suggests harvesting short sentences from parallel datasets.
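As a rough illustration of harvesting short sentences from a parallel dataset, the sketch below keeps only pairs where both sides are at most a few words long. It assumes line-aligned source and target text with whitespace tokenization; `harvest_short_pairs` and `max_words` are hypothetical names, not anything that exists in the pipeline.

```python
def harvest_short_pairs(src_lines, trg_lines, max_words=3):
    """Return de-duplicated (source, target) pairs where both sides are short.

    Word counts use plain whitespace splitting, which is an assumption and
    will not suit languages that are written without spaces.
    """
    pairs = []
    seen = set()
    for src, trg in zip(src_lines, trg_lines):
        src, trg = src.strip(), trg.strip()
        if not src or not trg:
            continue  # Skip pairs with an empty side.
        if len(src.split()) <= max_words and len(trg.split()) <= max_words:
            if (src, trg) not in seen:
                seen.add((src, trg))
                pairs.append((src, trg))
    return pairs


if __name__ == "__main__":
    src = ["Dober dan", "To je zelo dolg stavek o prevajanju", "Dober dan", "Hvala"]
    trg = ["Good day", "This is a very long sentence about translation", "Good day", "Thanks"]
    for pair in harvest_short_pairs(src, trg):
        print(pair)
```

Requiring both sides to be short avoids pairing a short source with a long, loosely related target, at the cost of discarding some valid pairs.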

Improve translation of short sentences

I filed #880 which suggests statistically synthesizing short sentence translations from monolingual data sources.

Migrate Taskcluster UI tools to this repo

I'm looking at migrating some of the Taskcluster-specific ones to Taskcluster, so it definitely won't be all of them. And some were just me messing around to learn things.

Migrate Taskcluster UI tools to this repo

The work here is to also figure out how to do the initial deploy for it and have the docs co-exist. I'm not sure how that will work yet.