Evgeny Pavlov
Here's the code I used, if you're interested. It utilizes all 8 GPUs (though not at 100%) and allocates 20 GB of memory on each GPU. There are ways to...
Why would we want to skip cleaning entirely? For example, if the dataset is completely clean? It should always be safe to run our default cleaning script, and if it's not we...
This should be addressed as part of #336 and #334. What I meant in the comment is that we will publish experiment results from a completely different step in...
It's not really a bug. We don't support monolingual datasets from OPUS. I think they started adding them recently. Also, we have plenty of data for back-translation for English from...
The UI doesn't show which other languages are supported, but English to Czech looks correct. I guess something got broken for this dataset:
@bhearsum had an idea of converting our Taskcluster graph to Snakemake. I imagine generating a Snakefile using task-graph.json. @AmitMY You can try prototyping that, but we won't have any resources...
We don't have a tool to regenerate the DAG picture anymore (the Snakemake pipeline is out of date). We need to find a new tool to generate a picture for Taskcluster...
The initial idea of this importer was to specify any URL or file path and get the dataset from there. Now we don't think much about compatibility with Snakemake and...
Another instance of this issue: https://firefox-ci-tc.services.mozilla.com/tasks/QOikHCOqSdKc3kJQYqnnBA/runs/0/logs/public/logs/live.log. We should fix it.