Greg Tatum
This is a meta bug, so I don't think it needs to be assigned to me.
Shouldn't CI catch this?
It would be nice to make it consistent, with a `.stats.json` for every step, and then finally show all of this information in W&B. For instance in bicleaner.sh: ```diff diff...
We'll need to be careful that we don't open the pipeline up to arbitrary Python execution. I noticed a few warnings about this while navigating Hugging Face, but I haven't looked into...
I would like to have more samples of data in the artifacts in general. It would be nice to have statistics about corpus size, how much was filtered, samples...
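To make the idea concrete, here is a rough sketch of what a per-step stats artifact could look like. Everything in it is an assumption for illustration: the function names `build_step_stats` and `write_step_stats`, the JSON field names, and the `<step>.stats.json` file naming are hypothetical and not something the pipeline does today.

```python
import json
import random
from pathlib import Path


def build_step_stats(step, before, after, sample_size=5, seed=0):
    """Summarize one filtering step: corpus size, amount filtered, and a few samples.

    `before` and `after` are lists of lines. All field names are hypothetical.
    """
    kept = set(after)
    removed = [line for line in before if line not in kept]
    rng = random.Random(seed)  # fixed seed so the samples are reproducible
    filtered = len(before) - len(after)
    return {
        "step": step,
        "lines_before": len(before),
        "lines_after": len(after),
        "lines_filtered": filtered,
        "filtered_ratio": filtered / len(before) if before else 0.0,
        "kept_samples": rng.sample(after, min(sample_size, len(after))),
        "removed_samples": rng.sample(removed, min(sample_size, len(removed))),
    }


def write_step_stats(stats, artifacts_dir):
    """Write the stats next to the other artifacts as `<step>.stats.json`."""
    path = Path(artifacts_dir) / f"{stats['step']}.stats.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(stats, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Something like `write_step_stats(build_step_stats("bicleaner", before, after), "artifacts")` at the end of each step would give every step the same file shape, which should make it easier to collect the numbers in one place later.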
This is probably blocked on #417.
I'm not a Taskcluster expert, so maybe others can chime in here. The taskgraph that gets generated is documented here: https://taskcluster-taskgraph.readthedocs.io/en/latest/ If you run `utils/preflight_check.py`, it will generate...
(Copying over my thoughts from #315.) For reference, this is the definition of a [preemptible instance](https://cloud.google.com/compute/docs/instances/preemptible). During the Catalan run the teacher training would often take 2 or 3 times...
The next step here is to load in the previous artifacts and restart the training.
Oh wait, the tests aren't passing. We should investigate that before merging.