Ben Hearsum (he/him)
I believe we're now in a place to reliably train on spot instances. Should we close this out and unlink the remaining open issues?
> I believe we're now in a place to reliably train on spot instances. Should we close this out and unlink the remaining open issues?

I'm going to call this...
The other thing we should figure out before automating this is how we organize them, and how we ensure that people can tell what everything is without, e.g., downloading and inspecting them....
> Also, do we want to move the data to a more production-grade bucket from `gs://releng-translations-dev`?

Yeah, I think that would be good. I'd like to upload the artifacts from...
> Related to #191
>
> We should upload the artifacts and logs for:
>
> * all the `train-` and `finetune-` steps
> * `vocab`
> *...
So to spell it out a bit more, if our experiment name was `retrain`, and we were doing ru-en, we would expect the following:

* `train-backwards` artifacts in `/models/ru-en/retrain/backward`
*...
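As a rough illustration of the layout described above, here is a minimal sketch of how a task name could be mapped to its upload prefix. The function name `artifact_destination` and the `TASK_TO_SUBDIR` table are hypothetical, and only the `train-backwards` mapping comes from the comment; destinations for other tasks are not filled in because they are not spelled out here.

```python
from pathlib import PurePosixPath

# Hypothetical lookup table. Only the `train-backwards` entry is taken from
# the comment above; other tasks are deliberately left out rather than guessed.
TASK_TO_SUBDIR = {
    "train-backwards": "backward",
}


def artifact_destination(task: str, src: str, trg: str, experiment: str) -> str:
    """Return the assumed upload prefix for a task's artifacts."""
    if task not in TASK_TO_SUBDIR:
        raise KeyError(f"no destination defined for task {task!r}")
    subdir = TASK_TO_SUBDIR[task]
    # Layout assumed from the example: /models/<src>-<trg>/<experiment>/<subdir>
    return str(PurePosixPath("/models") / f"{src}-{trg}" / experiment / subdir)


print(artifact_destination("train-backwards", "ru", "en", "retrain"))
```

Keeping the mapping in one table means an unmapped task fails loudly instead of being uploaded to a guessed location.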
I'm still working on downloading everything, but in the meantime, does this partial list of destinations look sensible? (The second column is where they'd end up in GCS - the...
> Vocab is also needed even though we now also store it as model artifacts.

And it belongs in directories like `models/en-ru/retrain1_/vocab`?

> * instead of `retrain` let's use...
> * let's not forget about `quantize` and `evaluate quantized`, they also produce the model and evaluation results

I did include `evaluate-quantized` - I assumed that was what ended up...
> > I did include evaluate-quantized - I assumed that was what ended up in the speed directory - is that wrong?
>
> Yes, we call both the model...