Ben Hearsum (he/him) comments




[meta] issues blocking us from using spot instance for training tasks

I believe we're now in a place to reliably train on spot instances. Should we close this out and unlink the remaining open issues?

[meta] issues blocking us from using spot instance for training tasks

> I believe we're now in a place to reliably train on spot instances. Should we close this out and unlink the remaining open issues?

I'm going to call this...

automatically upload important artifacts to a GCP bucket

The other thing we should figure out before automating this is how we organize them, and how we ensure that people can tell what everything is without, e.g., downloading and inspecting them....

automatically upload important artifacts to a GCP bucket

> Also, do we want to move the data to a more production-grade bucket from `gs://releng-translations-dev`?

Yeah, I think that would be good. I'd like to upload the artifacts from...

automatically upload important artifacts to a GCP bucket

> Related to #191
>
> We should upload the artifacts and logs for:
>
> * all the `train-` and `finetune-` steps
> * `vocab`
> *...

automatically upload important artifacts to a GCP bucket

So to spell it out a bit more, if our experiment name was `retrain`, and we were doing ru-en, we would expect the following:

* `train-backwards` artifacts in `/models/ru-en/retrain/backward`
*...
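As a rough illustration of the layout described in that comment, here is a minimal Python sketch that builds a destination prefix from the language pair, the experiment name, and the task. The function name `artifact_prefix` and the `TASK_DIRS` table are hypothetical, introduced only for this example; the single mapping shown (`train-backwards` to `backward`) is the only one the comment spells out, so no other tasks are listed.

```python
from posixpath import join

# Hypothetical lookup table. Only the `train-backwards` -> `backward` entry
# comes from the comment above; other tasks are deliberately left out
# because their destinations are not given.
TASK_DIRS = {"train-backwards": "backward"}


def artifact_prefix(src: str, trg: str, experiment: str, task: str) -> str:
    """Return the destination prefix for one task's artifacts."""
    if task not in TASK_DIRS:
        raise KeyError(f"no destination known for task {task!r}")
    return join("/models", f"{src}-{trg}", experiment, TASK_DIRS[task])


print(artifact_prefix("ru", "en", "retrain", "train-backwards"))
# prints /models/ru-en/retrain/backward
```

Raising on unknown tasks, rather than guessing a directory, keeps artifacts from landing somewhere nobody expects.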

automatically upload important artifacts to a GCP bucket

I'm still working on downloading everything, but in the meantime, does this partial list of destinations look sensible? (The second column is where they'd end up in GCS - the...

automatically upload important artifacts to a GCP bucket

> Vocab is also needed even though we now also store it as model artifacts.

And it belongs in directories like `models/en-ru/retrain1_/vocab` ?

> * instead of `retrain` let's use...

automatically upload important artifacts to a GCP bucket

> * let's not forget about `quantize` and `evaluate quantized`, they also produce the model and evaluation results

I did include `evaluate-quantized` - I assumed that was what ended up...

automatically upload important artifacts to a GCP bucket

> > I did include evaluate-quantized - I assumed that was what ended up in the speed directory - is that wrong?
>
> Yes, we call both the model...