Stephen Roller


If you have time (I don't immediately), can you trace through with model parallel and non-model parallel and see where things diverge?
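
If it helps narrow it down, one cheap comparison (purely a sketch; the task `convai2` and model file are placeholders, and this assumes a model that supports `--model-parallel`) is to run the identical eval with the flag toggled and diff the reported metrics:

```bash
# Placeholder task/model file: run the same evaluation with and without
# model parallel, then compare the reported metrics to see where they diverge.
python -m parlai.scripts.eval_model -t convai2 -mf /path/to/model --model-parallel false
python -m parlai.scripts.eval_model -t convai2 -mf /path/to/model --model-parallel true
```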

Yeah, a couple of things are messing with you:
- `eval_model` "forgets" the batch size, so everything is running with a batch size of 1. Based on `gpu_mem`, it looks...
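
A rough sketch of the workaround (the task and model file are placeholders): pass `--batchsize` explicitly so it isn't forgotten:

```bash
# Pass --batchsize explicitly so eval_model doesn't fall back to a batch size of 1
python -m parlai.scripts.eval_model -t convai2 -mf /path/to/model --batchsize 64
```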

(Also `--beam-delay` only does anything if `--inference delayedbeam` is set)
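
For reference, the two flags go together, e.g. (task and model file are placeholders):

```bash
# --beam-delay has no effect unless --inference delayedbeam is set
python -m parlai.scripts.eval_model -t convai2 -mf /path/to/model \
    --inference delayedbeam --beam-delay 30
```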

ALSO, I have this WIP PR that's very close but just needs some testing: https://github.com/facebookresearch/ParlAI/pull/2775. TL;DR: `eval_model` only uses one GPU, and the new PR fixes this.

Closing this, but lemme know if you have further questions, Sam. Cheers!

No, it will always use only 1 GPU for evaluation. If you use that PR, you can then run `multiprocessing_eval` with otherwise identical arguments and it will split the data...
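
Concretely, the switch looks like this (task, model file, and batch size are placeholders):

```bash
# Single-GPU evaluation
python -m parlai.scripts.eval_model -t convai2 -mf /path/to/model --batchsize 64

# With the PR: identical arguments, just a different entry point, and the data
# is split across the available GPUs
python -m parlai.scripts.multiprocessing_eval -t convai2 -mf /path/to/model --batchsize 64
```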

Ah, a few tricks to speed up training:
- There is `parlai.scripts.multiprocessing_train` (sketch below). It behaves just like I described for `multiprocessing_eval` above. Simply switch from calling `python -m parlai.scripts.train_model` to `python -m...
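
A sketch of that switch (task, model, and paths are placeholders):

```bash
# Before: single-process training
python -m parlai.scripts.train_model -t convai2 -m transformer/generator \
    -mf /tmp/model --batchsize 16

# After: multi-GPU training with the same arguments, different entry point
python -m parlai.scripts.multiprocessing_train -t convai2 -m transformer/generator \
    -mf /tmp/model --batchsize 16
```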

Oh, and `--eval-batchsize` is also an option, to pump up the batch size during validation, since you don't need activations/gradients there.
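
For example (placeholders again), keeping the training batch size modest while bumping it for validation:

```bash
# --eval-batchsize only affects validation, where no activations/gradients are kept
python -m parlai.scripts.train_model -t convai2 -m transformer/generator \
    -mf /tmp/model --batchsize 16 --eval-batchsize 64
```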