Stephen Roller
The nightly GPU tests still (mostly) use only one GPU. We have full support for multiprocessing eval, and we should be taking advantage of it. My suggestion is to add...
The `compute_loss` function in TGA is very useful, and a few people have needed it. An example showing how to use this would be quite nice. Additionally, it should show...
We've had a few contributors eager to make PRs who had trouble figuring out how. Including docs explaining how to create a PR would probably increase contributions.
**Bug description** In distributed mode, the `find_unused_parameters` option to DistributedDataParallel currently cannot be overridden by a user. We should make a helper method in TorchAgent for that, which...
**Bug description** Right now the GPT-2 agent always says "num words = 4". We need to fix this. Probably need to delay that print statement until later, or move it...
We currently don't have any integration tests to ensure that TensorBoard integration is working. A small one (via a `train_model` example) should be added. Related to #1820.
**Bug description** Created from #2003. The `self.rank_loss` in `bi_encoder_ranker` (and possibly the others) should be created by overriding `build_criterion` rather than directly setting `self.rank_loss`. https://github.com/facebookresearch/ParlAI/blob/028753ff0cbc293037117f1b0bb07eb7e7335093/parlai/agents/bert_ranker/bi_encoder_ranker.py#L57
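The override pattern this issue asks for might look roughly like the sketch below. It uses plain Python stand-ins: `RankerBase` and `BiEncoderLikeRanker` are hypothetical names, and the toy loss functions are placeholders, not ParlAI's actual classes or criteria. The point is only that the base class builds the loss through a hook, so a subclass swaps the loss by overriding the hook instead of assigning an attribute by hand.

```python
class RankerBase:
    """Hypothetical stand-in for a ranker base class (not ParlAI's real code)."""

    def __init__(self):
        # The base class builds its loss through an overridable hook.
        self.criterion = self.build_criterion()

    def build_criterion(self):
        # Placeholder default: sum of squared errors.
        def squared_error(scores, labels):
            return sum((s - l) ** 2 for s, l in zip(scores, labels))

        return squared_error


class BiEncoderLikeRanker(RankerBase):
    """Hypothetical subclass that customizes the loss via the hook."""

    def build_criterion(self):
        # Override the hook rather than setting an attribute such as
        # self.rank_loss directly in __init__.
        def absolute_error(scores, labels):
            return sum(abs(s - l) for s, l in zip(scores, labels))

        return absolute_error


base_loss = RankerBase().criterion([0.5, 1.0], [1.0, 1.0])
sub_loss = BiEncoderLikeRanker().criterion([0.5, 1.0], [1.0, 1.0])
```

With this shape, any shared code that reads `self.criterion` picks up the subclass's loss automatically.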
In TorchAgent and seq2seq, we have some gross logic that tries to sort a batch so that the RNN is happy to accept it. Previously in older torch versions, something...
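For readers unfamiliar with the sorting logic being described, here is a minimal sketch of the general idea in plain Python: sort a batch by descending sequence length, remember the permutation, and undo it afterwards. The helper names `sort_by_length` and `restore_order` are made up for illustration and are not the functions used in TorchAgent or seq2seq.

```python
def sort_by_length(sequences):
    """Return the sequences sorted longest-first, plus the original indices."""
    # order[k] is the original index of the k-th longest sequence.
    order = sorted(
        range(len(sequences)), key=lambda i: len(sequences[i]), reverse=True
    )
    return [sequences[i] for i in order], order


def restore_order(items, order):
    """Undo the permutation produced by sort_by_length."""
    restored = [None] * len(items)
    for sorted_pos, original_idx in enumerate(order):
        restored[original_idx] = items[sorted_pos]
    return restored


batch = [[1, 2], [3, 4, 5, 6], [7]]
sorted_batch, order = sort_by_length(batch)
# ... run the length-sorted batch through the model here ...
roundtrip = restore_order(sorted_batch, order)
```

The bookkeeping is small, but every output that is aligned with the batch has to be un-permuted the same way, which is where this kind of logic tends to get messy.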
**Bug description** Although we already have a few MP tests, several bugs have cropped up recently, which suggests to me that we're not testing exhaustively. cc @emilydinan @jxmsML
Add a flag to use tied positional embeddings in transformer/generator and transformer/retrieval, and implement the tied weights. Should be False by default for backwards compatibility, and upgrade_opt should be used...