Punit Singh Koura

5 issues by Punit Singh Koura

**Patch Description**
Describe your changes

**Testing steps**
Describe how you tested your changes

cla signed

## Testing
Add tests for MultiplePadDataset class

enhancement
good first issue
better-eng

1. Take a 125m pretrained checkpoint.
2. Consolidate the checkpoint using convert_to_singleton.py.
3. Try loading the model behind the metaseq API.

RuntimeError: Error(s) in loading state_dict for TransformerLanguageModel: Missing key(s)...
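A "Missing key(s)" error of this kind means the model expects parameter names that the loaded checkpoint does not contain. The sketch below is not part of metaseq; it is a minimal, framework-free helper (the name `diff_state_dict_keys` and the example key names are hypothetical) that compares the two sets of names, which may help narrow down where a consolidated checkpoint diverges from the model definition.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    expected_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) key names, sorted for stable output.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names the checkpoint has but the model does not expect
    """
    expected = set(expected_keys)
    found = set(checkpoint_keys)
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    return missing, unexpected


# Hypothetical key names, for illustration only (not taken from the issue).
model_keys = ["decoder.embed_tokens.weight", "decoder.layers.0.fc1.weight"]
ckpt_keys = ["decoder.embed_tokens.weight", "decoder.layers.0.fc1.bias"]

missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real PyTorch model, the first argument would typically be `model.state_dict().keys()` and the second the keys of the state dict stored in the consolidated checkpoint; where exactly that dict lives inside the checkpoint file is an assumption to verify against the file itself.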

bug

Follow-up for #672. Right now it's not implemented: https://github.com/facebookresearch/metaseq/blob/main/metaseq/tasks/streaming_language_modeling.py#L180

bug

## 🐛 Bug
The convert_to_singleton.py script fails for the 1.3B checkpoint

### To Reproduce
```
ls 1.3b/
dict.txt gpt2-merges.txt gpt2-vocab.json reshard-model_part-0.pt reshard-model_part-1.pt
```
```
Loading extension module fused_mix_prec_layer_norm_cuda...
2022-07-19 03:15:10...
```

bug