metaseq
metaseq copied to clipboard
Unify model-parallel vs non model-parallel codepaths
As https://github.com/facebookresearch/metaseq/issues/383 flagged, right now we have two separate codepaths for model-parallel vs non model-parallel code (aka https://github.com/facebookresearch/metaseq/blob/main/metaseq/model_parallel/models/transformer_lm.py vs https://github.com/facebookresearch/metaseq/blob/main/metaseq/models/transformer_lm.py as one example).
This should all be unified.
I looked into this and it was trickier than I expected. We have to make a fair amount of changes to megatron.
hey @suchenzang, I looked at the task. The model_parallel code path uses some of the models&modules from the metaseq.models path. So there are two options:
- move everything from model_parallel path to metaseq.models path and remove model_parallel path
- move everything to metaseq.model_parallel path and remove models&modules path.
I think, the option 1 is probably better - flattens the structure. In any case there would be quite some code refactoring. + we can probably remove all the _encoder files as well.
Context: Currently we use the following models from metaseq.models in metaseq.model_parallel:
- model_parallel.models.transformer_decoder -> TransformerDecoder (from metaseq.models.transformer_decoder)
- model_parallel.models.transformer_lm -> TransformerLanguageModel (from metaseq.models.transformer_lm)
- model_parallel.models.transformer_lm -> register_model, register_model_architecture (from metaseq.models)
@suchenzang, I would probably do the issue #583 as a separate PR, since the #626 is bringing quite some changes. But I can also include it there.
@bashnick yes, that makes perfect sense - the smaller you can break this up the better, but some bits might require sweeping changes to keep the diff sane, so leaving that up you!