metaseq icon indicating copy to clipboard operation
metaseq copied to clipboard

Unify model-parallel vs non model-parallel codepaths

Open suchenzang opened this issue 2 years ago • 1 comments

As https://github.com/facebookresearch/metaseq/issues/383 flagged, right now we have two separate codepaths for model-parallel vs non model-parallel code (aka https://github.com/facebookresearch/metaseq/blob/main/metaseq/model_parallel/models/transformer_lm.py vs https://github.com/facebookresearch/metaseq/blob/main/metaseq/models/transformer_lm.py as one example).

This should all be unified.

suchenzang avatar Oct 06 '22 22:10 suchenzang

I looked into this and it was trickier than I expected. We have to make a fair amount of changes to megatron.

stephenroller avatar Oct 14 '22 12:10 stephenroller

hey @suchenzang, I looked at the task. The model_parallel code path uses some of the models&modules from the metaseq.models path. So there are two options:

  1. move everything from model_parallel path to metaseq.models path and remove model_parallel path
  2. move everything to metaseq.model_parallel path and remove models&modules path.

I think, the option 1 is probably better - flattens the structure. In any case there would be quite some code refactoring. + we can probably remove all the _encoder files as well.

Context: Currently we use the following models from metaseq.models in metaseq.model_parallel:

  • model_parallel.models.transformer_decoder -> TransformerDecoder (from metaseq.models.transformer_decoder)
  • model_parallel.models.transformer_lm -> TransformerLanguageModel (from metaseq.models.transformer_lm)
  • model_parallel.models.transformer_lm -> register_model, register_model_architecture (from metaseq.models)

bashnick avatar Feb 01 '23 12:02 bashnick

@suchenzang, I would probably do the issue #583 as a separate PR, since the #626 is bringing quite some changes. But I can also include it there.

bashnick avatar Feb 01 '23 18:02 bashnick

@bashnick yes, that makes perfect sense - the smaller you can break this up the better, but some bits might require sweeping changes to keep the diff sane, so leaving that up you!

suchenzang avatar Feb 01 '23 19:02 suchenzang