
Migrate tensor parallelism code to use OSLO

Open sdtblck opened this issue 3 years ago • 7 comments

Is your feature request related to a problem? Please describe.

It would be good to remove the Megatron tensor parallelism code from NeoX; OSLO currently supports this and has a slightly nicer interface.

Describe the solution you'd like

Steps:

  • [ ] Rewrite all current modules as plain PyTorch implementations, removing the mpu dependency from internal code as much as possible (so anything that is currently an mpu.ColumnParallelLinear, mpu.RowParallelLinear, or mpu.VocabParallelEmbedding should be replaced with its plain PyTorch equivalent: nn.Linear or nn.Embedding, respectively). See the sketch after this list.
  • [ ] Write a mapping for NeoX modules that OSLO uses to handle parallelization (also sketched below).
  • [ ] Ensure backwards compatibility
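
As a rough illustration of the first two steps, here is a minimal sketch. The layer names are assumptions in the style of NeoX's naming, and the mapping format is hypothetical, not OSLO's actual interface:

```python
import torch.nn as nn

# Plain PyTorch replacements for the Megatron-style parallel modules:
#   mpu.ColumnParallelLinear / mpu.RowParallelLinear -> nn.Linear
#   mpu.VocabParallelEmbedding                       -> nn.Embedding
class MLP(nn.Module):
    """Illustrative NeoX-style MLP block with assumed layer names."""

    def __init__(self, hidden_size: int, ffn_size: int):
        super().__init__()
        # previously mpu.ColumnParallelLinear(hidden_size, ffn_size, ...)
        self.dense_h_to_4h = nn.Linear(hidden_size, ffn_size)
        # previously mpu.RowParallelLinear(ffn_size, hidden_size, ...)
        self.dense_4h_to_h = nn.Linear(ffn_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        return self.dense_4h_to_h(self.act(self.dense_h_to_4h(x)))

# Hypothetical mapping (format assumed, not OSLO's real API) telling the
# parallelization engine which weights to split column-wise vs. row-wise.
NEOX_TP_MAPPING = {
    "column_parallel": ["attention.query_key_value", "mlp.dense_h_to_4h"],
    "row_parallel":    ["attention.dense", "mlp.dense_4h_to_h"],
    "vocab_parallel":  ["word_embeddings"],
}
```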

sdtblck avatar Mar 01 '22 15:03 sdtblck

I will actively support this work.

hyunwoongko avatar Mar 01 '22 15:03 hyunwoongko

The main problem is that the model is currently loaded on the CPU and then moved to the GPU. OSLO was originally designed for transformers, and there was no way to pass downloaded checkpoints directly to the GPU in transformers. (At least that was the case while I was developing it, so I didn't worry about this.) But we need to implement something like deepspeed.zero.Init internally so that parameters are allocated on the GPU from scratch. I will start on this tomorrow.
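
For reference, the DeepSpeed pattern mentioned above is a context manager that partitions and places parameters on the accelerator as each module is constructed, so the full model never has to be materialized on the CPU first. A minimal sketch:

```python
import deepspeed
import torch.nn as nn

# Parameters created inside this context are sharded across ranks and
# allocated on device at construction time, rather than built on CPU
# and moved afterwards.
with deepspeed.zero.Init():
    model = nn.Sequential(
        nn.Linear(4096, 16384),
        nn.GELU(),
        nn.Linear(16384, 4096),
    )
```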

hyunwoongko avatar Mar 01 '22 15:03 hyunwoongko

@hyunwoongko Actually, in NeoX we also load onto the CPU and then move to the GPU, so I'm not sure this is a problem.

sdtblck avatar Mar 01 '22 18:03 sdtblck

> The main problem is that the model is currently loaded on the CPU and then moved to the GPU. OSLO was originally designed for transformers, and there was no way to pass downloaded checkpoints directly to the GPU in transformers. (At least that was the case while I was developing it, so I didn't worry about this.) But we need to implement something like deepspeed.zero.Init internally so that parameters are allocated on the GPU from scratch. I will start on this tomorrow.

This is actually something we have a work-around for. I don't know if Transformers ever got around to merging it, though.
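
The work-around itself isn't linked here, but for context, one common pattern in recent PyTorch avoids the CPU round-trip by building the module graph on the meta device (shapes only, no storage) and materializing weights directly from the checkpoint. A sketch of that pattern, not necessarily the patch being referenced:

```python
import torch
import torch.nn as nn

# Build the module on the "meta" device: only shapes and dtypes are
# recorded, no parameter storage is allocated.
with torch.device("meta"):
    model = nn.Linear(4096, 4096)

# Load the checkpoint straight onto the GPU; assign=True swaps the meta
# tensors for the loaded ones instead of copying into existing storage.
state_dict = torch.load("checkpoint.pt", map_location="cuda")  # hypothetical path
model.load_state_dict(state_dict, assign=True)
```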

StellaAthena avatar Mar 01 '22 18:03 StellaAthena

@sdtblck Please check my branch (https://github.com/EleutherAI/gpt-neox/tree/kevin_new). I am restructuring our code based on plain PyTorch.

hyunwoongko avatar Mar 03 '22 08:03 hyunwoongko

@sdtblck Did you check my branch?

hyunwoongko avatar Mar 10 '22 06:03 hyunwoongko

@hyunwoongko -- Would you like to restart this effort?

Quentin-Anthony avatar May 18 '23 21:05 Quentin-Anthony