
Latest DeepSpeed Support

Open Quentin-Anthony opened this issue 3 years ago • 6 comments

@StellaAthena @ShivanshuPurohit

Note: we will not merge this unless we decide to get rid of DeeperSpeed.

This branch completely does away with DeeperSpeed, and instead is based on upstream DeepSpeed. It doesn't take many gpt-neox changes to do this, but we lose some of the DeeperSpeed features. Feel free to use this branch unless your gpt-neox code explicitly relies on DeeperSpeed features.

Tested with:

  • [x] Pipeline parallelism (PP) > 1, model parallelism (MP) > 1
  • [x] ZeRO stages 1, 2, 3
  • [ ] MoE [EleutherAI/gpt-neox/pull/677]
  • [ ] Autotuning
  • [ ] Curriculum learning
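For anyone reproducing the ZeRO rows of this checklist against upstream DeepSpeed, the following is a minimal sketch of the config dicts that select each stage. It assumes upstream DeepSpeed's JSON config schema (`train_batch_size`, `zero_optimization.stage`). The helper name `make_zero_config` and the batch size are hypothetical placeholders. gpt-neox normally builds this config from its own YAML files rather than by hand.

```python
import json


def make_zero_config(stage: int, train_batch_size: int = 32) -> dict:
    """Build a minimal DeepSpeed config dict for one ZeRO stage.

    Hypothetical helper for illustration; the batch size is a placeholder.
    """
    if stage not in (1, 2, 3):
        raise ValueError(f"ZeRO stage must be 1, 2 or 3, got {stage}")
    return {
        "train_batch_size": train_batch_size,
        "zero_optimization": {"stage": stage},
    }


# One config per ZeRO stage in the test matrix.
configs = {stage: make_zero_config(stage) for stage in (1, 2, 3)}

for stage, cfg in configs.items():
    print(stage, json.dumps(cfg))
```

Each dict could be passed as the `config` argument of `deepspeed.initialize`. Keeping everything else fixed and varying only `stage` isolates the ZeRO behaviour being tested.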

Quentin-Anthony, Sep 02 '22 17:09

@Quentin-Anthony Can you list which DeeperSpeed features would be lost with this move?

StellaAthena, Sep 04 '22 17:09

Small things: the logging format, some more detailed timers, and the forward-hooks functionality in DeeperSpeed. I've already pushed the major features into upstream DeepSpeed.

I think most gpt-neox users don't need or rely on these features and can switch to the latest DeepSpeed.

Quentin-Anthony, Sep 09 '22 15:09

The only thing I disagree with here is the detailed timers, which I, and I think many others, find quite useful. Would there be an easy way to make them part of GPT-NeoX instead of DeeperSpeed?
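To make "detailed timers" concrete, the following is a generic sketch of a named wall-clock timer utility. It is not DeeperSpeed's code and says nothing about how DeeperSpeed wires its timers into the engine. The class name `NamedTimers` and the region name `"forward"` are hypothetical. On GPUs, a real implementation must synchronize the device before reading the clock, because CUDA kernels launch asynchronously. This stdlib-only sketch skips that step.

```python
import time
from collections import defaultdict


class NamedTimers:
    """Accumulate wall-clock time per named region (hypothetical helper)."""

    def __init__(self) -> None:
        self._totals = defaultdict(float)  # name -> accumulated seconds
        self._starts = {}                  # name -> start timestamp of a running timer

    def start(self, name: str) -> None:
        if name in self._starts:
            raise RuntimeError(f"timer {name!r} already running")
        self._starts[name] = time.perf_counter()

    def stop(self, name: str) -> None:
        start = self._starts.pop(name)  # KeyError if the timer was never started
        self._totals[name] += time.perf_counter() - start

    def elapsed_ms(self, name: str) -> float:
        return self._totals[name] * 1000.0

    def report(self) -> str:
        return " | ".join(f"{n}: {self.elapsed_ms(n):.2f} ms" for n in sorted(self._totals))


# Usage: time a stand-in "forward" region twice; totals accumulate across calls.
timers = NamedTimers()
for _ in range(2):
    timers.start("forward")
    sum(range(10_000))  # placeholder for real work
    timers.stop("forward")
print(timers.report())
```

Accumulating per name, rather than storing a single duration, lets one report cover many iterations of the same region.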

StellaAthena, Sep 18 '22 15:09

No, there's no way to bring those out of DeeperSpeed. Should we update the DeeperSpeed main branch to just be the DeepSpeed main branch plus the timers, throwing everything else away? We'd have to update it periodically, but merges would be pretty simple that way. I think getting these timers into upstream DeepSpeed would be a hard sell.

Quentin-Anthony, Sep 19 '22 17:09

Who would do the selling though?

jamesthesnake, Sep 20 '22 23:09

We would, to the DeepSpeed team. I'm saying it would be difficult to convince them that these timers are needed when DeepSpeed already has the FLOPs profiler and the communication logger.
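For context, the FLOPs profiler and communication logger mentioned here are switched on through upstream DeepSpeed's config. The following is a sketch of such a config as a Python dict, assuming the documented `flops_profiler` and `comms_logger` sections. The batch size and the specific values are illustrative placeholders, not recommendations. Key names can change between DeepSpeed releases, so check the docs for the installed version.

```python
import json

# DeepSpeed config fragment enabling the built-in FLOPs profiler and
# communication logger. Values are illustrative placeholders.
ds_config = {
    "train_batch_size": 32,  # placeholder
    "flops_profiler": {
        "enabled": True,
        "profile_step": 1,    # training step at which to profile
        "module_depth": -1,   # -1 = report modules at every depth
        "top_modules": 1,     # number of top modules listed in the aggregated profile
        "detailed": True,     # include the per-module breakdown
        "output_file": None,  # None = print to stdout
    },
    "comms_logger": {
        "enabled": True,
        "verbose": False,     # don't log every individual collective call
        "prof_all": True,     # profile all collective operations
        "debug": False,
    },
}

print(json.dumps(ds_config, indent=2))
```

The dict serializes to the same JSON that could be saved as a DeepSpeed config file.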

Quentin-Anthony, Sep 21 '22 12:09