
Introduce improvements from OSLO

Open · hyunwoongko opened this issue on Feb 23, 2022 · 6 comments

  1. AOTAutograd is a new engine provided by functorch that can trace and fuse the forward and backward passes of a neural network. I added it to OSLO recently, and it makes training much faster. I'd like to add this to GPT-NeoX; what do you think? It would be nice to implement it on the DeeperSpeed side as well. (There is a sketch of the API after this list.)

  2. OSLO changed Megatron's MPU so that it can handle embedding (vocabulary) sizes that are not evenly divisible by the tensor-parallel degree. As a result, there is no need to add meaningless padding tokens, which improves memory efficiency, and building on this I was also able to implement TP auto-merging. Note that this can merge 70+ transformers architectures without checkpoint-conversion scripts. (See the partitioning sketch after this list.)

  3. FusedRMSNorm was recently added to Apex and has been merged into OSLO. NeoX 20B doesn't seem to use RMSNorm, but this might still be helpful. (A minimal usage sketch follows below.)
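
For item 1, here is a minimal sketch of what an AOTAutograd hook-up could look like, using functorch's `aot_function` API. The toy function, shapes, and print-only compiler callbacks are illustrative assumptions; a real integration would hand the captured graphs to a fusing backend instead of returning them unchanged.

```python
import torch
from functorch.compile import aot_function

def fn(x, w):
    # Toy computation standing in for a transformer sub-block.
    return torch.nn.functional.gelu(x @ w).sum()

def print_compiler(fx_module, example_inputs):
    # A compiler callback receives the captured forward (or backward) FX graph.
    # Returning it unchanged keeps eager semantics; a real integration would
    # return a fused/compiled version instead.
    print(fx_module.code)
    return fx_module

# AOTAutograd traces both the forward and backward passes ahead of time.
fused_fn = aot_function(fn, fw_compiler=print_compiler, bw_compiler=print_compiler)

x = torch.randn(8, 16, requires_grad=True)
w = torch.randn(16, 16, requires_grad=True)
fused_fn(x, w).backward()
```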
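
For item 2, a sketch of unpadded vocabulary partitioning, assuming "odd embedding sizes" means vocabulary sizes that are not divisible by the tensor-parallel world size. The helper below is hypothetical and is not OSLO's actual code; it only illustrates why no padding tokens are needed.

```python
def vocab_range_for_rank(vocab_size: int, tp_world_size: int, rank: int):
    """Return the [start, end) vocab slice owned by `rank`, without padding.

    The first `vocab_size % tp_world_size` ranks get one extra row, so the
    full vocabulary is covered even when it is not evenly divisible.
    """
    base, remainder = divmod(vocab_size, tp_world_size)
    start = rank * base + min(rank, remainder)
    end = start + base + (1 if rank < remainder else 0)
    return start, end

# Example: a vocab of 50257 split across 4 tensor-parallel ranks gives
# partitions of 12565, 12564, 12564, 12564 rows; no padding tokens needed.
print([vocab_range_for_rank(50257, 4, r) for r in range(4)])
```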
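
For item 3, a minimal usage sketch, assuming an Apex build recent enough to include the fused RMSNorm kernels (the hidden size and shapes are arbitrary):

```python
import torch
from apex.normalization import FusedRMSNorm  # requires a recent Apex build

hidden_size = 1024
norm = FusedRMSNorm(hidden_size).cuda()

x = torch.randn(4, 128, hidden_size, device="cuda")
y = norm(x)  # drop-in replacement where RMSNorm is desired instead of LayerNorm
```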


I will keep adding to this list as I find more parts that I can improve.

hyunwoongko · Feb 23, 2022

@sdtblck I saw you posted an issue regarding OSLO PP. Is there anything in PP you would like to improve?

hyunwoongko · Feb 23, 2022

#2 sounds quite clever and I strongly support it.

Given that we are very far from the mainline DeepSpeed repo, would #1 involve a lot of unnecessary labor compared to doing it after we get back to the main version of DeepSpeed?

#3 seems like a low-priority nice-to-have. I don't have any plans to use that normalization, though I'm sure some people might. That said, 90% of the use this library gets is currently internal to EleutherAI AFAIK, so features that only other people might want are a lower priority.

StellaAthena · Feb 24, 2022

@StellaAthena

#2 I'm going to create a new branch in the current neox repo and experiment.

#1 This feature does not exist in DeepSpeed, so there is no need to worry about upstream DeepSpeed. Since I've already built it into a usable form in OSLO, it should be easy to add.

#3 I totally agree with you.

In addition, if there are any other parts you would like to improve or experiment with, even if they have nothing to do with OSLO, please feel free to assign tasks to me. I'm happy to help the NeoX project.

hyunwoongko · Feb 25, 2022

@hyunwoongko Ah I think I misread your comments about #1 :) In that case I would certainly be interested in experimenting with it :)

Honestly, far and away the most helpful thing you could do is figure out how to bring us back in line with the main DeepSpeed branch. I know that's a big ask though, so no worries if it's a bit daunting.

In terms of building out the library, the other most important things on the horizon are #479 and #215. There are also some outstanding abandoned PRs with optimizers like Shampoo that it would be nice to see cleaned up and finished. In terms of general library maintenance, #469 and various documentation improvements such as #506, #484, and #458 would all be quite helpful.

We could also always use help designing and orchestrating experiments. We can happily provide the compute for anyone willing to do the work… DM me on Slack if you’re interested.

StellaAthena · Feb 25, 2022

@hyunwoongko -- Would you like to restart this effort?

Quentin-Anthony · May 18, 2023

@Quentin-Anthony sounds great.

hyunwoongko · May 20, 2023