Adding MLM and Multitask Finetuning Adaptation
@EricHallahan FYSA
I've been working on this PR after talking to Lintang about it!
MLM should now be good to go, pretty much (I tested that loss goes down, but haven't had the GPUs to try training a 125M-param model with it yet). I'm still working on the MTF implementation.
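For anyone skimming, here's roughly the kind of objective I mean by MLM for a decoder-only model: a minimal T5-style span-corruption sketch. This is illustrative only, not the actual code in this PR, and the sentinel handling and hyperparameters are assumptions:

```python
import random

def span_corrupt(tokens, sentinel_ids, mask_prob=0.15, mean_span_len=3):
    """Replace random spans of `tokens` with sentinel tokens and build
    the matching target sequence (T5-style span corruption)."""
    inputs, targets = [], []
    i, s = 0, 0
    while i < len(tokens):
        # Start a span with probability mask_prob / mean_span_len, so the
        # expected masked fraction works out to roughly mask_prob.
        if s < len(sentinel_ids) and random.random() < mask_prob / mean_span_len:
            span_len = max(1, round(random.expovariate(1 / mean_span_len)))
            inputs.append(sentinel_ids[s])      # sentinel marks the gap
            targets.append(sentinel_ids[s])     # target: sentinel + span
            targets.extend(tokens[i:i + span_len])
            s += 1
            i += span_len
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets
```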
I was wondering if there'd be interest in (and GPUs available for) training some models using the MLM code, a la the new OpenAI fill-in-the-middle paper (https://arxiv.org/pdf/2207.14255.pdf). InCoder exists and was trained this way, but there's no publicly available English LM that has been trained this way.
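The core transform in that paper is simple: cut each training document into (prefix, middle, suffix) and rearrange it so the model learns to infill. Here's a character-level sketch of the paper's PSM format; the sentinel token strings are placeholders, and how they'd map onto our tokenizer is an open question:

```python
import random

PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def fim_transform(document: str, fim_rate: float = 0.5) -> str:
    """With probability `fim_rate`, split a document at two random points
    and rearrange it as prefix + suffix + middle (the paper's PSM format);
    otherwise leave it as ordinary left-to-right text."""
    if len(document) < 2 or random.random() > fim_rate:
        return document
    a, b = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:a], document[a:b], document[b:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"
```

The model is still trained with the ordinary left-to-right loss on the rearranged text, which is why the paper can report infilling capability essentially for free.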
@StellaAthena lmk if this would be feasible! I could take care of all the training; I just don't have the GPUs available.
Merging this into a separate branch for the time being.
EDIT: I don't have permissions for some reason lol