Mihir Patel
We'd love community PRs for this! Happy to help review and design. It's not currently on our roadmap, but we are evaluating it.
@tgale96 might have scripts for Megatron-LM integration. We will have integrations with other stacks soon. For DBRX specifically, you do not necessarily need to use megablocks (though it is...
I've also seen some issues with AMP. I think there's something missing somewhere... but all the functions seem wrapped to me?
@jramapuram any chance you can provide a mini repro? happy to look into it
Seems reasonable to me. I would emit a `log.info` that auto-creation is being skipped when the requisite args are passed but an existing callback is present. We'd love a...
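A minimal sketch of the pattern described above. The `EvalCallback` class and `maybe_auto_create` helper are hypothetical names for illustration, not the actual library API:

```python
import logging

log = logging.getLogger(__name__)


class EvalCallback:
    """Hypothetical callback class standing in for the real one."""

    def __init__(self, interval="1ep"):
        self.interval = interval


def maybe_auto_create(callbacks, requested_args):
    """Auto-create an EvalCallback from args unless the user already passed one."""
    if requested_args is None:
        return callbacks
    if any(isinstance(cb, EvalCallback) for cb in callbacks):
        # Args were passed, but a callback already exists: skip and log it.
        log.info(
            "Skipping auto-creation of EvalCallback: an existing instance "
            "was provided; using the user-supplied callback instead."
        )
        return callbacks
    callbacks.append(EvalCallback(**requested_args))
    return callbacks
```

This keeps user-provided callbacks authoritative while surfacing, via the log, why the auto-creation args were ignored.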
> Cool! I can start working on it tomorrow. > > Out of curiosity, when is the next release expected? Maybe I get this merged before it's out. We are...
Is this during eval? Can you provide a minimum repro?
Ah, this is because you don't store LBL during eval. You should set the model to eval mode. We should give a friendlier error... CC: @eitanturok
Before eval, you'll need to call `model.eval()`. @eitanturok can you look at tweaking the scripts?
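A minimal sketch of why calling `model.eval()` matters here, using a stand-in class rather than the real MegaBlocks model (in practice this would be a `torch.nn.Module`, whose `eval()` behaves analogously):

```python
class MoEModel:
    """Stand-in for illustration: mimics torch.nn.Module train/eval toggling."""

    def __init__(self):
        self.training = True
        self.load_balancing_losses = []

    def eval(self):
        # Mirrors torch.nn.Module.eval(): disables training-mode behavior.
        self.training = False
        return self

    def forward(self, batch):
        out = sum(batch)  # placeholder computation
        if self.training:
            # The load-balancing loss (LBL) is only accumulated in training
            # mode; eval-mode forward passes do not store it.
            self.load_balancing_losses.append(0.0)
        return out


model = MoEModel()
model.forward([1, 2, 3])   # training step: LBL is stored
model.eval()               # call this before evaluation
model.forward([4, 5, 6])   # eval step: no LBL accumulated, no error
```

Forgetting the `model.eval()` call leaves the model expecting to accumulate LBL during eval, which is the mismatch behind the error above.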
This might have to happen in `third_party/Megatron-LM/pretrain_gpt.py` which is the script being called...