rome
Generating weights for efk/mend for new model
Hi, I was wondering if you guys can tell me how to generate weights for distilgpt2 for the MEND/EFK baselines, similar to what you have for gpt2-xl here: https://rome.baulab.info/data/weights/. I'm trying to run these baselines but don't have the saved weights. I tried simply loading and re-saving Hugging Face's weights for distilgpt2, but it looks like the code expects something different. If you guys have a script or suggestions, that would be great.
Thanks!
Hi @salemohamedo, you'll want to refer to the MEND repository for more information.
They have instructions for training a MEND baseline for a variety of GPT models; I don't recall whether DistilGPT2 is supported out-of-the-box, but I'm sure Eric can help you with this if not! After training a model, simply place the trained .pt file in baselines/mend/weights. You can consult mend_main.py for details on the naming conventions, so that the code picks up the new model file.
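For the "place the trained .pt file" step, here is a minimal sketch using only the Python standard library. The helper name install_mend_checkpoint, the rome_root argument, and the example filenames are all hypothetical; the filename the code actually expects is whatever mend_main.py looks for, so check there before picking target_name.

```python
import shutil
from pathlib import Path


def install_mend_checkpoint(trained_ckpt, target_name, rome_root="."):
    """Copy a trained MEND checkpoint into <rome_root>/baselines/mend/weights.

    target_name is a placeholder (assumption): consult mend_main.py for
    the filename the code expects for your model.
    """
    src = Path(trained_ckpt)
    if not src.is_file():
        raise FileNotFoundError(f"No checkpoint found at {src}")

    # Directory named in the reply above; created if it does not exist yet.
    weights_dir = Path(rome_root) / "baselines" / "mend" / "weights"
    weights_dir.mkdir(parents=True, exist_ok=True)

    dest = weights_dir / target_name
    shutil.copy2(src, dest)  # copies file contents and metadata
    return dest
```

You would call it from the root of your rome checkout, e.g. install_mend_checkpoint("path/to/trained_model.pt", "name-expected-by-mend_main.pt"), where both arguments are placeholders for your own paths and names.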
Let me know if you have further questions!