
Rework a bit the LLaMA conversion script

Open · sgugger opened this pull request 1 year ago · 1 comment

What does this PR do?

This PR keeps the LLaMA conversion script in sync with save_pretrained by loading the checkpoint into an actual model and then saving it via that method. This avoids a lot of hard-coded values in JSON files.

It keeps the old conversion logic and merely re-loads the result into a Transformers model (cleaning up intermediate state so we never exceed the model size in CPU RAM). It also changes the API a bit so that everything ends up in the output folder, matching the usual layout of repos on the Hugging Face Hub.
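In code, the new flow looks roughly like this (a minimal sketch of the idea, not the script itself; `reload_and_save` and the temporary checkpoint directory are illustrative names):

```python
import torch
from transformers import LlamaForCausalLM

def reload_and_save(tmp_checkpoint_dir: str, output_dir: str):
    # Re-load the checkpoint produced by the existing conversion logic
    # into an actual Transformers model. low_cpu_mem_usage avoids
    # materializing a second full copy of the weights in CPU RAM.
    model = LlamaForCausalLM.from_pretrained(
        tmp_checkpoint_dir,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )
    # save_pretrained writes config.json and the (sharded) weight files
    # itself, so no hard-coded JSON values are needed.
    model.save_pretrained(output_dir)
```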

cc @zphang so you are aware of this.

sgugger · Mar 17 '23, 18:03


I don't see how you can reduce the memory requirement: the files provided by Meta each contain a slice of every weight, so all of them must be loaded to reconstruct even a single weight. That's why I didn't bother implementing sharding on the fly.
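For context, a toy sketch of why that is (the two-shard setup and the layer index are assumptions for illustration; the file and key names follow Meta's consolidated checkpoint layout):

```python
import torch

# Meta's checkpoints are tensor-parallel: each of the N files holds a
# slice of *every* weight, not a subset of whole weights. Reconstructing
# even one full weight therefore requires all files in memory.
num_shards = 2  # assumed for illustration (e.g. the 13B model)
shards = [
    torch.load(f"consolidated.{i:02d}.pth", map_location="cpu")
    for i in range(num_shards)
]

# Column-parallel layers are split along dim 0, row-parallel along
# dim 1; e.g. an attention projection is reassembled like this:
full_wq = torch.cat(
    [shard["layers.0.attention.wq.weight"] for shard in shards], dim=0
)
```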

sgugger · Mar 20 '23, 13:03

Indeed, just realised you have to `cat` (concatenate) them 😞 my bad!

ArthurZucker · Mar 20 '23, 13:03
