Rework a bit the LLaMA conversion script
What does this PR do?
This PR makes sure the LLaMA conversion script stays up to date with save_pretrained by loading the checkpoint into an actual model and then saving it via that method. This avoids a lot of hard-coded values in JSON files.
It keeps the old conversion logic and merely re-loads the result into a Transformers model (cleaning up intermediate objects so we never go above the model size in CPU RAM). It also changes the API a bit to put everything in the output folder, like we usually have in repos on the Hugging Face Hub.
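A minimal sketch of the idea, not the actual script (the helper name and the exact config fields pulled from params.json are assumptions here):

```python
from transformers import LlamaConfig, LlamaForCausalLM


def save_with_transformers(converted_state_dict, params, output_dir):
    """Load the already-converted weights into a real model and let
    save_pretrained write config.json and the weight files, instead of
    hard-coding those JSON values by hand."""
    config = LlamaConfig(
        hidden_size=params["dim"],            # from Meta's params.json
        num_hidden_layers=params["n_layers"],
        num_attention_heads=params["n_heads"],
    )
    model = LlamaForCausalLM(config)
    model.load_state_dict(converted_state_dict, strict=True)
    model.save_pretrained(output_dir)  # writes config.json + (sharded) weight files
```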
cc @zphang so you are aware of this.
I don't see how you can reduce the memory requirement: the files provided by Meta each contain a part of every weight, so you need all of them loaded to reconstruct even a single weight. That's why I didn't bother implementing sharding on the fly.
Indeed, just realised you have to cat them 😞 my bad!
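For anyone following along, a rough illustration of why every shard has to be in memory at once: each of Meta's consolidated.XX.pth files holds a slice of every tensor, and the full tensor is rebuilt with torch.cat. The key name and concatenation dim below are just an example (the actual dim depends on whether the weight was column- or row-parallel):

```python
import torch

# Each shard holds a slice of every weight (tensor-parallel layout),
# so reconstructing even one weight needs all shards loaded at once.
shard_paths = ["consolidated.00.pth", "consolidated.01.pth"]
shards = [torch.load(path, map_location="cpu") for path in shard_paths]

# Example: a column-parallel projection is rebuilt by concatenating
# the per-shard slices along dim 0.
wq = torch.cat([shard["layers.0.attention.wq.weight"] for shard in shards], dim=0)
```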