Contradictory information in documentation about the ability to push quantized models to the Hub
System Info
Using Google Colab and the main branch of the transformers library on GitHub.
Who can help?
@sgugger @stevhliu @MKhalusova
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
The notes at the end of the sections Load a large model in 4bit and Load a large model in 8bit suggest that it's not possible to push the quantized weights to the Hub:
Note that once a model has been loaded in 4-bit it is currently not possible to push the quantized weights on the Hub.
Note that once a model has been loaded in 8-bit it is currently not possible to push the quantized weights on the Hub except if you use the latest transformers and bitsandbytes.
But the example in Push quantized models on the 🤗 Hub suggests that it is possible to push quantized models to the Hub. The same is suggested in Load a quantized model from the 🤗 Hub.
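For context, the example in that section looks roughly like the following (the model and repo names here are just placeholders for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in 8-bit via bitsandbytes ("bigscience/bloom-560m" is only
# used for illustration; any causal LM would work the same way).
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    device_map="auto",
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

# Push the quantized weights to the Hub ("bloom-560m-8bit" is a placeholder repo name).
model.push_to_hub("bloom-560m-8bit")
tokenizer.push_to_hub("bloom-560m-8bit")
```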
Does this mean that pushing to the Hub is only supported for 8-bit quantized models when using the latest transformers and bitsandbytes, but NOT for 4-bit models?
Or is it actually possible to push both 8-bit and 4-bit quantized models to the Hub?
Expected behavior
Can 4-bit and 8-bit quantized models be pushed to the Hub and loaded from the Hub?
cc @younesbelkada
Hi @amdnsr, thanks for the issue. As explained in the mentioned paragraphs, it is possible to push 8-bit quantized weights only if you use the latest transformers + bitsandbytes. However, pushing 4-bit weights is currently not supported.
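In other words, something along these lines should reflect the current behavior (model and repo names below are placeholders, and this assumes an up-to-date transformers + bitsandbytes install):

```python
from transformers import AutoModelForCausalLM

# Loading an already-quantized 8-bit checkpoint back from the Hub works
# ("my-username/bloom-560m-8bit" is a placeholder repo name):
model_8bit = AutoModelForCausalLM.from_pretrained(
    "my-username/bloom-560m-8bit",
    device_map="auto",
)

# Loading in 4-bit works, but pushing the 4-bit weights does not:
model_4bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    device_map="auto",
    load_in_4bit=True,
)
# model_4bit.push_to_hub("bloom-560m-4bit")  # currently not supported
```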
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.