OpenChatKit
Add example notebook and argument for 8-bit-inference
This PR:
- Adds the argument `--load-in-8bit` for inference.
- Adds an example Jupyter/Colab notebook that can run `bot.py` inference (quantized) on a free Colab account. Without quantizing, it would have crashed after 5 prompts; users on a non-free account have the option to remove the 8-bit argument.
- Updates `transformers==4.21.1` to `transformers==4.27.4` because:
  - It adds support for 8-bit quantization to the model class.
  - It shows a progress bar when loading the model, which can be helpful on consumer hardware.
- Updates documentation to reflect the recent changes (new model and example notebook).
- Fixes typos.
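For reference, a minimal sketch of how such a `--load-in-8bit` flag can be wired into model loading. The function names and structure here are illustrative, not OpenChatKit's actual code; `load_in_8bit` and `device_map` are the keyword arguments that `transformers` `from_pretrained` accepts for 8-bit loading as of 4.27:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative CLI parser with an opt-in 8-bit quantization flag."""
    parser = argparse.ArgumentParser(description="bot.py inference options")
    parser.add_argument(
        "--load-in-8bit",
        action="store_true",
        help="Quantize the model to 8-bit at load time (requires bitsandbytes).",
    )
    return parser

def model_kwargs(args: argparse.Namespace) -> dict:
    """Translate CLI flags into from_pretrained keyword arguments."""
    kwargs = {"torch_dtype": "auto"}
    if args.load_in_8bit:
        # 8-bit loading needs an accelerate device map to place layers
        kwargs.update(load_in_8bit=True, device_map="auto")
    return kwargs
```

The resulting dict would then be splatted into `AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs(args))`.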
Note: the 'Open in Colab' links have been modified to how they should look after the merge. For testing purposes, use this branch with the original links instead.
Solves #42
Converted to draft because it can be improved a bit, as suggested by @exander77.
This is ready for review.
The most recent commit simplified the ChatModel class with the load-in-8bit argument and fixed an issue that prevented it from being passed alongside CPU offload.
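A hedged sketch of that interaction (the helper below is illustrative, not the actual ChatModel code; `load_in_8bit` and `device_map` are real `from_pretrained` kwargs, and the CPU-offload flag shown mirrors the bitsandbytes option `llm_int8_enable_fp32_cpu_offload`, which in recent transformers versions is passed via a `BitsAndBytesConfig`):

```python
def quantized_load_kwargs(load_in_8bit: bool, cpu_offload: bool) -> dict:
    """Illustrative: combine 8-bit quantization with CPU offload.

    Returns the keyword arguments that would be handed to from_pretrained;
    everything except the transformers kwarg names is an assumption.
    """
    kwargs = {}
    if load_in_8bit:
        # 8-bit weights require an accelerate device map
        kwargs["load_in_8bit"] = True
        kwargs["device_map"] = "auto"
        if cpu_offload:
            # allow modules placed on CPU to stay in fp32 while the rest
            # of the model is quantized (bitsandbytes option)
            kwargs["llm_int8_enable_fp32_cpu_offload"] = True
    elif cpu_offload:
        # without quantization, a plain accelerate device map suffices
        kwargs["device_map"] = "auto"
    return kwargs
```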