OpenChatKit
Add example notebook and argument for 8-bit-inference
This PR:
- Adds the argument `--load-in-8bit` for inference.
- Adds an example Jupyter/Colab notebook that can run `bot.py` inference (quantized) on a free Colab account. Without quantizing, it would have crashed after 5 prompts; users on a non-free account have the option to remove the 8-bit argument.
- Updates `transformers==4.21.1` to `transformers==4.27.4` because:
  - It adds support for 8-bit quantization to the model class.
  - It shows a progress bar when loading the model, which can be helpful on consumer hardware.
- Updates documentation to reflect the recent changes (new model and example notebook).
- Fixes typos.
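For reference, a minimal sketch of how such a `--load-in-8bit` flag can be wired into model loading. The function names and structure here are illustrative, not OpenChatKit's actual code; `load_in_8bit` and `device_map` are the keyword arguments that `transformers` `from_pretrained` accepts for 8-bit loading as of 4.27:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative CLI parser with an opt-in 8-bit quantization flag."""
    parser = argparse.ArgumentParser(description="bot.py inference options")
    parser.add_argument(
        "--load-in-8bit",
        action="store_true",
        help="Quantize the model to 8-bit at load time (requires bitsandbytes).",
    )
    return parser

def model_kwargs(args: argparse.Namespace) -> dict:
    """Translate CLI flags into from_pretrained keyword arguments."""
    kwargs = {"torch_dtype": "auto"}
    if args.load_in_8bit:
        # 8-bit loading needs an accelerate device map to place layers
        kwargs.update(load_in_8bit=True, device_map="auto")
    return kwargs
```

The resulting dict would then be splatted into `AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs(args))`.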
Note: the 'Open in Colab' links have been modified to how they should look after the merge. For testing purposes, use this branch with the original links instead.
Solves #42
Converted to draft because it can be improved a bit, as suggested by @exander77.
This is ready for review.
The most recent commit simplified the ChatModel class with the load-in-8bit argument and fixed an issue that prevented it from being passed alongside CPU offload.
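A hedged sketch of that interaction (the helper below is illustrative, not the actual ChatModel code; `load_in_8bit` and `device_map` are real `from_pretrained` kwargs, and the CPU-offload flag shown mirrors the bitsandbytes option `llm_int8_enable_fp32_cpu_offload`, which in recent transformers versions is passed via a `BitsAndBytesConfig`):

```python
def quantized_load_kwargs(load_in_8bit: bool, cpu_offload: bool) -> dict:
    """Illustrative: combine 8-bit quantization with CPU offload.

    Returns the keyword arguments that would be handed to from_pretrained;
    everything except the transformers kwarg names is an assumption.
    """
    kwargs = {}
    if load_in_8bit:
        # 8-bit weights require an accelerate device map
        kwargs["load_in_8bit"] = True
        kwargs["device_map"] = "auto"
        if cpu_offload:
            # allow modules placed on CPU to stay in fp32 while the rest
            # of the model is quantized (bitsandbytes option)
            kwargs["llm_int8_enable_fp32_cpu_offload"] = True
    elif cpu_offload:
        # without quantization, a plain accelerate device map suffices
        kwargs["device_map"] = "auto"
    return kwargs
```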