
Add example notebook and argument for 8-bit-inference

orangetin opened this pull request · 2 comments

This PR:

  • Adds the `--load-in-8bit` argument for inference
  • Adds an example Jupyter/Colab notebook that runs `bot.py` inference with 8-bit quantization on a free Colab account (without quantization, the session would crash after about 5 prompts; users on non-free accounts can drop the 8-bit argument)
  • Updates transformers==4.21.1 to transformers==4.27.4 because:
    • It adds support for 8-bit quantization to the model class
    • It shows a progress bar when loading the model, which is helpful on consumer hardware
  • Updates documentation to reflect recent changes (new model and example notebook)
  • Fixes typos
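As a rough illustration of how a `--load-in-8bit` flag can be wired into model loading, here is a minimal, hypothetical sketch. The function and variable names below are assumptions for illustration, not the actual `bot.py` code; the only names taken from the PR are the flag itself and the `load_in_8bit` keyword that `transformers.from_pretrained` accepts as of the 4.27 line (8-bit loading also expects a `device_map` so weights are placed on the GPU via bitsandbytes):

```python
import argparse

def build_parser():
    # Hypothetical argument parser; only --load-in-8bit comes from the PR.
    parser = argparse.ArgumentParser(description="OpenChatKit inference (sketch)")
    parser.add_argument(
        "--load-in-8bit",
        action="store_true",
        help="Load the model with 8-bit quantization (requires bitsandbytes)",
    )
    return parser

def model_kwargs(args):
    # Keyword arguments that would be forwarded to
    # AutoModelForCausalLM.from_pretrained(...). With load_in_8bit=True,
    # transformers also needs a device_map to place the quantized weights.
    kwargs = {}
    if args.load_in_8bit:
        kwargs.update(load_in_8bit=True, device_map="auto")
    return kwargs

args = build_parser().parse_args(["--load-in-8bit"])
print(model_kwargs(args))
```

Without the flag, `model_kwargs` returns an empty dict and the model loads in full precision as before.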

Note: the 'Open in Colab' links have been modified to point to where the notebook will live after the merge. For testing purposes, use this branch with the original links instead.

Solves #42

orangetin · Mar 31 '23

Converted to draft because it can be improved a bit, as suggested by @exander77.

orangetin · Apr 12 '23

This is ready for review.

The latest commit simplifies the ChatModel class with the `--load-in-8bit` arg and fixes an issue that prevented it from being passed alongside CPU offload.
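To show why 8-bit loading and CPU offload can conflict, here is a hedged sketch (not the actual OpenChatKit fix): both features want to set placement-related keyword arguments, so a naive implementation that overwrites the kwargs for one path can silently drop the other. The `inference_kwargs` helper and its behavior are illustrative assumptions; `load_in_8bit`, `device_map`, and `offload_folder` are real `from_pretrained` parameters in recent transformers versions:

```python
def inference_kwargs(load_in_8bit: bool, cpu_offload: bool) -> dict:
    """Build from_pretrained kwargs; merging (rather than overwriting)
    keeps 8-bit quantization and CPU offload compatible. Illustrative only."""
    kwargs = {}
    if load_in_8bit:
        kwargs["load_in_8bit"] = True
        kwargs["device_map"] = "auto"
    if cpu_offload:
        # A buggy version might reassign kwargs here and lose load_in_8bit;
        # setdefault preserves whatever the 8-bit path already chose.
        kwargs.setdefault("device_map", "auto")
        kwargs["offload_folder"] = "offload"
    return kwargs

print(inference_kwargs(load_in_8bit=True, cpu_offload=True))
```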

orangetin · Apr 18 '23