FastChat
support for 4-bit quantization from the transformers library.
Loading Vicuna-13B using 4-bit quantization from the transformers library is possible with load_in_4bit. How difficult would it be for FastChat to support it?
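For reference, here is a minimal sketch of what the load_in_4bit path looks like with transformers >= 4.30 and bitsandbytes installed. It needs a CUDA GPU, and the checkpoint name shown is one public Vicuna-13B release; adjust for your setup.

```python
# Sketch: load Vicuna-13B in 4-bit via transformers' bitsandbytes integration.
# Assumes transformers >= 4.30, bitsandbytes, and a CUDA GPU are available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the usual default
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for the actual matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-13b-v1.5",       # one public checkpoint; swap in your own
    quantization_config=quant_config,
    device_map="auto",             # spreads layers across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-13b-v1.5")
```

With `device_map="auto"`, accelerate shards the quantized layers across however many GPUs are visible, which is what makes the multi-GPU case mentioned below work with essentially no extra code.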
Honestly, it's a matter of updating to transformers 4.30, adding one extra dependency package, and about 8 code changes, if I recall correctly. Plus it works with multiple GPUs.
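For intuition about what 4-bit loading buys: each weight is stored as one of 16 levels plus a shared scale, cutting memory roughly 4x versus fp16. The toy absmax scheme below is a simplified sketch of that idea, not bitsandbytes' actual NF4/FP4 algorithm (which uses per-block codebooks).

```python
# Toy absmax 4-bit quantization: each float becomes a signed integer in
# [-7, 7] plus one shared scale. Simplified sketch, not the real NF4 scheme.

def quantize_4bit(weights):
    """Map floats to 4-bit codes; returns (codes, scale)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_4bit(codes, scale):
    """Reconstruct approximate floats from codes and scale."""
    return [c * scale for c in codes]

weights = [0.31, -1.24, 0.05, 0.88, -0.47]
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)

# Every code fits in 4 bits; round-trip error is bounded by half a step.
assert all(-7 <= c <= 7 for c in codes)
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```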
Unfortunately I lost my changes from my running copy when I updated for the API updates, but I think most of the work is already done in my fork.
Contributions are welcome
@merrymercy is this issue still open for contribution?
@02shanks absolutely!!!!
@surak as this is my first code contribution, could you please guide me through the process? Where should I start?
Well, the usual:
- fork the repo,
- branch it with a relevant name,
- contribute ONLY the changes related to the issue,
- keep your branch up to date with the main branch, as this makes for an easier merge,
- comment your code where applicable,
- once it's good enough, open a pull request. We will look into it and people will review it.
Nothing special, really!
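The steps above can be sketched with git roughly as follows. The branch name and commit messages here are made up for illustration, and the demo initializes a throwaway local repo so it is self-contained; in practice you would clone your fork of FastChat instead.

```shell
set -e
cd "$(mktemp -d)"

# 1. Fork the repo on GitHub, then clone your fork:
#    git clone https://github.com/<your-username>/FastChat.git
# For a self-contained demo, initialize an empty repo instead:
git init -q demo && cd demo
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "baseline"

# 2. Branch with a name relevant to the issue:
git checkout -q -b add-4bit-quantization

# 3. Commit only the changes related to the issue:
echo "load_in_4bit support" > notes.txt
git add notes.txt
git -c user.email=dev@example.com -c user.name=dev \
    commit -q -m "Add 4-bit quantization support"

# 4. Keep your branch up to date with upstream main before opening the PR:
#    git fetch upstream && git rebase upstream/main

# 5. Push and open the pull request on GitHub:
#    git push origin add-4bit-quantization
git branch --show-current
```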
@surak @merrymercy I have just created the PR. Can you please review it?