FastChat
support for 4-bit quantization from the transformers library.
Loading Vicuna-13B using 4-bit quantization from the transformers library is possible with load_in_4bit. How difficult would it be for FastChat to support it?
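For reference, here is a minimal sketch of what the load_in_4bit path looks like with transformers >= 4.30 and bitsandbytes installed. It needs a CUDA GPU, and the checkpoint name shown is one public Vicuna-13B release; adjust for your setup.

```python
# Sketch: load Vicuna-13B in 4-bit via transformers' bitsandbytes integration.
# Assumes transformers >= 4.30, bitsandbytes, and a CUDA GPU are available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the usual default
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for the actual matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-13b-v1.5",       # one public checkpoint; swap in your own
    quantization_config=quant_config,
    device_map="auto",             # spreads layers across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-13b-v1.5")
```

With `device_map="auto"`, accelerate shards the quantized layers across however many GPUs are visible, which is what makes the multi-GPU case mentioned below work with essentially no extra code.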
Honestly, it's a matter of updating to transformers 4.30, adding one extra dependency package, and about 8 code changes, if I recall correctly. Plus it works with multiple GPUs.
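For intuition about what 4-bit loading buys: each weight is stored as one of 16 levels plus a shared scale, cutting memory roughly 4x versus fp16. The toy absmax scheme below is a simplified sketch of that idea, not bitsandbytes' actual NF4/FP4 algorithm (which uses per-block codebooks).

```python
# Toy absmax 4-bit quantization: each float becomes a signed integer in
# [-7, 7] plus one shared scale. Simplified sketch, not the real NF4 scheme.

def quantize_4bit(weights):
    """Map floats to 4-bit codes; returns (codes, scale)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_4bit(codes, scale):
    """Reconstruct approximate floats from codes and scale."""
    return [c * scale for c in codes]

weights = [0.31, -1.24, 0.05, 0.88, -0.47]
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)

# Every code fits in 4 bits; round-trip error is bounded by half a step.
assert all(-7 <= c <= 7 for c in codes)
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```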
Unfortunately I lost my changes from my running copy when I updated for the API updates, but I think most of the work is already done in my fork.
Contributions are welcome
@merrymercy is this issue still open for contribution?
@02shanks absolutely!!!!
@surak as this is my first code contribution, could you please guide me through the process? Where should I start?
Well, the usual:
- fork the repo,
- branch it with a relevant name,
- contribute ONLY the changes related to the issue,
- keep your branch up to date with the main branch, as this makes for an easier merge,
- comment your code where applicable,
- once it's good enough, open a pull request. We will look into it and people will review it.
Nothing special, really!
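The steps above can be sketched with git roughly as follows. The branch name and commit messages here are made up for illustration, and the demo initializes a throwaway local repo so it is self-contained; in practice you would clone your fork of FastChat instead.

```shell
set -e
cd "$(mktemp -d)"

# 1. Fork the repo on GitHub, then clone your fork:
#    git clone https://github.com/<your-username>/FastChat.git
# For a self-contained demo, initialize an empty repo instead:
git init -q demo && cd demo
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "baseline"

# 2. Branch with a name relevant to the issue:
git checkout -q -b add-4bit-quantization

# 3. Commit only the changes related to the issue:
echo "load_in_4bit support" > notes.txt
git add notes.txt
git -c user.email=dev@example.com -c user.name=dev \
    commit -q -m "Add 4-bit quantization support"

# 4. Keep your branch up to date with upstream main before opening the PR:
#    git fetch upstream && git rebase upstream/main

# 5. Push and open the pull request on GitHub:
#    git push origin add-4bit-quantization
git branch --show-current
```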
@surak @merrymercy I have just created the PR. Can you please review it?