Support for 4-bit quantization from the transformers library

Open harpomaxx opened this issue 2 years ago • 7 comments

Loading Vicuna-13B with 4-bit quantization is possible through the transformers library's load_in_4bit option. How difficult would it be for FastChat to support it?
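For context, here is a minimal sketch of what 4-bit loading looks like with transformers alone, outside of FastChat. It assumes transformers 4.30 or newer with the bitsandbytes and accelerate packages installed, and a CUDA GPU. The helper names `four_bit_settings` and `load_4bit`, the NF4 quantization type, and the float16 compute dtype are illustrative choices, not anything FastChat defines.

```python
from typing import Any, Dict


def four_bit_settings(compute_dtype: str = "float16") -> Dict[str, Any]:
    """Return keyword arguments for transformers' BitsAndBytesConfig.

    The compute dtype is kept as a string so this helper has no heavy imports;
    it is resolved to a torch dtype inside load_4bit (hypothetical helper).
    """
    return {
        "load_in_4bit": True,              # store weights in 4-bit precision
        "bnb_4bit_quant_type": "nf4",      # assumption: NF4 rather than plain FP4
        "bnb_4bit_compute_dtype": compute_dtype,
    }


def load_4bit(model_path: str):
    """Load a causal LM and its tokenizer with 4-bit weights.

    Heavy imports live here so the settings helper above stays importable
    on machines without torch or transformers.
    """
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    settings = four_bit_settings()
    # Turn the dtype name (e.g. "float16") into the actual torch dtype object.
    settings["bnb_4bit_compute_dtype"] = getattr(
        torch, settings["bnb_4bit_compute_dtype"]
    )
    quant_config = BitsAndBytesConfig(**settings)

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers across available GPUs
    )
    return model, tokenizer


if __name__ == "__main__":
    # Example model id; substitute whichever Vicuna checkpoint you use.
    model, tokenizer = load_4bit("lmsys/vicuna-13b-v1.3")
```

Supporting this in FastChat would presumably mean passing an equivalent configuration through its own model-loading path; the sketch only shows the transformers side.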

harpomaxx avatar Jun 27 '23 13:06 harpomaxx

Honestly, it takes updating to transformers 4.30, adding one other dependency package, and about 8 changes in the code, if I recall correctly. Plus, it works with multiple GPUs.

Unfortunately I lost my changes from my running copy when I updated for the API updates, but I think most of the work is already done in my fork.
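As a rough illustration of the version requirement mentioned above, the following sketch checks whether the installed transformers release is at least 4.30 before a 4-bit code path is enabled. The function names are hypothetical, the version parsing is deliberately simple, and the additional dependency is not named in this thread, so it is not checked here.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a version string.

    "4.30.0.dev0" becomes (4, 30, 0); parsing stops at the first
    component that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.30") -> bool:
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "4.30" and "4.30.0" compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_transformers_version()
    if version is None:
        print("transformers is not installed")
    elif meets_minimum(version):
        print(f"transformers {version} is new enough for load_in_4bit")
    else:
        print(f"transformers {version} is older than 4.30; upgrade first")
```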

cidtrips avatar Jun 29 '23 23:06 cidtrips

Contributions are welcome

merrymercy avatar Jul 01 '23 13:07 merrymercy

@merrymercy is this issue still open for contribution?

02shanks avatar Aug 07 '24 14:08 02shanks

@02shanks absolutely!!!!

surak avatar Aug 08 '24 00:08 surak

@surak as this is my first code contribution, could you please guide me through the process? Where should I start?

02shanks avatar Aug 08 '24 08:08 02shanks

Well, the usual:

  • fork the repo,
  • create a branch with a relevant name,
  • contribute ONLY those changes related to the issue,
  • keep your fork up to date with the main branch, as this makes for an easier merge,
  • comment the code where applicable,
  • once it's good enough, open a merge request. We will look into it and people will review it.

Nothing special, really!

surak avatar Aug 10 '24 21:08 surak

@surak @merrymercy I have just created the PR. Can you please review it?

02shanks avatar Aug 14 '24 16:08 02shanks