GPTQ GPT-J support
Hey there.
Please take a look at this code: https://github.com/AlpinDale/gptq-gptj
Could you add 4-bit quantization support for GPT-J? Once implemented, this would allow Pygmalion 6B to load in 4-bit.
Much appreciated.
Important: This code is being tested right now, so it may or may not work. Feel free to provide feedback to AlpinDale.
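For context, here's a rough sketch of what the 4-bit quantization step could look like. It uses the AutoGPTQ library rather than the linked repo's own scripts, so treat it as an assumption about the workflow, not the repo's actual invocation; the model ID and output directory are placeholders.

```python
# Rough sketch, NOT the linked repo's code: 4-bit GPTQ quantization of a
# GPT-J fine-tune via the AutoGPTQ library. Model ID / paths are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained = "PygmalionAI/pygmalion-6b"   # any GPT-J-based model
quantized_dir = "pygmalion-6b-4bit-128g"  # where the 4-bit weights will go

tokenizer = AutoTokenizer.from_pretrained(pretrained)
# GPTQ calibrates its quantization error on a small set of sample texts;
# a real run would use more (and more representative) examples than this.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(pretrained, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_dir)
tokenizer.save_pretrained(quantized_dir)
```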
The code still needs testing before an implementation attempt is made. I haven't tested it yet, and I'm not 100% sure I've got the layer names right. In theory it should be fine. Please submit issues (or PRs!) if you find anything broken.
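If it helps with checking the layer names, here's a minimal sketch (mine, not from the repo) that builds GPT-J from its config and prints every `nn.Linear` module, so the names can be compared against the ones the GPTQ code expects:

```python
# Minimal sketch (not from the repo): print every nn.Linear in GPT-J so the
# names can be checked against the ones the GPTQ code quantizes.
import torch.nn as nn
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6B")
config.n_layer = 2  # shrink for a quick check; names repeat per layer
model = AutoModelForCausalLM.from_config(config)  # random init, no weights

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name)  # e.g. transformer.h.0.attn.q_proj, ...mlp.fc_in, lm_head
```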
Can confirm that the GPTQ implementation for the GPT-J 6B model (and any model fine-tuned off of it, such as Pygmalion 6B) seems to be working perfectly.
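For anyone who wants to try it, a minimal loading sketch. Again, this assumes AutoGPTQ-format weights and a placeholder directory, not the repo's own loader:

```python
# Hedged sketch: load and run a 4-bit GPTQ GPT-J / Pygmalion checkpoint.
# Assumes AutoGPTQ-format weights; the directory is a placeholder.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

quantized_dir = "pygmalion-6b-4bit-128g"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(quantized_dir)
model = AutoGPTQForCausalLM.from_quantized(quantized_dir, device="cuda:0")

inputs = tokenizer("Hello there!", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```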
This issue has been closed after 30 days of inactivity. If you believe it is still relevant, please leave a comment below.