infinity
AWQ-Bert / 4-bit Bert
Hoping to add an implementation of 4-bit BERT, potentially in https://github.com/casper-hansen/AutoAWQ/pull/328. Contributions welcome.
Hi @michaelfeil, any chance you will look more closely into quantizing BERT models with AWQ? Your PR was off to a great start, but it needs more experimentation to figure out how to scale a BERT model.
@casper-hansen I'm open to collaboration, but unfortunately there has been no further progress.