torchchat
[TORCHAO] Handle non-multiple group sizes, support padding as appropriate in torchao and kernels
@jerryzh168 Please add consistent padding support in torchao to make models quantizable
@digantdesai what's the best way to implement this - just round up and ignore part of the result?
I can't imagine it's worthwhile to write a kernel for partial groups. Presumably this needs to be done before allocation? How?
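One possible approach, sketched below as an assumption rather than the agreed design: zero-pad the `in_features` dimension of the weight up to the next multiple of the group size before quantization. Zeros contribute nothing to the dot product, so the padded columns leave the layer's output unchanged as long as the activation is padded with zeros to match (or the extra columns are ignored by the kernel). The helper name `pad_to_group_multiple` is hypothetical, not an existing torchao API.

```python
import torch
import torch.nn.functional as F

def pad_to_group_multiple(weight: torch.Tensor, groupsize: int) -> torch.Tensor:
    """Hypothetical helper: zero-pad the last (in_features) dim of a linear
    weight so it becomes a multiple of `groupsize`."""
    in_features = weight.shape[-1]
    remainder = in_features % groupsize
    if remainder == 0:
        return weight
    pad = groupsize - remainder
    # F.pad pads the last dimension with (left, right) amounts.
    return F.pad(weight, (0, pad))

# Example matching the failing layer below: in=288, groupsize=256.
w = torch.randn(288, 288)
padded = pad_to_group_multiple(w, 256)
print(padded.shape)  # torch.Size([288, 512])
```

Whether the padding should live in torchao's quantizer or in the kernels is exactly the open question here; this sketch only shows the tensor-level transformation.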
https://github.com/pytorch/torchchat/actions/runs/8857009260/job/24323964701?pr=519
******************************************
*** --quantize config/data/mobile.json ***
******************************************
INFO:datasets:PyTorch version 2.4.0.dev20240422 available.
Using device=cpu
Loading model...
Time to load model: 0.01 seconds
Quantizing the model with: {'embedding': {'bitwidth': 4, 'groupsize': 32}, 'linear:a8w4dq': {'groupsize': 256}}
Downloading builder script: 0%| | 0.00/5.67k [00:00<?, ?B/s]
Downloading builder script: 100%|██████████| 5.67k/5.67k [00:00<00:00, 4.87MB/s]
linear: layers.0.attention.wq, in=288, out=288
Traceback (most recent call last):
  File "/Users/runner/work/torchchat/torchchat/export.py", line 111, in <module>
    main(args)
  File "/Users/runner/work/torchchat/torchchat/export.py", line 61, in main
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/torchchat/torchchat/build/builder.py", line 406, in _initialize_model
    quantize_model(model, builder_args.device, quantize, tokenizer)
  File "/Users/runner/work/torchchat/torchchat/quantize.py", line 52, in quantize_model
    ).quantized_model()
      ^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/torchchat/torchchat/quantize.py", line 99, in quantized_model
    return self.quantizer.quantize(self.model_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torchao/quantization/GPTQ.py", line 1256, in quantize
    state_dict = self._create_quantized_state_dict(model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torchao/quantization/GPTQ.py", line 1193, in _create_quantized_state_dict
    in_features % self.groupsize == 0
AssertionError: require in_features:288 % self.groupsize:256 == 0
Error: Process completed with exit code 1.
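The assertion fires because 288 % 256 = 32, so the `wq` projection cannot be split into whole groups of 256. If "round up and ignore part of the result" is the chosen fix, the padded size is the next multiple of the group size; a minimal sketch of that arithmetic (the function name `padded_in_features` is illustrative, not from torchao):

```python
def padded_in_features(in_features: int, groupsize: int) -> int:
    # Round in_features up to the next multiple of groupsize.
    return ((in_features + groupsize - 1) // groupsize) * groupsize

print(padded_in_features(288, 256))  # 512
print(padded_in_features(256, 256))  # 256 (already a multiple, no padding)
```

For the failing layer this means quantizing a 512-wide padded weight (two groups of 256) and discarding the contribution of the last 224 padded columns, either by zeroing them or by masking in the kernel.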