torchchat
[TORCHAO] Handle non-multiple group sizes, support padding as appropriate in torchao and kernels
@jerryzh168 Please add consistent padding support in torchao to make models quantizable
@digantdesai what's the best way to implement this - just round up and ignore part of the result?
I can't imagine it's worthwhile to write a kernel for partial groups. Presumably this needs to be done before allocation? How?
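One possible approach, sketched below as an assumption rather than the agreed design: zero-pad the `in_features` dimension of the weight up to the next multiple of the group size before quantization. Zeros contribute nothing to the dot product, so the padded columns leave the layer's output unchanged as long as the activation is padded with zeros to match (or the extra columns are ignored by the kernel). The helper name `pad_to_group_multiple` is hypothetical, not an existing torchao API.

```python
import torch
import torch.nn.functional as F

def pad_to_group_multiple(weight: torch.Tensor, groupsize: int) -> torch.Tensor:
    """Hypothetical helper: zero-pad the last (in_features) dim of a linear
    weight so it becomes a multiple of `groupsize`."""
    in_features = weight.shape[-1]
    remainder = in_features % groupsize
    if remainder == 0:
        return weight
    pad = groupsize - remainder
    # F.pad pads the last dimension with (left, right) amounts.
    return F.pad(weight, (0, pad))

# Example matching the failing layer below: in=288, groupsize=256.
w = torch.randn(288, 288)
padded = pad_to_group_multiple(w, 256)
print(padded.shape)  # torch.Size([288, 512])
```

Whether the padding should live in torchao's quantizer or in the kernels is exactly the open question here; this sketch only shows the tensor-level transformation.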
https://github.com/pytorch/torchchat/actions/runs/8857009260/job/24323964701?pr=519
******************************************
*** --quantize config/data/mobile.json ***
******************************************
INFO:datasets:PyTorch version 2.4.0.dev20240422 available.
Using device=cpu
Loading model...
Time to load model: 0.01 seconds
Quantizing the model with: {'embedding': {'bitwidth': 4, 'groupsize': 32}, 'linear:a8w4dq': {'groupsize': 256}}
Downloading builder script: 0%| | 0.00/5.67k [00:00<?, ?B/s]
Downloading builder script: 100%|██████████| 5.67k/5.67k [00:00<00:00, 4.87MB/s]
linear: layers.0.attention.wq, in=288, out=288
Traceback (most recent call last):
  File "/Users/runner/work/torchchat/torchchat/export.py", line 111, in <module>
    main(args)
  File "/Users/runner/work/torchchat/torchchat/export.py", line 61, in main
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/torchchat/torchchat/build/builder.py", line 406, in _initialize_model
    quantize_model(model, builder_args.device, quantize, tokenizer)
  File "/Users/runner/work/torchchat/torchchat/quantize.py", line 52, in quantize_model
    ).quantized_model()
      ^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/torchchat/torchchat/quantize.py", line 99, in quantized_model
    return self.quantizer.quantize(self.model_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torchao/quantization/GPTQ.py", line 1256, in quantize
    state_dict = self._create_quantized_state_dict(model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torchao/quantization/GPTQ.py", line 1193, in _create_quantized_state_dict
    in_features % self.groupsize == 0
AssertionError: require in_features:288 % self.groupsize:256 == 0
Error: Process completed with exit code 1.
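The assertion fires because 288 % 256 = 32, so the `wq` projection cannot be split into whole groups of 256. If "round up and ignore part of the result" is the chosen fix, the padded size is the next multiple of the group size; a minimal sketch of that arithmetic (the function name `padded_in_features` is illustrative, not from torchao):

```python
def padded_in_features(in_features: int, groupsize: int) -> int:
    # Round in_features up to the next multiple of groupsize.
    return ((in_features + groupsize - 1) // groupsize) * groupsize

print(padded_in_features(288, 256))  # 512
print(padded_in_features(256, 256))  # 256 (already a multiple, no padding)
```

For the failing layer this means quantizing a 512-wide padded weight (two groups of 256) and discarding the contribution of the last 224 padded columns, either by zeroing them or by masking in the kernel.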