
linear:int4 issues - RuntimeError: Missing out variants: {'aten::_weight_int4pack_mm'}

Opened by mikekgfb

(py311) mikekg@mikekg-mbp torchchat %  python export.py --checkpoint-path ${MODEL_PATH} --temperature 0  --quantize '{"linear:int4": {"groupsize": 128}}' --output-pte mode.pte
[...]
Traceback (most recent call last):
  File "/Users/mikekg/qops/torchchat/export.py", line 111, in <module>
    main(args)
  File "/Users/mikekg/qops/torchchat/export.py", line 91, in main
    export_model_et(
  File "/Users/mikekg/qops/torchchat/export_et.py", line 98, in export_model
    export_program = edge_manager.to_executorch(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mikekg/miniconda3/envs/py311/lib/python3.11/site-packages/executorch/exir/program/_program.py", line 899, in to_executorch
    new_gm_res = p(new_gm)
                 ^^^^^^^^^
  File "/Users/mikekg/miniconda3/envs/py311/lib/python3.11/site-packages/torch/fx/passes/infra/pass_base.py", line 40, in __call__
    res = self.call(graph_module)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mikekg/miniconda3/envs/py311/lib/python3.11/site-packages/executorch/exir/passes/__init__.py", line 423, in call
    raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'aten::_weight_int4pack_mm'}

The current failure is expected -- somewhat, anyway -- after adding the packed call to _weight_int4pack_mm, but the behavior is documented incorrectly in docs/quantization.md. I think @lucylq most recently updated the specs to streamline them, but that glossed over the reality that we have a bit of a swiss-cheese situation: not every quantization scheme works in every execution mode. That's not pretty to show, but it is our current reality.
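For reference, a minimal sketch of why the to_executorch pass trips, assuming the op registration names in current PyTorch builds (illustrative only): the op exposes just its functional overload, so the pass that rewrites calls into out-variants has nothing to target.

import torch

# Inspect the registered overloads of the op the pass complains about.
# Assumption: at the time of this issue, only the functional ("default")
# overload exists; there is no ".out" overload for ExecuTorch's
# to-out-variant pass to rewrite the call into.
packet = torch.ops.aten._weight_int4pack_mm
print(packet.overloads())       # e.g. ['default'] -- no 'out' listed
print(packet.default._schema)   # functional schema, returns a new Tensor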

I'll try to patch up most execution modes, but we really do need tests. For performance, maybe the plan should be to hook _weight_int4pack_mm up to an asymmetric version of a8w4dq (as per https://github.com/pytorch/torchchat/issues/541). Of course that's also not quite "correct", but how many modes and operators can we support, and with how much documentation? FP operators already show a bit of a spread in accuracy from rounding effects, so maybe that's justifiable...
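In the meantime, an interim workaround along those lines would be to export with the a8w4dq scheme instead of linear:int4. A sketch, assuming the linear:a8w4dq config key from docs/quantization.md and a groupsize of 256 (both illustrative, not verified against the current docs):

python export.py --checkpoint-path ${MODEL_PATH} --quantize '{"linear:a8w4dq": {"groupsize": 256}}' --output-pte model.pte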

mikekgfb, Apr 29, 2024