
Numba package required for int-4 quantization

aneelaka-int opened this issue 1 year ago • 2 comments

I am trying to run the int4 quantization examples from examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only, but a package is missing from requirements.txt: the numba package needs to be added.

/usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
  return isinstance(object, types.FunctionType)
2024-10-14 22:22:46 [WARNING][auto_accelerator.py:422] Auto detect accelerator: HPU_Accelerator.
Loading checkpoint shards: 100%|████████████████████████████████████████| 4/4 [00:05<00:00,  1.32s/it]
generation_config.json: 100%|████████████████████████████████████████| 243/243 [00:00<00:00, 3.53MB/s]
Map:  20%|█████████▏                                    | 2000/10000 [00:01<00:06, 1214.43 examples/s]Token indices sequence length is longer than the specified maximum sequence length for this model (970412 > 131072). Running this sequence through the model will result in indexing errors
Map: 100%|██████████████████████████████████████████████| 10000/10000 [00:11<00:00, 858.87 examples/s]
2024-10-14 22:23:06 [INFO][utils.py:93] Successfully collect 128 calibration samples.
Traceback (most recent call last):
  File "/workspace/reproducers/neural-compressor/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/run_clm_no_trainer.py", line 345, in <module>
    from neural_compressor.torch.algorithms.weight_only.utility import move_input_to_device
  File "/usr/local/lib/python3.10/dist-packages/neural_compressor/torch/algorithms/weight_only/__init__.py", line 16, in <module>
    from .save_load import save, load
  File "/usr/local/lib/python3.10/dist-packages/neural_compressor/torch/algorithms/weight_only/save_load.py", line 36, in <module>
    from .modules import HPUWeightOnlyLinear, INCWeightOnlyLinear, MulLinear
  File "/usr/local/lib/python3.10/dist-packages/neural_compressor/torch/algorithms/weight_only/modules.py", line 23, in <module>
    import numba
ModuleNotFoundError: No module named 'numba'

I was trying this with the Qwen2 model, but it appears to affect all of the examples listed in the README as well.
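
As a workaround until requirements.txt is updated (assuming a standard pip environment), installing numba manually before running the example avoids the ModuleNotFoundError:

pip install numba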

aneelaka-int · Oct 14 '24 22:10