[workaround provided] tuning microsoft/Phi-4-multimodal-instruct exception
I encountered the following issue while trying to quantize Phi-4-MM using auto-round. I would really appreciate your help in solving the quantization problem for Phi-4-multimodal.
File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/auto_round/autoround.py", line 521, in quantize all_inputs = self.try_cache_inter_data_gpucpu(all_first_block_names, self.nsamples, layer_names=layer_names) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/auto_round/autoround.py", line 854, in try_cache_inter_data_gpucpu all_inputs = self.cache_inter_data( File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/auto_round/autoround.py", line 913, in cache_inter_data self.calib(nsamples, calib_bs) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/auto_round/autoround.py", line 809, in calib raise error File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/auto_round/autoround.py", line 800, in calib self.model(**data_new) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/home/xueshu/.cache/huggingface/modules/transformers_modules/Phi-4-multimodal-instruct/modeling_phi4mm.py", line 2101, in forward input_mode = InputMode(input_mode) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/enum.py", line 385, in call return cls.new(cls, value) File "/anaconda/envs/azureml_py38_PT_and_TF/lib/python3.10/enum.py", line 710, in new raise ve_exc ValueError: None is not a valid InputMode
Thank you for reporting this issue. Please note that it may take a bit longer to fix, as we currently have limited bandwidth available.
After https://github.com/intel/auto-round/pull/509 is merged, you could try the following workaround without algorithm tuning (iters=0); it produces a ~5.1 GB model:
~~~python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor
from auto_round import AutoRoundMLLM

model_name = "/models/Phi-4-multimodal-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

group_size = 32
autoround = AutoRoundMLLM(model, tokenizer=tokenizer, processor=processor, iters=0, group_size=group_size)
autoround.quantize_and_save("/data5/wenhuach/phi-int4", format="auto_gptq")
~~~
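For completeness, the exported folder can in principle be loaded back through transformers like any other checkpoint with custom code. A minimal sketch, assuming the output path from the snippet above (the flags are assumptions and this has not been verified end to end for this model):

~~~python
# Sketch only: loading the quantized export back with transformers.
# The path and flags are assumptions taken from the snippet above.
from transformers import AutoModelForCausalLM, AutoProcessor

quantized_path = "/data5/wenhuach/phi-int4"
model = AutoModelForCausalLM.from_pretrained(
    quantized_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(quantized_path, trust_remote_code=True)
~~~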
I tried quantizing the model with the workaround provided above, with no luck; I tried every permitted format, and none of them helped.
The error below is from a run with format='auto_round':
~~~
2025-05-04 18:34:49,862 INFO auto_quantizer.py L319: We suggest you to set torch_dtype=torch.float16 for better efficiency on CUDA
2025-05-04 18:34:49,866 INFO modeling_phi4mm.py L82: create image tower None
2025-05-04 18:34:49,889 INFO modeling_phi4mm.py L115: freeze_img_processor = False
2025-05-04 18:34:49,889 INFO modeling_phi4mm.py L136: learnable separator enabled for hd transform, hd_transform_order = sub_glb
2025-05-04 18:34:49,890 INFO modeling_phi4mm.py L482: create audio processor {'config': {'activation': 'swish', 'activation_checkpointing': {'interval': 1, 'module': 'transformer', 'offload': False}, 'attention_dim': 1024, 'attention_heads': 16, 'batch_norm': False, 'bias_in_glu': True, 'causal': True, 'chunk_size': -1, 'cnn_layer_norm': True, 'conv_activation': 'swish', 'conv_glu_type': 'swish', 'depthwise_multiplier': 1, 'depthwise_seperable_out_channel': 1024, 'dropout_rate': 0.0, 'encoder_embedding_config': {'input_size': 80}, 'ext_pw_kernel_size': 1, 'ext_pw_out_channel': 1024, 'input_layer': 'nemo_conv', 'input_size': 80, 'kernel_size': 3, 'left_chunk': 18, 'linear_units': 1536, 'nemo_conv_settings': {'conv_channels': 1024}, 'num_blocks': 24, 'relative_attention_bias_args': {'t5_bias_max_distance': 500, 'type': 't5'}, 'time_reduction': 8}, 'name': 'cascades'}
.cache\huggingface\modules\transformers_modules\quantized\speech_conformer_encoder.py:2774: FutureWarning: Please specify CheckpointImpl.NO_REENTRANT as CheckpointImpl.REENTRANT will soon be removed as the default and eventually deprecated.
  lambda i: encoder_checkpoint_wrapper(
2025-05-04 18:34:49,967 INFO modeling_phi4mm.py L505: freeze_audio_processor = False
2025-05-04 18:34:49,967 INFO modeling_phi4mm.py L512: gradient checkpointing enabled for audio processor
2025-05-04 18:34:50,069 INFO config.py L54: PyTorch version 2.6.0+cu124 available.
2025-05-04 18:34:51,165 WARNING convert_model.py L564: better backend is found, please install all the following requirements to enable it, pip install -v 'gptqmodel>=2.0' --no-build-isolation
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.24s/it]
--- AUDIO PROCESSING ---
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
['<|system|>You are an intelligent AI service.<|end|><|user|><|audio_1|>Provide a text transcription of the spoken content.<|end|><|assistant|>']
AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\utils\checkpoint.py:87: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
Traceback (most recent call last):
  File "AppData\Local\Programs\Python\Python312\Lib\runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "AppData\Local\Programs\Python\Python312\Lib\runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File ".vscode\extensions\ms-python.debugpy-2025.6.0-win32-x64\bundled\libs\debugpy\launcher/../..\debugpy\__main__.py", line 71, in <module>
~~~
I also had to change preprocessor_config.json after quantization, because otherwise it throws errors complaining that audio_compression_rate and a few other arguments are missing. The altered file is attached.
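Roughly, the alteration amounts to carrying the processor fields that the quantized export dropped back over from the original checkpoint. A sketch of doing that programmatically (the paths are placeholders for my local folders, and which keys end up missing may differ on your side):

~~~python
import json
from pathlib import Path

# Placeholder paths: original checkpoint and the quantized output folder.
src = Path("Phi-4-multimodal-instruct/preprocessor_config.json")
dst = Path("quantized/preprocessor_config.json")

original_cfg = json.loads(src.read_text())
quantized_cfg = json.loads(dst.read_text())

# Copy back any fields (e.g. audio_compression_rate) that the export dropped.
for key, value in original_cfg.items():
    quantized_cfg.setdefault(key, value)

dst.write_text(json.dumps(quantized_cfg, indent=2))
~~~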
Some help would be appreciated. TIA.
You probably need to modify modeling_phi4mm.py. The root cause is that for quantized models we typically replace the original layers with quantized versions, and the weights are usually renamed to qweight. This is a common practice in the quantization community, and we rely on their CUDA kernels, so we prefer to keep it as is for now. That said, we have run into similar issues with several models, for example DeepSeekV2 in the latest release, and Qwen2VL when quantizing non-text modules.
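To make the renaming concrete, here is a rough sketch (not the actual auto-round or GPTQ kernel code) of what the replacement looks like: the quantized module carries packed integer buffers named qweight, qzeros and scales instead of a float weight parameter, so any code in modeling_phi4mm.py that reaches for module.weight directly will no longer find it.

~~~python
import torch
import torch.nn as nn

class QuantLinearSketch(nn.Module):
    """Rough stand-in for a GPTQ-style 4-bit linear layer (illustration only)."""
    def __init__(self, in_features, out_features, bits=4, group_size=32):
        super().__init__()
        pack = 32 // bits  # int4 values packed into int32 words
        # Packed integer buffers replace the float `weight` parameter.
        self.register_buffer("qweight", torch.zeros(in_features // pack, out_features, dtype=torch.int32))
        self.register_buffer("qzeros", torch.zeros(in_features // group_size, out_features // pack, dtype=torch.int32))
        self.register_buffer("scales", torch.zeros(in_features // group_size, out_features, dtype=torch.float16))

original = nn.Linear(3072, 3072)
quantized = QuantLinearSketch(3072, 3072)
print(hasattr(original, "weight"))   # True
print(hasattr(quantized, "weight"))  # False -> model code that assumes .weight needs adapting
~~~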