tvm
[Bug] Inference - Phi-4 mini instruct
I previously raised this issue on MLC LLM, but the root cause appears to lie in PagedKVCache. The recently released Phi-4-mini-instruct introduces a partial_rotary_factor variable, which leads to a dimension mismatch in the fused RoPE kernel (see the assertion in the traceback under "Actual behavior"). Manually adjusting rope_ext_factors lets inference proceed, but the output is garbage, so I am reporting the issue here. Is there a way to resolve it?
Expected behavior
What you were expecting
Actual behavior
Traceback (most recent call last):
  File "/opt/anaconda3/envs/mlc/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/opt/anaconda3/envs/mlc/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/opt/anaconda3/envs/mlc/lib/python3.11/site-packages/tvm/_ffi/base.py", line 465, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: TVMError: Assert fail: T.Cast("int32", fused_rope_longrope_scaling_ext_factors_handle_shape[0]) == 64, Argument fused_rope_longrope_scaling.ext_factors_handle.shape[0] has an unsatisfied constraint: 64 == T.Cast("int32", fused_rope_longrope_scaling_ext_factors_handle_shape[0])
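One possible reading of the assertion, sketched below in plain Python, is that the compiled kernel expects `head_dim // 2` extension factors (64), while a model with partial rotary embeddings supplies only `rotary_dim // 2`. The values `head_dim = 128` and `partial_rotary_factor = 0.75` are assumptions about the Phi-4-mini-instruct config and should be checked against the model's `config.json`; the helper `rope_factor_lengths` is hypothetical and is not part of TVM or MLC LLM.

```python
def rope_factor_lengths(head_dim: int, partial_rotary_factor: float = 1.0):
    """Return (full_length, partial_length) for the RoPE extension factors.

    full_length    : entries needed when every head dimension is rotated
    partial_length : entries needed when only a fraction of it is rotated
    """
    # Number of head dimensions that actually receive the rotary embedding.
    rotary_dim = int(head_dim * partial_rotary_factor)
    # RoPE rotates dimensions in pairs, so one factor covers two dimensions.
    return head_dim // 2, rotary_dim // 2


# Assumed values, not taken from the traceback: verify against config.json.
head_dim = 128
partial_rotary_factor = 0.75

full_length, partial_length = rope_factor_lengths(head_dim, partial_rotary_factor)
print(f"kernel expects {full_length} factors, model would supply {partial_length}")
```

Under these assumptions the two lengths differ, which would trigger the shape check. It would also be consistent with the garbage output seen after manually adjusting rope_ext_factors: resizing the array satisfies the check, but the kernel would then rotate all head dimensions rather than only the first `rotary_dim` of them.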
Environment
Any environment details, such as: Operating System, TVM version, etc
Steps to reproduce
Preferably a minimal script to cause the issue to occur.
Triage
Please refer to the list of label tags here to find the relevant tags and add them below in a bullet format (example below).
- needs-triage