mlc-llm
[Bug] Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC can't run
🐛 Bug
FileNotFoundError: Cannot find the model library that corresponds to `None` when running Mixtral.
To Reproduce
I followed the usage example from https://github.com/mlc-ai/mlc-llm/pull/1529#issue-2063018397:
from mlc_chat import ChatConfig, ChatModule, callback
from mlc_chat.support import logging

logging.enable_logging()

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"
NUM_GPU = 1


def main():
    cm = ChatModule(MODEL, device="cuda:1", chat_config=ChatConfig(
        sliding_window_size=1024,
        tensor_parallel_shards=NUM_GPU,
    ))
    cm.generate("Who is Garry Kasparow?", progress_callback=callback.StreamToStdout(callback_interval=2))


if __name__ == "__main__":
    main()
Expected behavior
Mixtral can be loaded and run for inference.
Environment
- Platform: CUDA
- Operating system: Ubuntu
- Device: H100
- Python version: 3.12
Error trace:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/__main__.py", line 47, in <module>
main()
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/__main__.py", line 24, in main
cli.main(sys.argv[2:])
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/cli/compile.py", line 131, in main
compile(
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 230, in compile
_compile(args, model_config)
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 177, in _compile
args.build_func(
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/support/auto_target.py", line 235, in build
relax.build(
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/relax/vm_build.py", line 335, in build
mod = pipeline(mod)
^^^^^^^^^^^^^
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
return _ffi_transform_api.RunPass(self, mod)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/compiler_pass/pipeline.py", line 157, in _pipeline
mod = seq(mod)
^^^^^^^^
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
return _ffi_transform_api.RunPass(self, mod)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
11: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::__mk_TVM10::{lambda(tvm::transform::Pass, tvm::IRModule)#1}>(tvm::transform::__mk_TVM10::{lambda(tvm::transform::Pass, tvm::IRModule)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
10: tvm::transform::Pass::operator()(tvm::IRModule) const
9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
6: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::IRModule, tvm::transform::PassContext)>::AssignTypedLambda<tvm::relax::transform::StaticPlanBlockMemory()::{lambda(tvm::IRModule, tvm::transform::PassContext)#1}>(tvm::relax::transform::StaticPlanBlockMemory()::{lambda(tvm::IRModule, tvm::transform::PassContext)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
4: tvm::relax::StaticPlanBlockMemory(tvm::IRModule)
3: tvm::relax::StorageAllocatorInit::Initialize(tvm::IRModule const&, tvm::arith::Analyzer*)
2: tvm::relax::StorageAllocatorInit::VisitExpr_(tvm::relax::FunctionNode const*)
1: tvm::relax::SetTIRVarUpperBound(tvm::relax::Function, tvm::arith::Analyzer*)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/relax/transform/static_plan_block_memory.cc", line 360
TVMError: Check failed: value->value > 0 (-1 vs. 0) : The entry value of attr `tir_var_upper_bound` should be a positive integer, while -1 is got.
Traceback (most recent call last):
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 756, in __init__
self.model_lib_path = _get_lib_module_path(
^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 578, in _get_lib_module_path
raise FileNotFoundError(err_msg)
FileNotFoundError: Cannot find the model library that corresponds to `None`.
`None` is either provided in the `chat_config` you passed in, or specified in .cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/mlc-chat-config.json.
We searched over the following possible paths:
- None-cuda.so
- dist/prebuilt/lib/None-cuda.so
- dist/HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/None-cuda.so
- .cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/None-cuda.so
- .cache/mlc_chat/model_weights/junrushao/None-cuda.so
If you would like to directly specify the model library path, you may consider passing in the `ChatModule.model_lib_path` parameter.
Please checkout https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb for an example on how to load a model.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_mlc.py", line 16, in <module>
main()
File "test_mlc.py", line 9, in main
cm = ChatModule(MODEL, device="cuda:1", chat_config=ChatConfig(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 771, in __init__
jit.jit(
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/jit.py", line 122, in jit
_run_jit(
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/jit.py", line 95, in _run_jit
subprocess.run(cmd, check=True)
File "miniconda3/envs/mlc-chat-venv/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['miniconda3/envs/mlc-chat-venv/bin/python', '-m', 'mlc_chat', 'compile', '.cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC', '--opt', 'flashinfer=1;cublas_gemm=1;faster_transformer=1;cudagraph=0', '--overrides', 'sliding_window_size=1024;prefill_chunk_size=4096;attention_sink_size=4;max_batch_size=80;tensor_parallel_shards=1', '--device', 'cuda:1', '--output', '/tmp/tmpu_l85k1j/lib.so']' returned non-zero exit status 1.
I know you could compile a 'None-cuda.so' file and place it at one of the paths it searches, but that's not a real solution.
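For reference, the error message above suggests passing the library path explicitly via the `ChatModule.model_lib_path` parameter once a library has been compiled. A minimal sketch, assuming a pre-built library already exists on disk (the .so path below is illustrative, not a file that ships with the package):

from mlc_chat import ChatConfig, ChatModule

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"

# Point ChatModule at an explicitly compiled model library instead of
# letting it search the default paths listed in the error message.
# The .so path is illustrative and must match a library actually
# produced by a successful `mlc_chat compile` run.
cm = ChatModule(
    MODEL,
    device="cuda:1",
    chat_config=ChatConfig(tensor_parallel_shards=1),
    model_lib_path="dist/prebuilt/lib/Mixtral-8x7B-Instruct-v0.1-q4f16_1-cuda.so",
)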
@DanielProkhorov I'm getting the same error on macOS when attempting to compile the model. Specifically, it fails with: `The entry value of attr tir_var_upper_bound should be a positive integer, while -1 is got.` Were you able to resolve this? Thanks
I fixed this by compiling with `context_window_size` set to 4096 instead of -1, but now I'm getting gibberish outputs.
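For reference, a minimal sketch of that workaround. It assumes `ChatConfig` accepts a `context_window_size` override that gets forwarded to the JIT compile step (mirroring how `sliding_window_size` appears in the `--overrides` string of the compile command above); the value 4096 is taken from the comment:

from mlc_chat import ChatConfig, ChatModule

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"

# Set an explicit context window so the compiler does not see -1, which
# trips the `tir_var_upper_bound` > 0 check in StaticPlanBlockMemory.
# `context_window_size` here is an assumed override field, passed the
# same way as `sliding_window_size` in the original repro script.
cm = ChatModule(
    MODEL,
    device="cuda:0",
    chat_config=ChatConfig(
        context_window_size=4096,
        tensor_parallel_shards=1,
    ),
)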
Were you able to resolve this?
@brian-pieces Unfortunately not...
We have recently updated the process to focus on JIT compilation, so closing this for now.