
[Bug] Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC can't run

Open DanielProkhorov opened this issue 11 months ago • 4 comments

🐛 Bug

FileNotFoundError: Cannot find the model library that corresponds to None when running mixtral

To Reproduce

I followed this example for usage: https://github.com/mlc-ai/mlc-llm/pull/1529#issue-2063018397

from mlc_chat import ChatConfig, ChatModule, callback
from mlc_chat.support import logging
logging.enable_logging()

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"
NUM_GPU = 1

def main():
    # JIT-compile the model library on first use and load the quantized weights.
    cm = ChatModule(MODEL, device="cuda:1", chat_config=ChatConfig(
        sliding_window_size=1024,
        tensor_parallel_shards=NUM_GPU,
    ))
    # Stream the generated tokens to stdout as they are produced.
    cm.generate("Who is Garry Kasparow?", progress_callback=callback.StreamToStdout(callback_interval=2))

if __name__ == "__main__":
    main()

Expected behavior

Mixtral can be loaded and used for inference.

Environment

  • Platform: CUDA
  • Operating system: Ubuntu
  • Device: H100
  • Python version: 3.12

Error trace:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/__main__.py", line 47, in <module>
    main()
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/__main__.py", line 24, in main
    cli.main(sys.argv[2:])
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/cli/compile.py", line 131, in main
    compile(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 230, in compile
    _compile(args, model_config)
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 177, in _compile
    args.build_func(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/support/auto_target.py", line 235, in build
    relax.build(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/relax/vm_build.py", line 335, in build
    mod = pipeline(mod)
          ^^^^^^^^^^^^^
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/compiler_pass/pipeline.py", line 157, in _pipeline
    mod = seq(mod)
          ^^^^^^^^
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
  11: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::__mk_TVM10::{lambda(tvm::transform::Pass, tvm::IRModule)#1}>(tvm::transform::__mk_TVM10::{lambda(tvm::transform::Pass, tvm::IRModule)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  6: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::IRModule, tvm::transform::PassContext)>::AssignTypedLambda<tvm::relax::transform::StaticPlanBlockMemory()::{lambda(tvm::IRModule, tvm::transform::PassContext)#1}>(tvm::relax::transform::StaticPlanBlockMemory()::{lambda(tvm::IRModule, tvm::transform::PassContext)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::relax::StaticPlanBlockMemory(tvm::IRModule)
  3: tvm::relax::StorageAllocatorInit::Initialize(tvm::IRModule const&, tvm::arith::Analyzer*)
  2: tvm::relax::StorageAllocatorInit::VisitExpr_(tvm::relax::FunctionNode const*)
  1: tvm::relax::SetTIRVarUpperBound(tvm::relax::Function, tvm::arith::Analyzer*)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/relax/transform/static_plan_block_memory.cc", line 360
TVMError: Check failed: value->value > 0 (-1 vs. 0) : The entry value of attr `tir_var_upper_bound` should be a positive integer, while -1 is got.
Traceback (most recent call last):
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 756, in __init__
    self.model_lib_path = _get_lib_module_path(
                          ^^^^^^^^^^^^^^^^^^^^^
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 578, in _get_lib_module_path
    raise FileNotFoundError(err_msg)
FileNotFoundError: Cannot find the model library that corresponds to `None`.
`None` is either provided in the `chat_config` you passed in, or specified in .cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/mlc-chat-config.json.
We searched over the following possible paths: 
- None-cuda.so
- dist/prebuilt/lib/None-cuda.so
- dist/HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/None-cuda.so
- .cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/None-cuda.so
- .cache/mlc_chat/model_weights/junrushao/None-cuda.so
If you would like to directly specify the model library path, you may consider passing in the `ChatModule.model_lib_path` parameter.
Please checkout https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb for an example on how to load a model.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_mlc.py", line 16, in <module>
    main()
  File "test_mlc.py", line 9, in main
    cm = ChatModule(MODEL, device="cuda:1", chat_config=ChatConfig(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 771, in __init__
    jit.jit(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/jit.py", line 122, in jit
    _run_jit(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/jit.py", line 95, in _run_jit
    subprocess.run(cmd, check=True)
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['miniconda3/envs/mlc-chat-venv/bin/python', '-m', 'mlc_chat', 'compile', '.cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC', '--opt', 'flashinfer=1;cublas_gemm=1;faster_transformer=1;cudagraph=0', '--overrides', 'sliding_window_size=1024;prefill_chunk_size=4096;attention_sink_size=4;max_batch_size=80;tensor_parallel_shards=1', '--device', 'cuda:1', '--output', '/tmp/tmpu_l85k1j/lib.so']' returned non-zero exit status 1.

DanielProkhorov avatar Feb 27 '24 08:02 DanielProkhorov

I know you could compile a `None-cuda.so` file into the path the loader searches, but that's a workaround rather than a proper fix (see the sketch after this comment).

gxmlfx avatar Mar 13 '24 05:03 gxmlfx
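For reference, a minimal sketch of the related workaround that the error message itself suggests: compile the model library manually, then pass `model_lib_path` explicitly so the loader never has to resolve a `None` filename. The compile invocation mirrors the JIT command in the traceback above; the output filename and location below are placeholders, not values from this thread.

from mlc_chat import ChatConfig, ChatModule

# Assumes the library was built beforehand, e.g. with:
#   python -m mlc_chat compile <weights-dir> --device cuda --output mixtral-cuda.so
cm = ChatModule(
    "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC",
    device="cuda:1",
    chat_config=ChatConfig(
        sliding_window_size=1024,
        tensor_parallel_shards=1,
    ),
    model_lib_path="mixtral-cuda.so",  # placeholder; point at your compiled .so
)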

@DanielProkhorov I'm getting the same error on macOS when attempting to compile the model. Specifically, it fails with: `The entry value of attr tir_var_upper_bound should be a positive integer, while -1 is got.` Were you able to resolve this? Thanks

brian-pieces avatar Mar 15 '24 14:03 brian-pieces

> @DanielProkhorov I'm getting the same error on macOS when attempting to compile the model. Specifically, it fails with: `The entry value of attr tir_var_upper_bound should be a positive integer, while -1 is got.` Were you able to resolve this? Thanks

I fixed this by compiling with `context_window_size=4096` instead of -1 (see the sketch below), but now I'm getting gibberish outputs.

brian-pieces avatar Mar 15 '24 16:03 brian-pieces
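A minimal sketch of that override, assuming `context_window_size` can be set through `ChatConfig` the same way `sliding_window_size` is in the repro script; the option name comes from the comment above, and 4096 is the value reported there.

from mlc_chat import ChatConfig, ChatModule

# Pin the context window to a positive value so the JIT compile step
# doesn't hit the `tir_var_upper_bound ... -1` check failure.
cm = ChatModule(
    "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC",
    device="cuda:1",
    chat_config=ChatConfig(
        context_window_size=4096,  # instead of the default -1
        tensor_parallel_shards=1,
    ),
)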

> Were you able to resolve this?

@brian-pieces Unfortunately not...

DanielProkhorov avatar Mar 18 '24 21:03 DanielProkhorov

We have recently updated the process to focus on JIT compilation, so closing for now.

tqchen avatar May 28 '24 02:05 tqchen