How to build mlc-llm-cli on Linux
I want to run vicuna-7b on an NVIDIA GPU with mlc-llm. I followed the instructions, with a few changes:
- Install Relax.
git clone https://github.com/mlc-ai/relax.git --recursive
cd relax
mkdir build
cp cmake/config.cmake build
In build/config.cmake, set USE_CUDA, USE_CUDNN, USE_CUBLAS and USE_LLVM to ON, then:
cd build
cmake ..
make -j
export TVM_HOME=/path/to/relax
export PYTHONPATH=$PYTHONPATH:$TVM_HOME/python
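As a sanity check (a minimal sketch; it assumes the exports above are in effect and a CUDA GPU is present), verify the Relax build is importable and CUDA-enabled:
python3 -c "import tvm; print(tvm.__file__)"        # should point into /path/to/relax/python
python3 -c "import tvm; print(tvm.cuda(0).exist)"   # should print True if the CUDA runtime works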
- Get the model weights. I just used the following:
git lfs install
git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
mkdir -p dist/models
ln -s path/to/vicuna-v1-7b dist/models/vicuna-v1-7b
But config.json and some other necessary files are missing from the vicuna-v1-7b path! Those weights have already been transformed by transform_params; we need vicuna-v1-7b in PyTorch format.
$ tree dist/
dist/
└── models
└── vicuna-v1-7b
├── float16
│ ├── ndarray-cache.json
│ ├── tokenizer.model
│ ├── params_shard_0.bin
│ ├── params_shard_100.bin
│ ├── params_shard_101.bin
│ ├── params_shard_102.bin
│ ├── params_shard_103.bin
│ ├── params_shard_104.bin
│ ├── params_shard_105.bin
...
I found a vicuna-v1-7b in Hugging Face format whose dtype is float16 and whose vocab_size is 32001, so I downloaded that one.
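To confirm it is the right checkpoint (a small check, assuming a standard Hugging Face config.json sits next to the weights; adjust the path to your checkout):
python3 -c "import json; print(json.load(open('dist/models/vicuna-v1-7b/config.json'))['vocab_size'])"   # expect 32001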
- Build the model into a library.
git clone https://github.com/mlc-ai/mlc-llm.git --recursive
cd mlc-llm
# change vocab_size=32001 in llama.py
python3 build.py --model vicuna-v1-7b --dtype float16 --target cuda --max-seq-len 768 --artifact-path ../dist/
The output:
...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:10<00:00, 1.39it/s]
Total param size: 3.9229860305786133 GB
Start storing to cache ../dist/vicuna-v1-7b/float16/params
[0745/0745] saving param_744
All finished, 132 total shards committed, record saved to ../dist/vicuna-v1-7b/float16/params/ndarray-cache.json
Save a cached module to ../dist/vicuna-v1-7b/float16/mod_cache_before_build_float16.pkl.
20 static functions: [I.GlobalVar("rotary_embedding1"), I.GlobalVar("fused_decode1_fused_matmul5_multiply"), I.GlobalVar("decode4"), I.GlobalVar("slice1"), I.GlobalVar("squeeze"), I.GlobalVar("fused_decode_matmul3"), I.GlobalVar("fused_decode_fused_matmul3_add"), I.GlobalVar("fused_decode2_fused_matmul6_add"), I.GlobalVar("decode6"), I.GlobalVar("fused_transpose4_reshape4"), I.GlobalVar("transpose2"), I.GlobalVar("rms_norm1"), I.GlobalVar("fused_decode1_fused_matmul5_silu"), I.GlobalVar("reshape1"), I.GlobalVar("decode5"), I.GlobalVar("reshape"), I.GlobalVar("fused_reshape2_squeeze"), I.GlobalVar("reshape2"), I.GlobalVar("take_decode"), I.GlobalVar("fused_decode3_fused_matmul7_cast2")]
26 dynamic functions: [I.GlobalVar("fused_NT_matmul1_add1"), I.GlobalVar("extend_te"), I.GlobalVar("full"), I.GlobalVar("reshape3"), I.GlobalVar("reshape5"), I.GlobalVar("rotary_embedding"), I.GlobalVar("take_decode1"), I.GlobalVar("fused_NT_matmul_divide_maximum_minimum_cast"), I.GlobalVar("NT_matmul1"), I.GlobalVar("fused_softmax1_cast4"), I.GlobalVar("fused_NT_matmul3_silu1"), I.GlobalVar("fused_NT_matmul2_divide1_maximum1_minimum1_cast3"), I.GlobalVar("matmul8"), I.GlobalVar("transpose5"), I.GlobalVar("fused_softmax_cast1"), I.GlobalVar("fused_min_max_triu_te_broadcast_to"), I.GlobalVar("reshape7"), I.GlobalVar("reshape8"), I.GlobalVar("slice"), I.GlobalVar("fused_NT_matmul3_multiply1"), I.GlobalVar("squeeze1"), I.GlobalVar("matmul4"), I.GlobalVar("transpose3"), I.GlobalVar("rms_norm"), I.GlobalVar("fused_NT_matmul4_add1"), I.GlobalVar("reshape6")]
Dump static shape TIR to ../dist/vicuna-v1-7b/float16/mod_tir_static.py
Dump dynamic shape TIR to ../dist/vicuna-v1-7b/float16/mod_tir_dynamic.py
- Dispatch to pre-scheduled op: fused_decode1_fused_matmul5_multiply
- Dispatch to pre-scheduled op: decode4
- Dispatch to pre-scheduled op: fused_NT_matmul1_add1
- Dispatch to pre-scheduled op: decode5
- Dispatch to pre-scheduled op: NT_matmul1
- Dispatch to pre-scheduled op: fused_NT_matmul_divide_maximum_minimum_cast
- Dispatch to pre-scheduled op: fused_softmax1_cast4
- Dispatch to pre-scheduled op: fused_NT_matmul3_silu1
- Dispatch to pre-scheduled op: fused_NT_matmul2_divide1_maximum1_minimum1_cast3
- Dispatch to pre-scheduled op: fused_decode_matmul3
- Dispatch to pre-scheduled op: matmul8
- Dispatch to pre-scheduled op: fused_softmax_cast1
- Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
- Dispatch to pre-scheduled op: fused_decode_fused_matmul3_add
- Dispatch to pre-scheduled op: fused_decode2_fused_matmul6_add
- Dispatch to pre-scheduled op: decode6
- Dispatch to pre-scheduled op: fused_decode1_fused_matmul5_silu
- Dispatch to pre-scheduled op: fused_NT_matmul3_multiply1
- Dispatch to pre-scheduled op: matmul4
- Dispatch to pre-scheduled op: rms_norm
- Dispatch to pre-scheduled op: fused_NT_matmul4_add1
Finish exporting to ../dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
- Prepare lib and params. There are instructions for iOS, as follows. After Step 3, params and vicuna-v1-7b_cuda_float16.so are under ../dist/vicuna-v1-7b/float16/.
cd ios
./prepare_libs.sh
./prepare_params.sh
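A quick way to confirm the artifacts are where the CLI will look (paths as produced by the build step above):
ls ../dist/vicuna-v1-7b/float16/
# expect params/, vicuna-v1-7b_cuda_float16.so, mod_tir_static.py, mod_tir_dynamic.py, mod_cache_before_build_float16.pkl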
- Build mlc-llm-cli. Is there any instruction on how to build mlc-llm-cli? I used the following but got an error. @MasterJH5574 Could you please give some advice? Thanks a lot ^_^
mkdir build && cd build
cmake .. && make
[ 0%] Building CXX object CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o
[ 0%] Built target mlc_llm_objs
[ 0%] Building CXX object CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o
[ 0%] Built target mlc_cli_objs
[ 0%] Generating release/libtokenizers_cpp.a
No such file or directory
make[2]: *** [tokenizers/CMakeFiles/tokenizers.dir/build.make:71: tokenizers/release/libtokenizers_cpp.a] Error 1
make[1]: *** [CMakeFiles/Makefile2:674: tokenizers/CMakeFiles/tokenizers.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
Run make -n to get debug info:
make -n
...
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_echo_color --switch= --green --progress-dir=/z/Dev/mlc-llm/build/CMakeFiles --progress-num=100 "Building CXX object tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o"
cd /z/Dev/mlc-llm/build/tvm && /usr/bin/c++ -DDMLC_USE_FOPEN64=0 -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -DNDEBUG -DNDEBUG=1 -DTVM_INDEX_DEFAULT_I64=1 -DTVM_THREADPOOL_USE_OPENMP=0 -DTVM_USE_LIBBACKTRACE=0 -DUSE_FALLBACK_STL_MAP=0 -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/libcrc/include -isystem /z/Dev/relax/3rdparty/dlpack/include -isystem /z/Dev/relax/3rdparty/dmlc-core/include -isystem /z/Dev/relax/3rdparty/rang/include -isystem /z/Dev/relax/3rdparty/compiler-rt -isystem /z/Dev/relax/3rdparty/picojson -isystem /usr/local/cuda/include -std=c++17 -faligned-new -O2 -Wall -fPIC -std=c++17 -O2 -g -DNDEBUG -fPIC -ffile-prefix-map=..=/z/Dev/relax -MD -MT tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o -MF CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o.d -o CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o -c /z/Dev/relax/src/runtime/contrib/sort/sort.cc
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_echo_color --switch= --progress-dir=/z/Dev/mlc-llm/build/CMakeFiles --progress-num=95,96,97,98,99,100 "Built target tvm_runtime_objs"
make -s -f tvm/CMakeFiles/tvm_runtime.dir/build.make tvm/CMakeFiles/tvm_runtime.dir/depend
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/relax /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/tvm /z/Dev/mlc-llm/build/tvm/CMakeFiles/tvm_runtime.dir/DependInfo.cmake --color=
make -s -f tvm/CMakeFiles/tvm_runtime.dir/build.make tvm/CMakeFiles/tvm_runtime.dir/build
make[2]: *** No rule to make target 'tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/builtin_fp16.cc.o', needed by 'tvm/libtvm_runtime.so'. Stop.
make[1]: *** [CMakeFiles/Makefile2:438: tvm/CMakeFiles/tvm_runtime.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
mlc-llm commit id: 909f267
relax commit id: 227cacd
There are a couple of things here:
- Their model uses a Vulkan target, so you can’t use their HF weights.
- I believe the CLI app targets Vulkan as well, so you have to test with the chat Python file in the tests folder (see the sketch below).
- But there is no TVM profiling info for the CUDA path, so your built model is going to be much slower than their Vulkan release.
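For illustration, invoking that script might look like the line below; the flags here are hypothetical, so check the script’s own argument parser for the real ones:
python3 tests/chat.py --artifact-path dist --model vicuna-v1-7b --device-name cuda   # hypothetical flags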
Hi @zhaoyang-star, would you mind checking whether you have cloned the submodules (if 3rdparty/tokenizers is empty, then submodules were not cloned properly)? If not, please update submodules via:
git submodule update --init --recursive
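Independent of the update command, git submodule status is a quick way to spot the problem (a leading '-' on a line means that submodule is not checked out):
git submodule status --recursive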
I am sure I got all submodules before building. No files changed after running git submodule update --init --recursive in the mlc-llm project. @yzh119
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# ls
argparse sentencepiece-js tokenizers-cpp
BTW, tokenizers-cpp is not a git submodule.
I used cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON && make to get debug info. The output:
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# make
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -P /z/Dev/mlc-llm/build/CMakeFiles/VerifyGlobs.cmake
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -S/z/Dev/mlc-llm -B/z/Dev/mlc-llm/build --check-build-system CMakeFiles/Makefile.cmake 0
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_progress_start /z/Dev/mlc-llm/build/CMakeFiles /z/Dev/mlc-llm/build//CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/z/Dev/mlc-llm/build'
make -f CMakeFiles/mlc_llm_objs.dir/build.make CMakeFiles/mlc_llm_objs.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/CMakeFiles/mlc_llm_objs.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make -f CMakeFiles/mlc_llm_objs.dir/build.make CMakeFiles/mlc_llm_objs.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[ 0%] Building CXX object CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o
/usr/bin/c++ -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -DMLC_LLM_EXPORTS -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/dlpack/include -I/z/Dev/relax/3rdparty/dmlc-core/include -I/z/Dev/mlc-llm/3rdparty/sentencepiece-js/sentencepiece/src -I/z/Dev/mlc-llm/3rdparty/tokenizers-cpp -std=c++17 -O2 -g -DNDEBUG -fPIC -MD -MT CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o -MF CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o.d -o CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o -c /z/Dev/mlc-llm/cpp/llm_chat.cc
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
[ 0%] Built target mlc_llm_objs
make -f CMakeFiles/mlc_cli_objs.dir/build.make CMakeFiles/mlc_cli_objs.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/CMakeFiles/mlc_cli_objs.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make -f CMakeFiles/mlc_cli_objs.dir/build.make CMakeFiles/mlc_cli_objs.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[ 0%] Building CXX object CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o
/usr/bin/c++ -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/dlpack/include -I/z/Dev/relax/3rdparty/dmlc-core/include -I/z/Dev/mlc-llm/3rdparty/argparse/include -std=c++17 -O2 -g -DNDEBUG -fPIC -MD -MT CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o -MF CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o.d -o CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o -c /z/Dev/mlc-llm/cpp/cli_main.cc
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
[ 0%] Built target mlc_cli_objs
make -f tokenizers/CMakeFiles/tokenizers.dir/build.make tokenizers/CMakeFiles/tokenizers.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/tokenizers /z/Dev/mlc-llm/build/tokenizers/CMakeFiles/tokenizers.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make -f tokenizers/CMakeFiles/tokenizers.dir/build.make tokenizers/CMakeFiles/tokenizers.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[ 0%] Generating release/libtokenizers_cpp.a
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E env CARGO_TARGET_DIR=/z/Dev/mlc-llm/build/tokenizers RUSTFLAGS="" cargo build --release
No such file or directory
make[2]: *** [tokenizers/CMakeFiles/tokenizers.dir/build.make:74: tokenizers/release/libtokenizers_cpp.a] Error 1
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make[1]: *** [CMakeFiles/Makefile2:677: tokenizers/CMakeFiles/tokenizers.dir/all] Error 2
make[1]: Leaving directory '/z/Dev/mlc-llm/build'
make: *** [Makefile:159: all] Error 2
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# ls
argparse sentencepiece-js tokenizers-cpp
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# tree tokenizers-cpp/
tokenizers-cpp/
|-- CMakeLists.txt
|-- Cargo.toml
|-- src
| `-- lib.rs
`-- tokenizers.h
1 directory, 4 files
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty/tokenizers-cpp# ls /z/Dev/mlc-llm/build/tokenizers
CMakeFiles Makefile cmake_install.cmake
It seems the error happened when compiling tokenizers-cpp. I have verified that the two paths /z/Dev/mlc-llm/3rdparty/tokenizers-cpp and /z/Dev/mlc-llm/build/tokenizers are accessible.
The compilation of tokenizers-cpp depends on Rust; can you confirm you have installed Rust?
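For reference, one common way to set up the toolchain (a sketch using the standard rustup installer; a distro package also works, as long as cargo ends up on PATH):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"   # puts cargo and rustc on PATH for the current shell
cargo --version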
Yes, compilation of tokenizers-cpp depends on Rust. Rust is not installed on my device; I will install Rust first. Thanks for your kind help.
Rust is installed now.
# which rustc
/usr/bin/rustc
# rustc --version
rustc 1.65.0
# cd build; cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON; make
The same error occurred.
I tried to compile tokenizers-cpp alone. Again, the error happened when generating release/libtokenizers_cpp.a. How do you compile tokenizers-cpp? @yzh119
# cd 3rdparty/tokenizers-cpp/
# mkdir build; cd build
# cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON
# make
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -S/z/Dev/mlc-llm/3rdparty/tokenizers-cpp -B/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build --check-build-system CMakeFiles/Makefile.cmake 0
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_progress_start /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build/CMakeFiles /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build//CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make -f CMakeFiles/tokenizers.dir/build.make CMakeFiles/tokenizers.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build/CMakeFiles/tokenizers.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make -f CMakeFiles/tokenizers.dir/build.make CMakeFiles/tokenizers.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
[100%] Generating release/libtokenizers_cpp.a
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E env CARGO_TARGET_DIR=/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build RUSTFLAGS="" cargo build --release
No such file or directory
make[2]: *** [CMakeFiles/tokenizers.dir/build.make:74: release/libtokenizers_cpp.a] Error 1
make[2]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make[1]: *** [CMakeFiles/Makefile2:86: CMakeFiles/tokenizers.dir/all] Error 2
make[1]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make: *** [Makefile:94: all] Error 2
@zhaoyang-star We rely on cargo, Rust's package manager and build tool, not only rustc.
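Since the Makefile runs cargo build --release (as the verbose log above shows), a quick check is:
which cargo || echo "cargo not on PATH"
cargo --version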
I finally built mlc_chat_cli and libmlc_llm.so after installing the Rust dev environment.
When I ran mlc_chat_cli, I got an error that libtvm_runtime.so was not compiled with the CUDA runtime. I am sure USE_CUDA, USE_CUDNN, USE_CUBLAS and USE_LLVM were ON when compiling TVM, and TVM_HOME points to /z/Dev/relax/. The unit test test_cudnn.py also passed.
@yzh119 Could you please have a look at it? Is there something I am still missing? Thanks a lot.
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ls
CMakeCache.txt CPackConfig.cmake Makefile cmake_install.cmake libmlc_llm.so sentencepiece tvm
CMakeFiles CPackSourceConfig.cmake TVMBuildOptions.txt libmlc_llm.a mlc_chat_cli tokenizers
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
[00:00:37] /z/Dev/relax/src/runtime/library_module.cc:126: Binary was created using {cuda} but a loader of that name is not registered. Available loaders are VMExecutable, relax.Executable, metadata, const_loader, metadata_module. Perhaps you need to recompile with this runtime enabled.
Stack trace:
[bt] (0) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x2c) [0x7fa7cbce2efc]
[bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x45) [0x55d806d97d61]
[bt] (2) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::LoadModuleFromBinary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, dmlc::Stream*)+0x3d3) [0x7fa7cbce0013]
[bt] (3) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::ProcessModuleBlob(char const*, tvm::runtime::ObjectPtr<tvm::runtime::Library>, std::function<tvm::runtime::PackedFunc (int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)>, tvm::runtime::Module*, tvm::runtime::ModuleNode**)+0x590) [0x7fa7cbce06a0]
[bt] (4) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::CreateModuleFromLibrary(tvm::runtime::ObjectPtr<tvm::runtime::Library>, std::function<tvm::runtime::PackedFunc (int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)>)+0x221) [0x7fa7cbce1601]
[bt] (5) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(+0xcd71f) [0x7fa7cbccd71f]
[bt] (6) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Module::LoadFromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x20e) [0x7fa7cbcec26e]
[bt] (7) ./mlc_chat_cli(+0x845c) [0x55d806d9945c]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa7cba32083]
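For what it's worth, here is one way to double-check which options a TVM build actually carries (a sketch using tvm.support.libinfo(); note it inspects the Python package on PYTHONPATH, which may differ from the libtvm_runtime.so that mlc-llm just built):
python3 -c "import tvm; print(tvm.support.libinfo()['USE_CUDA'])"   # expect ON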
You need to set USE_CUDA=ON when compiling mlc_llm.
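For example (a sketch, assuming mlc-llm's CMake accepts the same USE_CUDA option as TVM; wiping the build directory avoids stale caches):
cd mlc-llm
rm -rf build && mkdir build && cd build
cmake .. -DUSE_CUDA=ON
make -j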
Thanks @tqchen for your kind help. Env:
- NVIDIA T4
- Tuning off
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/ --evaluate
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
Initializing the chat module...
Finish loading
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
[18:41:39] /z/Dev/mlc-llm/cpp/llm_chat.cc:749: logits[:10] =[-7.34375, -6.17969, 5.78125, -1.62012, -3.20312, -2.6543, -0.955566, -4.88672, -4.14844, -1.96777]
[18:41:39] /z/Dev/mlc-llm/cpp/llm_chat.cc:754: encoding-time=527.079ms, decoding-time=36.8855ms.
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
Initializing the chat module...
Finish loading
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
USER: Who is Lionel Messi?
ASSISTANT: Lionel Messi is a professional soccer player who was born on June 24, 1987, in Rosario, Argentina. He is widely considered to be one of the greatest soccer players of all time. Messi grew up in a family of soccer players and began playing the sport at a young age. He eventually joined the youth academy of Spanish club Barcelona, where he made his professional debut at the age of 17. Since then, Messi has established himself as one of the most talented and skilled players in the world, winning numerous accolades and helping Barcelona to numerous championships. He is known for his exceptional speed, agility, and ball control, as well as his ability to score goals and create opportunities for his teammates. Off the field, Messi is known for his charitable efforts and his commitment to promoting the sport of soccer in his home country of Argentina.
USER: /stats
encode: 60.9 tok/s, decode: 21.1 tok/s
OMG, it takes a lot to build this project. OK, so why is there no README???
:(
You need to set USE_CUDA=ON when compiling mlc_llm.
What if I don't have a CUDA device on my computer? Thanks.
Please check out this page for building mlc_chat_cli: https://mlc.ai/mlc-llm/docs/tutorials/runtime/cpp.html
Page not found
The CLI is now deprecated in the new version; check out https://llm.mlc.ai/docs/deploy/cli.html for the latest instructions.
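For reference, the newer flow is pip-based; the package and model names below are examples that may have changed since, so follow the linked docs rather than this sketch:
python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly-cu122 mlc-llm-nightly-cu122
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC   # example model id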