
how to build mlc-llm-cli on Linux

Open zhaoyang-star opened this issue 2 years ago • 13 comments

I want to run vicuna-7b on an NVIDIA GPU using mlc-llm. I followed the instructions, with some changes:

  1. Install relax.

    git clone https://github.com/mlc-ai/relax.git --recursive
    cd relax
    mkdir build
    cp cmake/config.cmake build
    

    In build/config.cmake, set USE_CUDA, USE_CUDNN, USE_CUBLAS, and USE_LLVM to ON.
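
    The relevant switches look like this (assuming the stock TVM config.cmake layout, where each option is a set(...) entry defaulting to OFF):

    set(USE_CUDA ON)
    set(USE_CUDNN ON)
    set(USE_CUBLAS ON)
    set(USE_LLVM ON)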

    cd build
    cmake ..
    make -j
    export TVM_HOME=/path/to/relax
    export PYTHONPATH=$PYTHONPATH:$TVM_HOME/python
    
  2. Get model weights. I just used the following:

    git lfs install
    git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
    mkdir -p dist/models
    ln -s path/to/vicuna-v1-7b dist/models/vicuna-v1-7b
    

But config.json and some other necessary files are missing from the vicuna-v1-7b path! The files there have already been transformed by transform_params; we need the vicuna-v1-7b weights in PyTorch format.

$ tree dist/
dist/
└── models
    └── vicuna-v1-7b
        ├── float16
        │   ├── ndarray-cache.json
        │   ├── tokenizer.model
        │   ├── params_shard_0.bin
        │   ├── params_shard_100.bin
        │   ├── params_shard_101.bin
        │   ├── params_shard_102.bin
        │   ├── params_shard_103.bin
        │   ├── params_shard_104.bin
        │   ├── params_shard_105.bin
...

I found a vicuna-v1-7b in Hugging Face format whose dtype is float16 and vocab_size is 32001, so I downloaded that one.

  3. Build the model into a library
    git clone https://github.com/mlc-ai/mlc-llm.git --recursive
    cd mlc-llm
    # change vocab_size=32001 in llama.py
    python3 build.py --model vicuna-v1-7b --dtype float16 --target cuda --max-seq-len 768 --artifact-path ../dist/   
    
    The output:
...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:10<00:00,  1.39it/s]
Total param size: 3.9229860305786133 GB
Start storing to cache ../dist/vicuna-v1-7b/float16/params
[0745/0745] saving param_744
All finished, 132 total shards committed, record saved to ../dist/vicuna-v1-7b/float16/params/ndarray-cache.json
Save a cached module to ../dist/vicuna-v1-7b/float16/mod_cache_before_build_float16.pkl.
20 static functions: [I.GlobalVar("rotary_embedding1"), I.GlobalVar("fused_decode1_fused_matmul5_multiply"), I.GlobalVar("decode4"), I.GlobalVar("slice1"), I.GlobalVar("squeeze"), I.GlobalVar("fused_decode_matmul3"), I.GlobalVar("fused_decode_fused_matmul3_add"), I.GlobalVar("fused_decode2_fused_matmul6_add"), I.GlobalVar("decode6"), I.GlobalVar("fused_transpose4_reshape4"), I.GlobalVar("transpose2"), I.GlobalVar("rms_norm1"), I.GlobalVar("fused_decode1_fused_matmul5_silu"), I.GlobalVar("reshape1"), I.GlobalVar("decode5"), I.GlobalVar("reshape"), I.GlobalVar("fused_reshape2_squeeze"), I.GlobalVar("reshape2"), I.GlobalVar("take_decode"), I.GlobalVar("fused_decode3_fused_matmul7_cast2")]
26 dynamic functions: [I.GlobalVar("fused_NT_matmul1_add1"), I.GlobalVar("extend_te"), I.GlobalVar("full"), I.GlobalVar("reshape3"), I.GlobalVar("reshape5"), I.GlobalVar("rotary_embedding"), I.GlobalVar("take_decode1"), I.GlobalVar("fused_NT_matmul_divide_maximum_minimum_cast"), I.GlobalVar("NT_matmul1"), I.GlobalVar("fused_softmax1_cast4"), I.GlobalVar("fused_NT_matmul3_silu1"), I.GlobalVar("fused_NT_matmul2_divide1_maximum1_minimum1_cast3"), I.GlobalVar("matmul8"), I.GlobalVar("transpose5"), I.GlobalVar("fused_softmax_cast1"), I.GlobalVar("fused_min_max_triu_te_broadcast_to"), I.GlobalVar("reshape7"), I.GlobalVar("reshape8"), I.GlobalVar("slice"), I.GlobalVar("fused_NT_matmul3_multiply1"), I.GlobalVar("squeeze1"), I.GlobalVar("matmul4"), I.GlobalVar("transpose3"), I.GlobalVar("rms_norm"), I.GlobalVar("fused_NT_matmul4_add1"), I.GlobalVar("reshape6")]
Dump static shape TIR to ../dist/vicuna-v1-7b/float16/mod_tir_static.py
Dump dynamic shape TIR to ../dist/vicuna-v1-7b/float16/mod_tir_dynamic.py
- Dispatch to pre-scheduled op: fused_decode1_fused_matmul5_multiply
- Dispatch to pre-scheduled op: decode4
- Dispatch to pre-scheduled op: fused_NT_matmul1_add1
- Dispatch to pre-scheduled op: decode5
- Dispatch to pre-scheduled op: NT_matmul1
- Dispatch to pre-scheduled op: fused_NT_matmul_divide_maximum_minimum_cast
- Dispatch to pre-scheduled op: fused_softmax1_cast4
- Dispatch to pre-scheduled op: fused_NT_matmul3_silu1
- Dispatch to pre-scheduled op: fused_NT_matmul2_divide1_maximum1_minimum1_cast3
- Dispatch to pre-scheduled op: fused_decode_matmul3
- Dispatch to pre-scheduled op: matmul8
- Dispatch to pre-scheduled op: fused_softmax_cast1
- Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
- Dispatch to pre-scheduled op: fused_decode_fused_matmul3_add
- Dispatch to pre-scheduled op: fused_decode2_fused_matmul6_add
- Dispatch to pre-scheduled op: decode6
- Dispatch to pre-scheduled op: fused_decode1_fused_matmul5_silu
- Dispatch to pre-scheduled op: fused_NT_matmul3_multiply1
- Dispatch to pre-scheduled op: matmul4
- Dispatch to pre-scheduled op: rms_norm
- Dispatch to pre-scheduled op: fused_NT_matmul4_add1
Finish exporting to ../dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
  4. Prepare lib and params. There are instructions for iOS as follows. After step 3, the params directory and vicuna-v1-7b_cuda_float16.so are under ../dist/vicuna-v1-7b/float16/.

    cd ios
    ./prepare_libs.sh
    ./prepare_params.sh
    
  5. Build mlc-llm-cli. Are there any instructions on how to build mlc-llm-cli? I used the following but got an error. @MasterJH5574 Could you please give some advice? Thanks a lot ^_^

    mkdir build && cd build
    cmake .. && make
    [  0%] Building CXX object CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o
    [  0%] Built target mlc_llm_objs
    [  0%] Building CXX object CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o
    [  0%] Built target mlc_cli_objs
    [  0%] Generating release/libtokenizers_cpp.a
    No such file or directory
    make[2]: *** [tokenizers/CMakeFiles/tokenizers.dir/build.make:71: tokenizers/release/libtokenizers_cpp.a] Error 1
    make[1]: *** [CMakeFiles/Makefile2:674: tokenizers/CMakeFiles/tokenizers.dir/all] Error 2
    make: *** [Makefile:156: all] Error 2
    

Run make -n to get debug info:

make -n
...
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_echo_color --switch= --green --progress-dir=/z/Dev/mlc-llm/build/CMakeFiles --progress-num=100 "Building CXX object tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o"
cd /z/Dev/mlc-llm/build/tvm && /usr/bin/c++ -DDMLC_USE_FOPEN64=0 -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -DNDEBUG -DNDEBUG=1 -DTVM_INDEX_DEFAULT_I64=1 -DTVM_THREADPOOL_USE_OPENMP=0 -DTVM_USE_LIBBACKTRACE=0 -DUSE_FALLBACK_STL_MAP=0 -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/libcrc/include -isystem /z/Dev/relax/3rdparty/dlpack/include -isystem /z/Dev/relax/3rdparty/dmlc-core/include -isystem /z/Dev/relax/3rdparty/rang/include -isystem /z/Dev/relax/3rdparty/compiler-rt -isystem /z/Dev/relax/3rdparty/picojson -isystem /usr/local/cuda/include -std=c++17 -faligned-new -O2 -Wall -fPIC -std=c++17  -O2 -g -DNDEBUG -fPIC -ffile-prefix-map=..=/z/Dev/relax -MD -MT tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o -MF CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o.d -o CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o -c /z/Dev/relax/src/runtime/contrib/sort/sort.cc
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_echo_color --switch= --progress-dir=/z/Dev/mlc-llm/build/CMakeFiles --progress-num=95,96,97,98,99,100 "Built target tvm_runtime_objs"
make -s -f tvm/CMakeFiles/tvm_runtime.dir/build.make tvm/CMakeFiles/tvm_runtime.dir/depend
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/relax /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/tvm /z/Dev/mlc-llm/build/tvm/CMakeFiles/tvm_runtime.dir/DependInfo.cmake --color=
make -s -f tvm/CMakeFiles/tvm_runtime.dir/build.make tvm/CMakeFiles/tvm_runtime.dir/build
make[2]: *** No rule to make target 'tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/builtin_fp16.cc.o', needed by 'tvm/libtvm_runtime.so'.  Stop.
make[1]: *** [CMakeFiles/Makefile2:438: tvm/CMakeFiles/tvm_runtime.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

mlc-llm commit id: 909f267, relax commit id: 227cacd

zhaoyang-star avatar May 10 '23 08:05 zhaoyang-star

There are a couple of things here:

  • Their model uses a Vulkan target, so you can't use their HF weights.
  • I believe the CLI app targets Vulkan as well, so you have to test with the chat Python file in the tests folder.
  • But there is no TVM profiling info for the CUDA path, so your built model is going to be much slower than their Vulkan release.

AlphaAtlas avatar May 11 '23 14:05 AlphaAtlas

Hi @zhaoyang-star, would you mind checking whether you have cloned the submodules (if 3rdparty/tokenizers is empty, they were not cloned properly)? If not, please update them via:

git submodule update --init --recursive
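
You can verify afterwards with git submodule status, which prints one line per registered submodule and prefixes any that are not initialized with a -:

git submodule status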

yzh119 avatar May 11 '23 19:05 yzh119

git submodule update --init --recursive

I am sure I got all submodules before building. No files changed after running git submodule update --init --recursive in the mlc-llm project. @yzh119

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# ls
argparse  sentencepiece-js  tokenizers-cpp

BTW, tokenizers-cpp is not a git submodule.

I used cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON && make to get debug info. The output:

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# make 
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -P /z/Dev/mlc-llm/build/CMakeFiles/VerifyGlobs.cmake
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -S/z/Dev/mlc-llm -B/z/Dev/mlc-llm/build --check-build-system CMakeFiles/Makefile.cmake 0
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_progress_start /z/Dev/mlc-llm/build/CMakeFiles /z/Dev/mlc-llm/build//CMakeFiles/progress.marks
make  -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/z/Dev/mlc-llm/build'
make  -f CMakeFiles/mlc_llm_objs.dir/build.make CMakeFiles/mlc_llm_objs.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/CMakeFiles/mlc_llm_objs.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make  -f CMakeFiles/mlc_llm_objs.dir/build.make CMakeFiles/mlc_llm_objs.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[  0%] Building CXX object CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o
/usr/bin/c++ -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -DMLC_LLM_EXPORTS -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/dlpack/include -I/z/Dev/relax/3rdparty/dmlc-core/include -I/z/Dev/mlc-llm/3rdparty/sentencepiece-js/sentencepiece/src -I/z/Dev/mlc-llm/3rdparty/tokenizers-cpp -std=c++17  -O2 -g -DNDEBUG -fPIC -MD -MT CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o -MF CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o.d -o CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o -c /z/Dev/mlc-llm/cpp/llm_chat.cc
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
[  0%] Built target mlc_llm_objs
make  -f CMakeFiles/mlc_cli_objs.dir/build.make CMakeFiles/mlc_cli_objs.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/CMakeFiles/mlc_cli_objs.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make  -f CMakeFiles/mlc_cli_objs.dir/build.make CMakeFiles/mlc_cli_objs.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[  0%] Building CXX object CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o
/usr/bin/c++ -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/dlpack/include -I/z/Dev/relax/3rdparty/dmlc-core/include -I/z/Dev/mlc-llm/3rdparty/argparse/include -std=c++17  -O2 -g -DNDEBUG -fPIC -MD -MT CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o -MF CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o.d -o CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o -c /z/Dev/mlc-llm/cpp/cli_main.cc
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
[  0%] Built target mlc_cli_objs
make  -f tokenizers/CMakeFiles/tokenizers.dir/build.make tokenizers/CMakeFiles/tokenizers.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/tokenizers /z/Dev/mlc-llm/build/tokenizers/CMakeFiles/tokenizers.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make  -f tokenizers/CMakeFiles/tokenizers.dir/build.make tokenizers/CMakeFiles/tokenizers.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[  0%] Generating release/libtokenizers_cpp.a
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E env CARGO_TARGET_DIR=/z/Dev/mlc-llm/build/tokenizers RUSTFLAGS="" cargo build --release
No such file or directory
make[2]: *** [tokenizers/CMakeFiles/tokenizers.dir/build.make:74: tokenizers/release/libtokenizers_cpp.a] Error 1
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make[1]: *** [CMakeFiles/Makefile2:677: tokenizers/CMakeFiles/tokenizers.dir/all] Error 2
make[1]: Leaving directory '/z/Dev/mlc-llm/build'
make: *** [Makefile:159: all] Error 2

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# ls
argparse  sentencepiece-js  tokenizers-cpp
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# tree tokenizers-cpp/
tokenizers-cpp/
|-- CMakeLists.txt
|-- Cargo.toml
|-- src
|   `-- lib.rs
`-- tokenizers.h

1 directory, 4 files
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty/tokenizers-cpp# ls /z/Dev/mlc-llm/build/tokenizers
CMakeFiles  Makefile  cmake_install.cmake

It seems the error happened when compiling tokenizers-cpp. I have verified that the two paths /z/Dev/mlc-llm/3rdparty/tokenizers-cpp and /z/Dev/mlc-llm/build/tokenizers are accessible.

zhaoyang-star avatar May 12 '23 00:05 zhaoyang-star

The compilation of tokenizers-cpp depends on Rust; can you confirm you have installed Rust?

yzh119 avatar May 12 '23 06:05 yzh119

The compilation of tokenizers-cpp depends on Rust; can you confirm you have installed Rust?

Yes, the compilation of tokenizers-cpp depends on Rust. Rust is not installed on my machine; I will install it first. Thanks for your kind help.

zhaoyang-star avatar May 12 '23 06:05 zhaoyang-star

Rust is installed now.

# which rustc
/usr/bin/rustc
# rustc --version
rustc 1.65.0
# cd build; cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON; make

The same error occurred.

I tried to compile tokenizers-cpp on its own. Again, the error happened when generating release/libtokenizers_cpp.a. How do you compile tokenizers-cpp? @yzh119

# cd 3rdparty/tokenizers-cpp/
# mkdir build; cd build
# cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON
# make
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -S/z/Dev/mlc-llm/3rdparty/tokenizers-cpp -B/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build --check-build-system CMakeFiles/Makefile.cmake 0
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_progress_start /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build/CMakeFiles /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build//CMakeFiles/progress.marks
make  -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make  -f CMakeFiles/tokenizers.dir/build.make CMakeFiles/tokenizers.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build/CMakeFiles/tokenizers.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make  -f CMakeFiles/tokenizers.dir/build.make CMakeFiles/tokenizers.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
[100%] Generating release/libtokenizers_cpp.a
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E env CARGO_TARGET_DIR=/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build RUSTFLAGS="" cargo build --release
No such file or directory
make[2]: *** [CMakeFiles/tokenizers.dir/build.make:74: release/libtokenizers_cpp.a] Error 1
make[2]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make[1]: *** [CMakeFiles/Makefile2:86: CMakeFiles/tokenizers.dir/all] Error 2
make[1]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make: *** [Makefile:94: all] Error 2
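
The bare "No such file or directory" seems to come from the cmake -E env ... cargo build --release line right above it, i.e., make likely failing to find the cargo binary itself. A quick check:

command -v cargo || echo "cargo is not on PATH"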

zhaoyang-star avatar May 12 '23 06:05 zhaoyang-star

@zhaoyang-star We rely on cargo, Rust's package manager, not only rustc.
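
For reference, a minimal way to get both (a sketch, assuming a standard Linux shell; rustup installs rustc and cargo together):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
cargo --version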

yzh119 avatar May 12 '23 07:05 yzh119

I finally built mlc_chat_cli and libmlc_llm.so after installing the Rust dev environment.

When I ran mlc_chat_cli, I got an error saying libtvm_runtime.so was not compiled with the CUDA runtime. I am sure USE_CUDA, USE_CUDNN, USE_CUBLAS and USE_LLVM were ON when compiling TVM, and the TVM_HOME env var is /z/Dev/relax/. The unit test test_cudnn.py also passed.

@yzh119 Could you please have a look at it? Is there something I still missing? Thanks a lot.

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ls
CMakeCache.txt  CPackConfig.cmake        Makefile             cmake_install.cmake  libmlc_llm.so  sentencepiece  tvm
CMakeFiles      CPackSourceConfig.cmake  TVMBuildOptions.txt  libmlc_llm.a         mlc_chat_cli   tokenizers
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
[00:00:37] /z/Dev/relax/src/runtime/library_module.cc:126: Binary was created using {cuda} but a loader of that name is not registered. Available loaders are VMExecutable, relax.Executable, metadata, const_loader, metadata_module. Perhaps you need to recompile with this runtime enabled.
Stack trace:
  [bt] (0) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x2c) [0x7fa7cbce2efc]
  [bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x45) [0x55d806d97d61]
  [bt] (2) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::LoadModuleFromBinary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, dmlc::Stream*)+0x3d3) [0x7fa7cbce0013]
  [bt] (3) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::ProcessModuleBlob(char const*, tvm::runtime::ObjectPtr<tvm::runtime::Library>, std::function<tvm::runtime::PackedFunc (int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)>, tvm::runtime::Module*, tvm::runtime::ModuleNode**)+0x590) [0x7fa7cbce06a0]
  [bt] (4) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::CreateModuleFromLibrary(tvm::runtime::ObjectPtr<tvm::runtime::Library>, std::function<tvm::runtime::PackedFunc (int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)>)+0x221) [0x7fa7cbce1601]
  [bt] (5) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(+0xcd71f) [0x7fa7cbccd71f]
  [bt] (6) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Module::LoadFromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x20e) [0x7fa7cbcec26e]
  [bt] (7) ./mlc_chat_cli(+0x845c) [0x55d806d9945c]
  [bt] (8) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa7cba32083]
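
As a sanity check, the Python-side TVM build can be probed like this (a sketch; note this tests the relax build on PYTHONPATH, which is separate from the libtvm_runtime.so that mlc-llm compiles into build/tvm):

python3 -c "import tvm; print(tvm.cuda(0).exist)"   # True only with a CUDA-enabled runtime and a visible GPU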

zhaoyang-star avatar May 15 '23 08:05 zhaoyang-star

You need to set USE_CUDA=ON when compiling mlc_llm
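
For example (a sketch, assuming mlc-llm's CMake exposes USE_CUDA the same way TVM's config.cmake does):

cd mlc-llm/build
cmake .. -DUSE_CUDA=ON
make -j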

tqchen avatar May 15 '23 14:05 tqchen

Thanks @tqchen for your kind help. Env:

  • NVIDIA T4
  • Tuning off
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/ --evaluate
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
Initializing the chat module...
Finish loading
You can use the following special commands:
  /help    print the special commands
  /exit    quit the cli
  /stats   print out the latest stats (token/sec)
  /reset   restart a fresh chat

[18:41:39] /z/Dev/mlc-llm/cpp/llm_chat.cc:749: logits[:10] =[-7.34375, -6.17969, 5.78125, -1.62012, -3.20312, -2.6543, -0.955566, -4.88672, -4.14844, -1.96777]
[18:41:39] /z/Dev/mlc-llm/cpp/llm_chat.cc:754: encoding-time=527.079ms, decoding-time=36.8855ms.
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
Initializing the chat module...
Finish loading
You can use the following special commands:
  /help    print the special commands
  /exit    quit the cli
  /stats   print out the latest stats (token/sec)
  /reset   restart a fresh chat

USER: Who is Lionel Messi?                    
ASSISTANT: Lionel Messi is a professional soccer player who was born on June 24, 1987, in Rosario, Argentina. He is widely considered to be one of the greatest soccer players of all time. Messi grew up in a family of soccer players and began playing the sport at a young age. He eventually joined the youth academy of Spanish club Barcelona, where he made his professional debut at the age of 17. Since then, Messi has established himself as one of the most talented and skilled players in the world, winning numerous accolades and helping Barcelona to numerous championships. He is known for his exceptional speed, agility, and ball control, as well as his ability to score goals and create opportunities for his teammates. Off the field, Messi is known for his charitable efforts and his commitment to promoting the sport of soccer in his home country of Argentina.
USER: /stats
encode: 60.9 tok/s, decode: 21.1 tok/s

zhaoyang-star avatar May 16 '23 03:05 zhaoyang-star

OMG, it takes a lot to build this project. OK, so why is there no README???

sleepwalker2017 avatar May 16 '23 10:05 sleepwalker2017

:(

Poordeveloper avatar May 17 '23 09:05 Poordeveloper

You need to set USE_CUDA=ON when compiling mlc_llm

What if I don't have a CUDA device on my computer? Thanks.

njuhang avatar May 23 '23 07:05 njuhang

Please check out this page for building mlc_chat_cli: https://mlc.ai/mlc-llm/docs/tutorials/runtime/cpp.html

junrushao avatar Jun 14 '23 04:06 junrushao

Please check out this page for building mlc_chat_cli: https://mlc.ai/mlc-llm/docs/tutorials/runtime/cpp.html

Page not found

Kuchiriel avatar Apr 03 '24 16:04 Kuchiriel

The CLI is now deprecated in the new version; check out https://llm.mlc.ai/docs/deploy/cli.html for the latest instructions.
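
Roughly, the current flow looks like this (a hedged example; the package names and model URL below are illustrative, see the linked docs for the authoritative commands):

python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly mlc-llm-nightly
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC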

tqchen avatar Apr 03 '24 17:04 tqchen