mlc-llm
Universal LLM Deployment Engine with ML Compilation
Output from running `mlc_chat_cli`:

```
(mlc-chat) 2024mgagvani@snowy:~$ mlc_chat_cli
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use lib /cluster/2024mgagvani/dist/lib/vicuna-v1-7b_vulkan_float16.so
Initializing the chat module...
Finish loading
You can use...
```
My environment is Windows 10 with WSL2 Ubuntu 22.04, and I followed these commands:

```
conda create -n mlc-chat
conda activate mlc-chat
conda install git git-lfs
conda install -c mlc-ai -c conda-forge mlc-chat-nightly
mkdir...
```
Windows 10 x64, 8 GB RAM, NVIDIA GeForce 940MX. After running the `mlc_chat_cli` command:

```
Use lib E:\Code\test\mlc-chat\dist\lib\vicuna-v1-7b_vulkan_float16.dll
Initializing the chat module...
[16:56:46] D:\a\utils\utils\tvm\src\runtime\vulkan\vulkan_buffer.cc:61:
---------------------------------------------------------------
An error occurred during the execution of TVM. For more...
```
# The issue

Currently, our tokenizer.cpp port only supports loading from [a single JSON file](https://github.com/mlc-ai/mlc-llm/blob/5bdcc86a632c7105ac2b874d7d255685839dd204/3rdparty/tokenizers-cpp/tokenizers.h#L81-L84), which is the [legacy format](https://huggingface.co/docs/transformers/v4.28.1/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.save_pretrained) of the Hugging Face tokenizer that is only applicable to fast...
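For context, the issue contrasts two on-disk layouts: the single `tokenizer.json` file that the current `tokenizers.h` loader accepts, and the multi-file layout (`vocab.json`, `merges.txt`, etc.) that `save_pretrained` can also emit. A hypothetical helper, not part of the repo, sketches how the two layouts can be told apart:

```python
from pathlib import Path


def detect_tokenizer_format(model_dir: str) -> str:
    """Classify how a Hugging Face tokenizer directory is laid out.

    Hypothetical helper for illustration only:
    - 'single-json': one tokenizer.json file, the layout the current
      tokenizers.h loader supports;
    - 'multi-file': the separate vocab/merges/config files that
      save_pretrained can also produce.
    """
    d = Path(model_dir)
    if (d / "tokenizer.json").is_file():
        return "single-json"
    multi_file = (
        "vocab.json",
        "merges.txt",
        "tokenizer_config.json",
        "special_tokens_map.json",
    )
    if any((d / name).is_file() for name in multi_file):
        return "multi-file"
    return "unknown"
```

Supporting the multi-file layout on the C++ side would presumably mean probing for these filenames the same way before choosing a loading path.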
Thanks for your effort. Do you plan to add an API layer on top of this, so it could be used as a local API layer? In my scenario I'd like...
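What such a local API layer might look like, as a minimal stdlib-only sketch: an HTTP endpoint that forwards a JSON `prompt` to a `generate(prompt) -> str` callable. `generate` is a placeholder standing in for whatever text-generation entry point the chat runtime would actually expose; none of this reflects a real mlc-llm API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(generate):
    """Build a request handler that forwards POST bodies to `generate`.

    `generate(prompt) -> str` is an assumed placeholder for the chat
    runtime's generation entry point, used here for illustration only.
    """

    class ChatHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            req = json.loads(self.rfile.read(length))
            reply = generate(req.get("prompt", ""))
            body = json.dumps({"text": reply}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            # Silence per-request logging for this sketch.
            pass

    return ChatHandler


def serve(generate, port: int = 8000):
    """Serve the chat wrapper on localhost (blocks forever)."""
    HTTPServer(("127.0.0.1", port), make_handler(generate)).serve_forever()
```

A client would then `POST {"prompt": "..."}` to the server and read back `{"text": "..."}`; a production layer would more likely adopt an established schema such as the OpenAI-style chat-completions format.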
Please consider adding a Dockerfile and a docker-compose file to the repository.
It seems the tuning is per device, although the M1 tuning is applied when using any GPU. How would I use `relax_integration.tune_relax` on `mod_deploy` to create other databases? I...