[Bug] Gemma 2 models fail due to errors in tokenizer
🐛 Bug
It looks like all supported Gemma 2 models are failing right now.
To Reproduce
```python
from mlc_llm import MLCEngine

# Create engine
model = "HF://mlc-ai/gemma-2-2b-it-q4f16_1-MLC"
engine = MLCEngine(model)
```
Fails with:
```
InternalError: Traceback (most recent call last):
  2: operator()
        at /workspace/mlc-llm/cpp/tokenizers/tokenizers.cc:459
  1: mlc::llm::Tokenizer::FromPath(tvm::runtime::String const&, std::optional<mlc::llm::TokenizerInfo>)
        at /workspace/mlc-llm/cpp/tokenizers/tokenizers.cc:140
  0: mlc::llm::Tokenizer::DetectTokenizerInfo(tvm::runtime::String const&)
        at /workspace/mlc-llm/cpp/tokenizers/tokenizers.cc:210
  File "/workspace/mlc-llm/cpp/tokenizers/tokenizers.cc", line 210
InternalError: Check failed: (err.empty()) is false: Failed to parse JSON: syntax error at line 1 near: version https://git-lfs.github.com/spec/v1
```
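The `version https://git-lfs.github.com/spec/v1` text in the error suggests the downloaded `tokenizer.json` is a Git LFS pointer stub rather than the actual JSON file. A minimal sketch of how one might detect this (the helper name and file path here are illustrative assumptions, not part of MLC-LLM):

```python
from pathlib import Path

# Git LFS pointer files begin with this fixed line instead of real content.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file looks like a Git LFS pointer stub."""
    try:
        head = path.read_bytes()[: len(LFS_POINTER_PREFIX)]
    except OSError:
        return False
    return head.startswith(LFS_POINTER_PREFIX)

# Demo: write a stub shaped like the one git-lfs leaves behind and detect it.
stub = Path("tokenizer.json")
stub.write_text(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0000000000000000000000000000000000000000000000000000000000000000\n"
    "size 123\n"
)
print(is_lfs_pointer(stub))  # True
```

Running such a check against the cached model directory would confirm whether the weights repo was fetched without resolving LFS files.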
Expected behavior
The model should load correctly, without errors.
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): all platforms (tested CPU and CUDA)
- Operating system (e.g. Ubuntu/Windows/MacOS/...): Linux and Windows
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): desktop
- How you installed MLC-LLM (conda, source): pip
- How you installed TVM-Unity (pip, source): pip
- Python version (e.g. 3.10): 3.11
- GPU driver version (if applicable): any
- CUDA/cuDNN version (if applicable): any
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models): not relevant
- Any other relevant information: None
Thank you!