[Bug] Gemma 2 models fail due to errors in tokenizer
🐛 Bug
It looks like all supported Gemma 2 models are failing right now.
To Reproduce
```python
from mlc_llm import MLCEngine

# Create engine
model = "HF://mlc-ai/gemma-2-2b-it-q4f16_1-MLC"
engine = MLCEngine(model)
```
Fails with:
```
InternalError: Traceback (most recent call last):
  2: operator()
        at /workspace/mlc-llm/cpp/tokenizers/tokenizers.cc:459
  1: mlc::llm::Tokenizer::FromPath(tvm::runtime::String const&, std::optional<mlc::llm::TokenizerInfo>)
        at /workspace/mlc-llm/cpp/tokenizers/tokenizers.cc:140
  0: mlc::llm::Tokenizer::DetectTokenizerInfo(tvm::runtime::String const&)
        at /workspace/mlc-llm/cpp/tokenizers/tokenizers.cc:210
  File "/workspace/mlc-llm/cpp/tokenizers/tokenizers.cc", line 210
InternalError: Check failed: (err.empty()) is false: Failed to parse JSON: syntax error at line 1 near: version https://git-lfs.github.com/spec/v1
```
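The `version https://git-lfs.github.com/spec/v1` text in the error suggests the downloaded `tokenizer.json` is a Git LFS pointer stub rather than the actual JSON file. A minimal sketch of how one might detect this (the helper name and file path here are illustrative assumptions, not part of MLC-LLM):

```python
from pathlib import Path

# Git LFS pointer files begin with this fixed line instead of real content.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file looks like a Git LFS pointer stub."""
    try:
        head = path.read_bytes()[: len(LFS_POINTER_PREFIX)]
    except OSError:
        return False
    return head.startswith(LFS_POINTER_PREFIX)

# Demo: write a stub shaped like the one git-lfs leaves behind and detect it.
stub = Path("tokenizer.json")
stub.write_text(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0000000000000000000000000000000000000000000000000000000000000000\n"
    "size 123\n"
)
print(is_lfs_pointer(stub))  # True
```

Running such a check against the cached model directory would confirm whether the weights repo was fetched without resolving LFS files.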
Expected behavior
The model should load correctly, without errors.
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): all platforms (tested CPU and CUDA)
- Operating system (e.g. Ubuntu/Windows/MacOS/...): Linux and Windows
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): desktop
- How you installed MLC-LLM (conda, source): pip
- How you installed TVM-Unity (pip, source): pip
- Python version (e.g. 3.10): 3.11
- GPU driver version (if applicable): any
- CUDA/cuDNN version (if applicable): any
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models): not relevant
- Any other relevant information: None
Thank you!