lorax
Passing a `--revision` causes failure in loading tokenizer config
System Info
```shell
docker run --gpus all -it --rm -e HF_TOKEN=... ghcr.io/predibase/lorax:ea5d74b lorax-launcher --source hub --model-id meta-llama/Llama-2-7b-chat-hf --default-adapter-source local --revision f5db02db724555f92da89c216ac04704f23d4590
```
We notice the following in the logs:
```
2024-08-01T13:25:45.718148Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-08-01T13:25:46.569450Z INFO lorax_router: router/src/main.rs:534: Serving revision f5db02db724555f92da89c216ac04704f23d4590 of model meta-llama/Llama-2-7b-chat-hf
2024-08-01T13:25:46.569516Z WARN lorax_router: router/src/main.rs:337: Could not find tokenizer config locally and no API specified
2024-08-01T13:25:46.569522Z WARN lorax_router: router/src/main.rs:358: Could not find a fast tokenizer implementation for meta-llama/Llama-2-7b-chat-hf
2024-08-01T13:25:46.569527Z WARN lorax_router: router/src/main.rs:359: Rust input length validation and truncation is disabled
```
In addition, any chat completions request fails with a template-not-found error.

When I skip passing `--revision`, the server finds `tokenizer_config.json` as expected.
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
```shell
docker run --gpus all -it --rm -e HF_TOKEN=... ghcr.io/predibase/lorax:ea5d74b lorax-launcher --source hub --model-id meta-llama/Llama-2-7b-chat-hf --default-adapter-source local --revision f5db02db724555f92da89c216ac04704f23d4590
```
Expected behavior
Passing a `--revision` should not affect loading the tokenizer config: the file exists at the pinned commit, so the router should fetch it from that revision just as it does from the default branch.
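To make the expected behavior concrete, here is a minimal sketch of the lookup semantics I would expect, using an in-memory stand-in for the hub. `get_file` and `FAKE_HUB` are hypothetical names for illustration only, not lorax's or hf-hub's actual API:

```python
# Fake "hub": maps (revision, filename) -> file contents.
# Both the default branch and the pinned commit contain the config,
# mirroring the real meta-llama/Llama-2-7b-chat-hf repo.
FAKE_HUB = {
    ("main", "tokenizer_config.json"): '{"chat_template": "..."}',
    ("f5db02d", "tokenizer_config.json"): '{"chat_template": "..."}',
}

def get_file(repo_files, filename, revision=None):
    """Look up a file at the pinned revision, falling back to 'main'."""
    revisions = ([revision] if revision else []) + ["main"]
    for rev in revisions:
        content = repo_files.get((rev, filename))
        if content is not None:
            return content
    raise FileNotFoundError(filename)

# Expected: passing a revision still finds the config, because the
# file exists at that commit. The observed bug is that the lookup
# fails entirely once --revision is set.
assert get_file(FAKE_HUB, "tokenizer_config.json", revision="f5db02d")
assert get_file(FAKE_HUB, "tokenizer_config.json")
```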