dstack
[Bug]: Enabling OpenAI-compatible mapping for restricted HuggingFace models leads to unexpected server error
Steps to reproduce
If I run a service with model mapping without specifying chat_template, dstack fetches the tokenizer info from Hugging Face. But if the model is gated (requires accepting an agreement), the unauthenticated request fails. I ran into the problem when running the mistralai/Mistral-7B-Instruct-v0.1 example from our docs:
type: service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
commands:
  - text-generation-launcher --port 8000 --trust-remote-code
port: 8000
resources:
  gpu: 24GB
# Enable the OpenAI-compatible endpoint
model:
  type: chat
  name: mistralai/Mistral-7B-Instruct-v0.1
  format: tgi
It worked before. Apparently, mistralai/Mistral-7B-Instruct-v0.1 recently became gated and now requires accepting an agreement.
Actual behaviour
I'm not sure whether it's possible to get info for a gated model without auth. In any case, if it's not, dstack should raise an appropriate, user-facing error instead of surfacing the raw 401.
Expected behaviour
No response
dstack version
master
Server logs
File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/gateways/options.py", line 31, in get_tokenizer_config
raise ConfigurationError(f"Failed to get tokenizer info: {e}")
dstack._internal.core.errors.ConfigurationError: Failed to get tokenizer info: 401 Client Error: Unauthorized for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/resolve/main/tokenizer_config.json
Additional information
No response