text-generation-inference
`HUGGING_FACE_HUB_TOKEN` not exported in Sagemaker entrypoint
System Info
- AWS SageMaker 2.163.0
- ml.g5.12xlarge instance type (4 NVIDIA A10G GPUs, 96 GB of GPU memory)
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [ ] An officially supported command
- [ ] My own modifications
Reproduction

```python
import json
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="0.8.2"
)

# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300
hf_api_token = 'hf_...'

# TGI config
config = {
    'HF_MODEL_ID': "<USER>/<PRIVATE_MODEL>",        # model_id from hf.co/models
    'SM_NUM_GPUS': json.dumps(number_of_gpu),       # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(1024),           # Max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(2048),           # Max length of the generation (including input text)
    'HUGGING_FACE_HUB_TOKEN': hf_api_token          # pass the token as-is; json.dumps would wrap it in literal quotes
}

# create HuggingFaceModel
llm_model = HuggingFaceModel(
    role=role,  # a SageMaker execution role ARN, defined elsewhere
    image_uri=llm_image,
    env=config
)

# deploy the model to an endpoint
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout
)
```
Expected behavior
To successfully serve a private model hosted on the Hugging Face Hub when passing a `HUGGING_FACE_HUB_TOKEN`.
You can try uploading it to S3 and then deploying it following this blog post: https://www.philschmid.de/sagemaker-llm-vpc
@philschmid this is actually very helpful, thank you! However, why don't you also export `HUGGING_FACE_HUB_TOKEN` here so that one can also serve a private model from the Hub?
Hi @mspronesti
I am guessing, but it seems that the launcher gets the access token from an environment variable. Have you tried `HF_API_TOKEN`?
https://github.com/huggingface/text-generation-inference/blob/v0.8.2/launcher/src/main.rs#L583-L586
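Based on those lines, the launcher appears to read `HF_API_TOKEN` from its own environment and re-export it as `HUGGING_FACE_HUB_TOKEN` for the processes it spawns. A rough Python sketch of that mapping (the function and variable names here are ours, not the actual Rust code):

```python
def build_child_env(env):
    """Sketch of the launcher's token handling: if HF_API_TOKEN is set,
    re-export it as HUGGING_FACE_HUB_TOKEN for child processes."""
    child = dict(env)
    token = env.get("HF_API_TOKEN")
    if token:
        child["HUGGING_FACE_HUB_TOKEN"] = token
    return child

print(build_child_env({"HF_API_TOKEN": "hf_example"})["HUGGING_FACE_HUB_TOKEN"])
```

If that reading is right, setting `HF_API_TOKEN` in the SageMaker `env` dict would reach the download process even though `HUGGING_FACE_HUB_TOKEN` itself is not forwarded by the entrypoint.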
```python
# TGI config
config = {
    'HF_MODEL_ID': "<USER>/<PRIVATE_MODEL>",  # model_id from hf.co/models
    # ...
    'HF_API_TOKEN': hf_api_token  # plain string; json.dumps would add literal quotes
}
```
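One thing to double-check when building this dict: `json.dumps` is only needed for non-string values. Applied to a string token it embeds literal quote characters in the environment variable, which would break authentication. A quick check:

```python
import json

# json.dumps on a string adds literal quote characters, which would end up
# inside the environment variable's value; numbers serialize cleanly.
print(json.dumps("hf_abc"))  # prints "hf_abc" -- note the embedded quotes
print(json.dumps(4))         # prints 4
```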
@cirocavani's suggestion should also work! The reason we created the VPC+S3 blog post is to show how to do it when your SageMaker environment does not have internet access.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.