
`HUGGING_FACE_HUB_TOKEN` not exported in Sagemaker entrypoint

Open mspronesti opened this issue 1 year ago • 2 comments

System Info

  • AWS sagemaker 2.163.0
  • g5.12xlarge instance type with 4 NVIDIA A10G GPUs and 96GB of GPU memory

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [ ] An officially supported command
  • [ ] My own modifications

Reproduction

import json
from sagemaker import get_execution_role
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.8.2"
)

# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300
hf_api_token = 'hf_...'
role = get_execution_role()  # IAM role used by the SageMaker endpoint

# TGI config
config = {
  'HF_MODEL_ID': "<USER>/<PRIVATE_MODEL>", # model_id from hf.co/models
  'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPUs used per replica
  'MAX_INPUT_LENGTH': json.dumps(1024),  # Max length of input text
  'MAX_TOTAL_TOKENS': json.dumps(2048),  # Max length of the generation (including input text)
  'HUGGING_FACE_HUB_TOKEN': json.dumps(hf_api_token)
}

# create HuggingFaceModel
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  env=config
)
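
The model was then deployed with the usual call (a sketch for completeness; the exact deploy arguments may differ from what I ran):

# deploy the model to a real-time endpoint
llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=health_check_timeout,
)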

Expected behavior

To successfully serve a private model hosted on the Hugging Face Hub by passing a HUGGING_FACE_HUB_TOKEN.

mspronesti avatar Jun 20 '23 21:06 mspronesti

You can try uploading it to S3 and then deploying it following this blog post: https://www.philschmid.de/sagemaker-llm-vpc
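
A rough sketch of that approach, assuming the model artifacts have already been packaged as a model.tar.gz and uploaded to S3 (s3_model_uri below is a placeholder for that URI):

# serve the weights from S3 instead of pulling them from the Hub
config = {
  'HF_MODEL_ID': "/opt/ml/model",  # local path where SageMaker extracts model_data
  'SM_NUM_GPUS': json.dumps(number_of_gpu),
}

llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  model_data=s3_model_uri,  # placeholder: s3://<bucket>/<prefix>/model.tar.gz
  env=config,
)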

philschmid avatar Jun 21 '23 15:06 philschmid

@philschmid this is actually very helpful, thank you! However, why don't you also export HUGGING_FACE_HUB_TOKEN here, so that one can also serve a private model from the Hub?

mspronesti avatar Jun 22 '23 12:06 mspronesti

Hi @mspronesti

I am guessing, but it seems that the launcher gets the access token from an environment variable. Have you tried HF_API_TOKEN?

https://github.com/huggingface/text-generation-inference/blob/v0.8.2/launcher/src/main.rs#L583-L586

# TGI config
config = {
  'HF_MODEL_ID': "<USER>/<PRIVATE_MODEL>", # model_id from hf.co/models
  # ...
  'HF_API_TOKEN': json.dumps(hf_api_token)
}

cirocavani avatar Jul 08 '23 06:07 cirocavani

@cirocavani's suggestion should also work! The reason we created the VPC+S3 blog post is to show how to do it when your SageMaker environment doesn't have internet access.

philschmid avatar Jul 13 '23 12:07 philschmid

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

github-actions[bot] avatar Jul 17 '24 01:07 github-actions[bot]