EETQ not available when using TGI via get_huggingface_llm_image_uri

Open TRT-BradleyB opened this issue 1 year ago • 4 comments

Describe the bug

Related to this issue: https://github.com/aws/deep-learning-containers/issues/3377

There are two versions of the TGI 1.1.0 image. One has EETQ pre-installed: https://github.com/NetEase-FuXi/EETQ

py39-cu118-ubuntu20.04 and py39-cu118-ubuntu20.04-v1.0

In the JSON config, only the one without EETQ is specified.

https://github.com/aws/sagemaker-python-sdk/blob/bfc63d2bb91e33345651e3e00598772b7fb9f971/src/sagemaker/image_uri_config/huggingface-llm.json#L220

Easy fix, but I'm not sure how you'd like to resolve this given the naming scheme deviates.
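Until the config lists the other tag, one possible workaround is to build the image URI by hand and skip `get_huggingface_llm_image_uri` entirely. The snippet below is only a sketch: the registry ID and repository name are taken from the image listing quoted later in this thread, while the helper name `build_tgi_image_uri` and the default region are assumptions made for illustration. The registry ID may also differ in some regions.

```python
# Registry ID and repository name come from the image listing quoted in this
# thread. The region default and the helper name are assumptions.
REGISTRY_ID = "763104351884"
REPOSITORY = "huggingface-pytorch-tgi-inference"


def build_tgi_image_uri(tag: str, region: str = "us-east-1") -> str:
    """Build an ECR image URI for the given tag (hypothetical helper)."""
    return f"{REGISTRY_ID}.dkr.ecr.{region}.amazonaws.com/{REPOSITORY}:{tag}"


# Full tag of the "-v1.0" variant, as it appears in the image listing.
image_uri = build_tgi_image_uri("2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0")
print(image_uri)
```

The resulting string can then be passed as the `image_uri` argument of `HuggingFaceModel` in place of the value returned by the SDK helper.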

TRT-BradleyB • Oct 15 '23 16:10

They have multiple versions, but none of them work with AWQ models:

```
{
    "imageDetails": [
        {
            "registryId": "763104351884",
            "repositoryName": "huggingface-pytorch-tgi-inference",
            "imageDigest": "sha256:2739b630b95d8a95e6b4665e66d8243dd43b99c4fdb865feff13aab9c1da06eb",
            "imageTags": [
                "2.0.1-gpu-py39-cu118-ubuntu20.04",
                "2.0-tgi1.1-gpu-py39-cu118-ubuntu20.04",
                "2.0-gpu-py39-cu118-ubuntu20.04-v1",
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0-2023-10-02-14-29-28",
                "2.0-tgi1.1-gpu-py39-cu118-ubuntu20.04-v1",
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04",
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0"
            ],
            "imageSizeInBytes": 4576429231,
            "imagePushedAt": "2023-10-02T16:39:34+02:00",
            "imageManifestMediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "artifactMediaType": "application/vnd.docker.container.image.v1+json",
            "lastRecordedPullTime": "2023-10-16T15:46:30.296000+02:00"
        }
    ]
}
```

Daan-Grashoff • Oct 16 '23 17:10

Why is this still not solved? EETQ slashes inference time by a factor of 2...

Igosuki • Nov 13 '23 17:11

I might be missing something obvious, but the two tags you listed for 1.1.0 should be pointing to the same image. Please use the latest version, which should be 1.3.3 as of this writing.
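For anyone who wants to check this themselves, here is a minimal sketch that maps each tag to its digest, using a trimmed copy of the image listing quoted earlier in the thread (which appears to be `aws ecr describe-images` output). The helper name `digests_by_tag` is made up for this example.

```python
import json

# Trimmed copy of the image listing quoted earlier in this thread.
DESCRIBE_IMAGES_OUTPUT = """
{
    "imageDetails": [
        {
            "imageDigest": "sha256:2739b630b95d8a95e6b4665e66d8243dd43b99c4fdb865feff13aab9c1da06eb",
            "imageTags": [
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04",
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0"
            ]
        }
    ]
}
"""


def digests_by_tag(raw: str) -> dict:
    """Map every image tag to the digest of the image it points to."""
    details = json.loads(raw)["imageDetails"]
    return {
        tag: detail["imageDigest"]
        for detail in details
        for tag in detail.get("imageTags", [])
    }


tags = digests_by_tag(DESCRIBE_IMAGES_OUTPUT)
# Two tags refer to the same image exactly when their digests are equal.
same_image = (
    tags["2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04"]
    == tags["2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0"]
)
print(same_image)
```

In the listing above both tags sit under a single `imageDigest`, so the comparison comes out equal for that data.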

amzn-choeric • Jan 10 '24 17:01

@TRT-BradleyB can you try with the latest TGI image?

knikure • Jan 10 '24 17:01