text-embeddings-inference
Add support for the Blackwell architecture (sm120)
What does this PR do?
This PR adds support for the Blackwell architecture, related to issue #652.
Since I wanted to run TEI on my RTX 5090, I went through a few iterations and got it working; tested with Qwen3-Embedding-0.6B.
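For context, TEI's CUDA Dockerfiles select the target architecture via a `CUDA_COMPUTE_CAP` build arg. A small helper like the following (hypothetical, not part of this PR) shows the mapping from the compute capability nvidia-smi reports (e.g. `12.0` on an RTX 5090) to that build arg:

```shell
# Hypothetical helper: turn a compute capability string as reported by
# nvidia-smi (e.g. "12.0") into TEI's CUDA_COMPUTE_CAP build-arg value.
cap_to_build_arg() {
  # Strip the dot: "12.0" -> "120"
  echo "${1/./}"
}

# Example (build command shown for context, assuming the standard build arg):
# docker build -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$(cap_to_build_arg 12.0) .
cap_to_build_arg 12.0
```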
Before submitting
- [X] Did you read the contributor guideline?
Yes
- [X] Was this discussed/approved via a GitHub issue or the forum?
Not discussed or approved, but it's a known issue, and a related issue is already open at https://github.com/huggingface/text-embeddings-inference/issues/652
- [X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Documentation updated to mention the new compute cap.
- [X] Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?
I have updated the only existing test to validate the compute cap.
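The kind of check being tested can be sketched roughly as follows (a hypothetical function, not TEI's actual code; the set of supported caps is illustrative):

```rust
// Sketch of a compute-capability gate, with sm120 (Blackwell consumer
// GPUs such as the RTX 5090) added to the accepted values.
// Illustrative only: the real list lives in TEI's CUDA backend.
fn compute_cap_supported(cap: usize) -> bool {
    matches!(cap, 75 | 80 | 86 | 89 | 90 | 120)
}

fn main() {
    assert!(compute_cap_supported(120)); // Blackwell now accepted
    assert!(!compute_cap_supported(70)); // unsupported caps still rejected
    println!("ok");
}
```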
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Hey @danielealbano, thanks for the changes, I'll take a look and test those over the week (as I just got back from a 2-week break), thanks for the contribution (and the patience) 🤗
@alvarobartt by the way, you can support a CUDA 12.9 docker image even if your system runs 12.2. Just add the following to the Dockerfile:
```dockerfile
# Remove the compat folder to avoid conflicts with host GPU drivers
RUN rm -rf /usr/local/cuda-12.9/compat
ENV NVIDIA_DISABLE_REQUIRE=true
```
I built the image using Dockerfile-cuda-blackwell (git revision c406619) and it launched TEI successfully on my RTX 5090, with GPU processing working properly. Great work!
For those who want to quickly test TEI on Blackwell like I did, I've made the image available here (note: this is not actively maintained, but feel free to use it for testing):
- https://hub.docker.com/r/hotchpotch/tei-blackwell-testing
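If you want to try that test image, a run command along these lines should work (the `--model-id` flag is TEI's standard CLI; the model is the one tested in this PR — adjust as needed):

```shell
# Run the community Blackwell test image on a GPU host; TEI listens on
# port 80 inside the container, exposed here on 8080.
docker run --rm --gpus all -p 8080:80 \
  hotchpotch/tei-blackwell-testing \
  --model-id Qwen/Qwen3-Embedding-0.6B
```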