Sam Stoelinga

223 comments by Sam Stoelinga

Facing the same issue when trying to build a GH200 image for vLLM 0.8.1. Did any of you figure out a workaround?

I was able to get it to work by building Triton 3.2.x from source. Working GH200 image for vLLM 0.8.1: `substratusai/vllm-gh200:v0.8.1`. Example docker run: `docker run...`
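The original `docker run` command is truncated above. A minimal sketch of what such an invocation could look like follows; the model name and serving flags are illustrative placeholders, not the exact command from the comment:

```shell
# Hypothetical example: serve a model with the GH200 image on port 8000.
# The model and the flags after the image name are placeholders;
# the vLLM image forwards them to the vLLM server.
docker run --gpus all --ipc=host -p 8000:8000 \
  substratusai/vllm-gh200:v0.8.1 \
  --model Qwen/Qwen2.5-7B-Instruct \
  --max-model-len 8192
```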

Yeah, I did make changes to torch-related code, since I ran `python3 use_existing_torch.py`. Note my image seems to have an issue with FlashInfer not being installed. Still...

I ended up changing the approach and got a 0.8.2 Docker image working instead: https://github.com/vllm-project/vllm/issues/10459#issuecomment-2759853357

Do you have a specific reranker API, model, and engine in mind? OpenAI doesn't provide a reranker API, and I am not too familiar with the reranking use case. So please provide...

Edit: The docs call out that Infinity adheres to the Cohere API, which is great. I think that may be our shortest route to supporting reranker models. @michaelfeil what's the API that...

It seems the Cohere API is also supported in other solutions: https://docs.continue.dev/customize/model-types/reranking I think supporting a reranker API would be great for fully local, private code completion with continue.dev. We support all endpoints...

It's definitely on the roadmap, and I think I have all the info I would need. Right now our focus is on adding PVC support and Prefix Cache...

Development hasn't started yet. We've been too busy with other things, and this is a relatively large feature. If you need this urgently, please reach out to me on LinkedIn.

Thank you for sharing this! We've been thinking of introducing a ModelAlias so you can have multiple models backing the same model endpoint. Imagine the use case where you have...
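To illustrate the idea, here is a purely hypothetical sketch of what a ModelAlias resource could look like; this is not an implemented or agreed-upon API, and every field name below is an assumption:

```yaml
# Hypothetical ModelAlias sketch -- not an implemented API.
# Two backing models serve one stable endpoint name, e.g. to roll
# out a new model version gradually behind the alias.
apiVersion: kubeai.org/v1
kind: ModelAlias
metadata:
  name: llama-prod
spec:
  targets:
    - model: llama-3.1-8b-v1   # placeholder model name
      weight: 90               # 90% of requests
    - model: llama-3.1-8b-v2   # placeholder model name
      weight: 10               # 10% of requests
```

Clients would keep requesting `model: llama-prod` while the weights shift traffic between the backing models.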