Nicolas Patry

977 comments by Nicolas Patry

This is not up to a single TGI deployment to declare its `served_model_name`; you should do that at the aggregation level (during aggregation on your Prometheus probes start gathering information...

Hi @aW3st, Thanks a lot for the PR. We're unlikely to adhere to anything `beta` from OpenAI, or even everything that OpenAI supports. The OpenAI adaptation layer exists for simple,...

What's the benefit over the IPEX backend? If this allows suboptimal deployments compared to the IPEX image, I think we'd rather not merge this at all (and error out...

> The change I propose in this PR is being done with above background in mind. It introduces "xccl" distributed support into TGI which can be tried out if someone...

@jagzmz it's probably linked to the listener not being on the main thread. I'm not sure how to fix it easily. Also, please pin `rdev` to a specific revision in...

Fixed in: https://github.com/huggingface/safetensors/pull/554

Yes, some subdependencies broke semver, which makes `cargo install` fail. `cargo install` will always attempt to use the latest patches if they exist, which would break because of the semver breaking...
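To illustrate why a semver-breaking patch release gets picked up automatically, here is a simplified model of Cargo's default caret (`^`) version requirements: the resolver takes the highest published version that is still "compatible" with the requirement. This is a hypothetical sketch (`resolve` and `caret_upper_bound` are illustrative names), and it ignores pre-releases, yanked versions and lockfiles. Passing `--locked` to `cargo install` makes it use the package's own `Cargo.lock` instead of re-resolving like this.

```python
def parse(version):
    # "1.4.0" -> (1, 4, 0); pre-release tags are not handled in this sketch
    return tuple(int(part) for part in version.split("."))


def caret_upper_bound(req):
    # Caret semantics: ^1.2.3 -> <2.0.0, ^0.2.3 -> <0.3.0, ^0.0.3 -> <0.0.4
    major, minor, patch = req
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def resolve(req_str, published):
    """Pick the highest published version compatible with ^req_str, or None."""
    req = parse(req_str)
    upper = caret_upper_bound(req)
    candidates = [parse(v) for v in published if req <= parse(v) < upper]
    if not candidates:
        return None
    return ".".join(str(p) for p in max(candidates))


# A dependency declared as "0.2.3" silently resolves to a newer 0.2.x release,
# so if 0.2.7 broke its API, a fresh `cargo install` breaks with it.
print(resolve("0.2.3", ["0.2.3", "0.2.7", "0.3.0"]))
```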

I think other projects are maintaining their own: https://pkg.go.dev/github.com/gomlx/tokenizers#section-readme We are not currently going to support it, due to the low demand (compared to Python).

Load the file, modify the tensors, resave the file. This operation is destructive (you have less data than before) therefore doing a full rewrite of the file is necessary (there...
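A minimal sketch of "load, modify, resave" for dropping tensors. With the real library you would typically use `safetensors.torch.load_file` / `save_file`; to stay dependency-free, this version reads and writes the format by hand, assuming its documented layout: an 8-byte little-endian header length, a JSON header mapping names to `dtype`, `shape` and `data_offsets` (relative to the start of the data section), an optional `__metadata__` entry, then the raw bytes. The helper names (`read_safetensors`, `write_safetensors`, `drop_tensors`) are hypothetical. Because removing a tensor shifts every later offset, the whole file is rewritten rather than patched in place.

```python
import json
import struct


def read_safetensors(path):
    """Return ({name: (dtype, shape, raw_bytes)}, metadata) for a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
        data = f.read()  # everything after the header is the data section
    metadata = header.pop("__metadata__", None)
    tensors = {}
    for name, info in header.items():
        start, end = info["data_offsets"]
        tensors[name] = (info["dtype"], tuple(info["shape"]), data[start:end])
    return tensors, metadata


def write_safetensors(path, tensors, metadata=None):
    """Write tensors with freshly computed, contiguous data offsets."""
    header = {}
    if metadata:
        header["__metadata__"] = metadata  # string -> string map
    chunks, offset = [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": list(shape),
                        "data_offsets": [offset, offset + len(raw)]}
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # Pad the header with spaces to an 8-byte boundary (assumed alignment choice).
    header_bytes += b" " * (-len(header_bytes) % 8)
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        for raw in chunks:
            f.write(raw)


def drop_tensors(path, names_to_drop):
    # Destructive edit: load everything, modify in memory, rewrite the full file.
    tensors, metadata = read_safetensors(path)
    for name in names_to_drop:
        tensors.pop(name, None)
    write_safetensors(path, tensors, metadata)
```

The file is read completely and closed before being reopened for writing, so overwriting the same path is safe in this sketch; writing to a temporary file and renaming it would be more robust against crashes mid-write.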

I'm not super familiar with recent vLLM work, but if it's anything like TGI, which is very similar, it will attempt to use ALL available memory when loading up. Therefore...