Nathan Price
This PR adds documentation for converting LoRA adapters from a Hugging Face checkpoint into a warmup that can be used with the triton-inference-server TensorRT-LLM backend. This approach allows for the...
### System Info

Debian 11

`nvidia-smi`

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC...
```
I would like to have labels applied which will be populated from the content of the request body. I tried something like:

```
async def get_label_value(request: Request):
    return request.json().get("label", None)

app.add_middleware(
    PrometheusMiddleware,
...
```
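One likely problem in the snippet above is that Starlette's `Request.json()` is a coroutine, so the extractor has to `await` it. A minimal sketch of the corrected callback, using a hypothetical `FakeRequest` stand-in (not part of starlette_exporter) so it runs without a server:

```python
import asyncio
import json

async def get_label_value(request):
    # Request.json() is async in Starlette, so it must be awaited
    body = await request.json()
    return body.get("label", None)

class FakeRequest:
    """Minimal stand-in for a Starlette Request, for demonstration only."""
    def __init__(self, raw: bytes):
        self._raw = raw

    async def json(self):
        return json.loads(self._raw)

label = asyncio.run(get_label_value(FakeRequest(b'{"label": "checkout"}')))
print(label)  # checkout
```

Whether a coroutine is accepted as a label callback depends on the middleware's API; if it only takes sync callables, the body would need to be parsed in an earlier middleware and stashed on `request.state`.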
Working to allow custom histogram binning to be applied to Prometheus metrics. Ideally this could be applied to activities as well as workflows. The current implementation...
Currently, the default bins for activity metrics top out at 60 seconds. This limits my observability into activities that take a long time, or even take more than...
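For illustration, the kind of wider binning I am after can be sketched in plain Python; the bucket edges here are hypothetical examples, not actual defaults (beyond the 60-second cap mentioned above):

```python
import bisect

# Hypothetical wider bucket edges (seconds). When the stock bins stop at
# 60 s, every long-running activity collapses into the +Inf bucket.
BUCKETS = [1.0, 5.0, 15.0, 60.0, 300.0, 900.0, 3600.0]

def bucket_for(duration_s: float) -> float:
    """Return the upper edge (le) of the histogram bucket a duration falls into."""
    i = bisect.bisect_left(BUCKETS, duration_s)
    return BUCKETS[i] if i < len(BUCKETS) else float("inf")

print(bucket_for(45.0))    # 60.0  -- visible even with default-style bins
print(bucket_for(1200.0))  # 3600.0 -- only distinguishable with wider bins
```

The `bisect_left` lookup matches Prometheus `le` (less-than-or-equal) bucket semantics: a duration exactly on an edge is counted in that edge's bucket.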
# Add Priority Request Support for vLLM Async Engine

## Description

This PR adds support for priority-based request scheduling in the vLLM async engine. When the engine is configured with...
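As a rough illustration of the scheduling idea (a sketch, not vLLM's actual implementation), a priority queue that serves lower-numbered priorities first and breaks ties by arrival order might look like:

```python
import heapq
import itertools

class PriorityRequestQueue:
    """Sketch of priority-based request scheduling: a lower priority value
    is served first; ties fall back to FIFO arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # monotonic counter as a tiebreaker

    def add(self, request_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), request_id))

    def next_request(self) -> str:
        _, _, request_id = heapq.heappop(self._heap)
        return request_id

q = PriorityRequestQueue()
q.add("req-a", priority=1)
q.add("req-b", priority=0)  # more urgent, jumps the queue
q.add("req-c", priority=1)
order = [q.next_request() for _ in range(3)]
print(order)  # ['req-b', 'req-a', 'req-c']
```

The sequence counter matters: without it, two requests at the same priority would be ordered by comparing request IDs rather than by arrival time.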
### 🐛 Describe the bug

When I scale up my deployment that uses a LoRA adapter, I see that all the traffic to the LoRA adapter always goes to the pod...
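The behavior I would expect instead is something closer to least-outstanding-requests routing across replicas. A toy sketch of that policy (the `InFlightRouter` class and pod names are hypothetical, purely to illustrate why new pods should receive traffic):

```python
class InFlightRouter:
    """Toy least-outstanding-requests router: always pick the replica
    with the fewest in-flight requests, so newly scaled pods get traffic."""

    def __init__(self, replicas):
        self.in_flight = {r: 0 for r in replicas}

    def pick(self) -> str:
        replica = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[replica] += 1
        return replica

    def done(self, replica: str) -> None:
        self.in_flight[replica] -= 1

router = InFlightRouter(["pod-0", "pod-1"])
first, second = router.pick(), router.pick()
print(sorted([first, second]))  # ['pod-0', 'pod-1'] -- load spreads out
```

If routing is instead sticky on the adapter ID (e.g. a hash of the LoRA name), every request for one adapter maps to the same pod regardless of scale, which would explain the behavior above.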