Nathan Price
This PR provides documentation for converting LoRA adapters from a Hugging Face checkpoint into a warmup that can be used in the triton-inference-server TensorRT-LLM backend. This approach allows for the...
### System Info Debian 11 `nvidia-smi` ``` +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC...
I would like to have labels applied that are populated from the content of the request body. I tried something like: ``` async def get_label_value(request:Request): return request.json().get("label", None) app.add_middleware( PrometheusMiddleware,...