Nick Stogner

Results: 101 comments by Nick Stogner

This should probably be caught in the controller, not a webhook. The controller should report a status indicating that the Model is not training yet because it's waiting for a...
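
For illustration, a minimal sketch of the kind of status condition such a controller could report, following standard Kubernetes condition conventions; the condition type, reason, and message below are hypothetical, not the project's actual API:

```yaml
# Hypothetical Model status set by the controller (names illustrative).
status:
  conditions:
    - type: Training
      status: "False"              # not training yet
      reason: WaitingForDependency
      message: Waiting on a dependency before training can start.
```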

Ahh, this is from so long ago that I can't remember. We can close for now. I will reopen if needed.

The only thing I can think of is that we would need to make sure these logs are gathered for a time window in which all backend Pods have been serving...

Note, CPU-only support in `v0.7.2` also appears to be broken.

```bash
$ git checkout v0.7.2
$ docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
$ docker run -it --rm --network=host...
```

This actually appears to be an OOM issue where the error is not shown and the process does not crash (it appears that a thread within vLLM might crash).

Can you get the benchmark to log HTTP requests?

The error above indicates a 400, but the curl output mentions a 301.

My primary question: Should chat templates be configured at the system level and referenced from Models, or specified directly in Model specs?
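
To make the two options concrete, a hedged sketch; the field names (`chatTemplates`, `chatTemplateRef`, `chatTemplate`) are hypothetical illustrations of the design question, not an existing API:

```yaml
# Option A (hypothetical): a template defined once at the system level,
# e.g. in helm values...
chatTemplates:
  chatml: |
    {% for m in messages %}<|im_start|>{{ m.role }}
    {{ m.content }}<|im_end|>
    {% endfor %}
---
# ...and referenced by name from each Model spec:
spec:
  chatTemplateRef: chatml
---
# Option B (hypothetical): the same template inlined in the Model spec.
spec:
  chatTemplate: |
    {% for m in messages %}<|im_start|>{{ m.role }}
    {{ m.content }}<|im_end|>
    {% endfor %}
```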

Closing in favor of tracking in #410, which discusses a unified approach across vLLM and Ollama.

You should be covered by configuring a new resource profile:

```yaml
# Example helm values file
resourceProfiles:
  H100:
    limits:
      nvidia.com/gpu: "1"
    requests:
      nvidia.com/gpu: "1"
    nodeSelector:
      cloud.google.com/gke-accelerator: nvidia-h100-80gb
```

...then configuring...
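
The comment is truncated, but as a hedged illustration of how a Model might consume such a profile: the model name, url, and engine below are assumptions, as is the `resourceProfile: <profile>:<GPU count>` reference format.

```yaml
# Hedged sketch: a Model that consumes the H100 profile defined above.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b        # illustrative
spec:
  engine: VLLM
  url: hf://meta-llama/Llama-3.1-8B-Instruct   # illustrative
  resourceProfile: H100:1   # assumed "<profile>:<GPU count>" format
```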