Nick Stogner

Results: 101 comments by Nick Stogner

This should probably be caught in the controller, not a webhook. The controller should report a status indicating that the Model is not training yet because it's waiting for a...
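
For illustration, a minimal sketch of the kind of status condition such a controller could report, following standard Kubernetes condition conventions; the condition type, reason, and message below are hypothetical, not the project's actual API:

```yaml
# Hypothetical Model status set by the controller (names illustrative).
status:
  conditions:
    - type: Training
      status: "False"              # not training yet
      reason: WaitingForDependency
      message: Waiting on a dependency before training can start.
```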

Ahh, this is from so long ago that I can't remember. We can close for now. I will reopen if needed.

The only thing I can think of is that we would need to make sure these logs are gathered for a time window in which all backend Pods have been serving...

Note, CPU-only support in `v0.7.2` also appears to be broken.

```bash
$ git checkout v0.7.2
$ docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
$ docker run -it --rm --network=host...
```

This actually appears to be an OOM issue where the error is not shown and the process does not crash (it appears that a thread within vLLM might crash).

Can you get the benchmark to log HTTP requests?

The error above indicates a 400, but the curl output mentions a 301.

My primary question: Should chat templates be configured at the system level and referenced from Models, or specified directly in Model specs?
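
To make the two options concrete, a hedged sketch; the field names (`chatTemplates`, `chatTemplateRef`, `chatTemplate`) are hypothetical illustrations of the design question, not an existing API:

```yaml
# Option A (hypothetical): a template defined once at the system level,
# e.g. in helm values...
chatTemplates:
  chatml: |
    {% for m in messages %}<|im_start|>{{ m.role }}
    {{ m.content }}<|im_end|>
    {% endfor %}
---
# ...and referenced by name from each Model spec:
spec:
  chatTemplateRef: chatml
---
# Option B (hypothetical): the same template inlined in the Model spec.
spec:
  chatTemplate: |
    {% for m in messages %}<|im_start|>{{ m.role }}
    {{ m.content }}<|im_end|>
    {% endfor %}
```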

Closing in favor of tracking in #410, which discusses a unified approach across vLLM and Ollama.

You should be covered by configuring a new resource profile:

```yaml
# Example helm values file
resourceProfiles:
  H100:
    limits:
      nvidia.com/gpu: "1"
    requests:
      nvidia.com/gpu: "1"
    nodeSelector:
      cloud.google.com/gke-accelerator: nvidia-h100-80gb
```

...then configuring...
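
The comment is truncated, but as a hedged illustration of how a Model might consume such a profile: the model name, url, and engine below are assumptions, as is the `resourceProfile: <profile>:<GPU count>` reference format.

```yaml
# Hedged sketch: a Model that consumes the H100 profile defined above.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b        # illustrative
spec:
  engine: VLLM
  url: hf://meta-llama/Llama-3.1-8B-Instruct   # illustrative
  resourceProfile: H100:1   # assumed "<profile>:<GPU count>" format
```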