Sam Stoelinga comments

Results 223 comments of Sam Stoelinga

Proposal: Mount a PVC in ReadManyOnly mode as model storage

Storing models on a PVC is now supported with vLLM. Please update your helm chart to v0.10.0 or later to try it out. Support for other engines may follow later. Keeping this...

use "true" and "false" instead of 0 and 1

I updated the existing test and renamed the test so it's being run correctly now.

use "true" and "false" instead of 0 and 1

It's still relevant. @muyangyuapple encountered the same issue on pathways. But feel free to discard this and make a separate PR. tl;dr we should always use true or false instead of...

Support setting node auto-provisioning cpu and memory parameters

Hitting a similar issue #603

Llama3.2 Vision Model: Guides and Issues

I am getting the following error: ``` ERROR 09-28 19:27:59 async_llm_engine.py:61] RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF,...

Is there a way to have performance metrics when running kubeai?

Here is an example PodMonitoring resource for scraping pod metrics that you can use with Google Managed Prometheus: ``` apiVersion: monitoring.googleapis.com/v1 kind: PodMonitoring metadata: name: vllm-pods spec: selector: matchLabels: app.kubernetes.io/name: vllm endpoints: -...

Is there a way to have performance metrics when running kubeai?

We should include the following Grafana dashboard: https://github.com/vllm-project/vllm/tree/main/examples/production_monitoring

add Fuji v3 405b and solve HBM OOMs for larger models

@kelvin-zou @hanzhi713 would appreciate your review to make sure this PR roughly matches 405B. Thank you!

add Fuji v3 405b and solve HBM OOMs for larger models

Getting this error: ``` NotFoundError: The specified path gs://axlearn-public/tensorflow_datasets/tokenizers/sentencepiece/bpe_128k_c4.model was not found. ``` Am I doing something wrong or is there a missing tokenizer? ``` gsutil ls -r -l...

add Fuji v3 405b and solve HBM OOMs for larger models

Fixed the issue after the vocab model was uploaded. Now I'm hitting OOM issues. Here is the model config: ``` max_step: 3932160 mesh_axis_names[0]: 'pipeline' mesh_axis_names[1]: 'data' mesh_axis_names[2]: 'expert' mesh_axis_names[3]: 'fsdp' mesh_axis_names[4]:...