Sam Stoelinga
Storing models on a PVC is now supported with vLLM. Please update your Helm chart to v0.10.0 or later to try it out. Support for other engines may follow later. Keeping this...
I updated the existing test and renamed it so that it now runs correctly.
It's still relevant. @muyangyuapple encountered the same issue on pathways. But feel free to discard this and make a separate PR. tl;dr we should always true or false instead of...
Hitting a similar issue #603
I am getting the following error:
```
ERROR 09-28 19:27:59 async_llm_engine.py:61] RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF,...
```
Here is an example pod scraping metric that you can use with Google Managed Prometheus:
```
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: vllm-pods
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: vllm
  endpoints:
  -...
```
We should include the following Grafana dashboard: https://github.com/vllm-project/vllm/tree/main/examples/production_monitoring
@kelvin-zou @hanzhi713 would appreciate your review to make sure this PR roughly matches 405B. Thank you!
Getting this error:
```
NotFoundError: The specified path gs://axlearn-public/tensorflow_datasets/tokenizers/sentencepiece/bpe_128k_c4.model was not found.
```
Am I doing something wrong, or is there a missing tokenizer?
```
gsutil ls -r -l...
```
Fixed the issue after the vocab model was uploaded. Now I'm hitting OOM issues. Here is the model config:
```
max_step: 3932160
mesh_axis_names[0]: 'pipeline'
mesh_axis_names[1]: 'data'
mesh_axis_names[2]: 'expert'
mesh_axis_names[3]: 'fsdp'
mesh_axis_names[4]:...
```