Edwin Hernandez
# What does this PR do?
Adds a metric that measures the time spent downloading the model, loading it into GPU memory, and the time it takes for the server to be...
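To illustrate the kind of per-phase startup timing described above, here is a minimal sketch. It is not the PR's code: `PhaseTimer`, the phase names `model_download` and `gpu_load`, and the `time.sleep` stand-ins are all hypothetical, and a real server would export the durations through its own metrics library rather than a dict.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper: records wall-clock duration of named startup phases."""

    def __init__(self):
        self.durations = {}  # phase name -> seconds

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the phase raises.
            self.durations[name] = time.perf_counter() - start


timer = PhaseTimer()

with timer.phase("model_download"):
    time.sleep(0.01)  # stand-in for fetching model weights

with timer.phase("gpu_load"):
    time.sleep(0.01)  # stand-in for moving weights into GPU memory

for name, seconds in timer.durations.items():
    print(f"{name}: {seconds:.3f}s")
```

Using a context manager keeps each measurement next to the code it wraps, so adding another phase is a single `with` block.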
# What does this PR do?
Emit max_batch_total_tokens as max_token_capacity as part of metrics standardization.

Fixes # (issue)

## Before submitting
- [ ] This PR fixes a typo or...
Whenever I run the locust benchmarking tool against a vLLM model server, I get this error:
```
POST /generate HTTP/1.1" 404 Not Found
```
After some investigation, it looks like it...
Added integration with https://github.com/kubernetes-sigs/lws for TPUs, as well as integration of LWS + Pathways.

To run basic LWS+TPU:
```
axlearn gcp launch run --cluster=$CLUSTER \
  --runner_name gke_tpu_lws \
  --name=$USER \
  ...
```