Edwin Hernandez issues

Results: 4 issues of Edwin Hernandez

Add model_load_time metric

# What does this PR do? Adds a metric that measures the time spent downloading the model, loading it into GPU memory, and the time it takes for the server to be...

adding max_token_capacity metric

# What does this PR do? Emits `max_batch_total_tokens` as `max_token_capacity` as part of metrics standardization...

Error: "POST /generate HTTP/1.1" 404 Not Found when running Locust tool against vLLM model server

Whenever I run the Locust benchmarking tool against a vLLM model server, I get this error:

```
"POST /generate HTTP/1.1" 404 Not Found
```

After some investigation, it looks like it...

benchmarks

Added integration with https://github.com/kubernetes-sigs/lws for TPUs, as well as integration of LWS + Pathways. To run basic LWS+TPU:

```
axlearn gcp launch run --cluster=$CLUSTER \
    --runner_name gke_tpu_lws \
    --name=$USER \...
```

Adding LWS Integration