Philipp Moritz

85 comments by Philipp Moritz

Even though it looks like it might be related, the tests are actually flaky on master :)

This PR is part of an effort to bring the error messages that Apache Arrow raises when interacting with S3 up to the same standard as boto3: https://issues.apache.org/jira/browse/ARROW-17079

Basically, somebody from Google hosting the M1 binaries of the docker puller, I think :) And merging it, of course. I'd love to see that happen :)

Same for `py3_image`: 0.22.0 works and 0.23.0 doesn't. Hopefully this can be fixed soon :)

@AdamSvetec If you have the patch for this available, this would be very useful for us too :)

Given that there is also https://github.com/vllm-project/vllm/pull/1507 for int8, it would be good to give a little thought to the convention going forward. Here is a possibility: Set `--kv-cache-dtype=fp8_e5m2` for E5M2...

Hey Cathy + Ludwig, glad to hear from you! Scaling up sparse linear algebra on non-MPI systems is challenging, because each task is typically very small (in this case, it...

Very nice! I ran MMLU on mixtral with TP8 on this PR as an end-to-end correctness check, and the results look good: ``` | Groups |Version|Filter|n-shot|Metric|Value | |Stderr|...

Also, here are a few latency measurements on mixtral with TP8 in different batch size regimes. With this PR: ``` bs = 1: Avg ITL: 10.36 milliseconds bs = 2:...

Thanks a lot for putting together this RFC! This sounds like a solid plan to me. Some more detailed comments: - Why not store the KV scaling factors in the...