Pratyush Patel comments

Results: 5 comments of Pratyush Patel

CPU utilization in microservices-v2021 traces

Makes sense, thank you!

Benchmarking MII performance

Thank you for all the pointers! 1. I did try passing `batch_size` to `generator.query` before; however, it results in this error (for GPT-NeoX-20b): ``` Exception calling application: Pipeline with tokenizer...

20B pretrained model inference OOM on 8xA100 40GB

Thanks @satpalsr! DeepSpeed MII worked for me (with just 2 GPUs). I would like to ask a follow-up question to understand this a little bit better. Based on the [DeepSpeed...

NVSwitch power

Could you please let me know which GPUs it is supported on? Also, how would I obtain the power reading? (Q2)

[Feature] DeepSeek V3 optimization

I had another question regarding DP attention. The [sglang blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#data-parallelism-attention-for-deepseek-models) mentions that DP attention is effective because MLA has only 1 KV head, which causes unnecessary duplication of...