Pratyush Patel
Makes sense, thank you!
Thank you for all the pointers! 1. I did try passing `batch_size` to `generator.query` before; however, it results in this error (for GPT-NeoX-20b): ``` Exception calling application: Pipeline with tokenizer...
Thanks @satpalsr! DeepSpeed MII worked for me (with just 2 GPUs). I would like to ask a follow-up question to understand this a little bit better. Based on the [DeepSpeed...
Could you please let me know which GPUs it is supported on? Also, how would I obtain the power reading? (Q2)
I had another question regarding DP attention. The [sglang blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#data-parallelism-attention-for-deepseek-models) mentions that DP attention is effective because MLA has only 1 KV head, which causes unnecessary duplication of...
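To make the duplication argument concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which tensor parallelism splits KV heads evenly across ranks and replicates them when there are fewer heads than ranks. The helper names `kv_heads_per_rank` and `replication_factor` are hypothetical and not part of sglang.

```python
import math


def kv_heads_per_rank(num_kv_heads: int, tp_size: int) -> int:
    """KV heads each tensor-parallel rank holds (assumed model: even split,
    with a head replicated when there are fewer heads than ranks)."""
    return math.ceil(num_kv_heads / tp_size)


def replication_factor(num_kv_heads: int, tp_size: int) -> float:
    """KV heads stored across all ranks, divided by the number of distinct heads."""
    return kv_heads_per_rank(num_kv_heads, tp_size) * tp_size / num_kv_heads


# MLA-style attention with a single KV head on 8 tensor-parallel GPUs:
# every rank keeps its own copy of that head.
mla_tp8 = replication_factor(num_kv_heads=1, tp_size=8)

# A grouped-query model with 8 KV heads on the same 8 GPUs:
# each rank owns exactly one distinct head.
gqa_tp8 = replication_factor(num_kv_heads=8, tp_size=8)

print(f"1 KV head, TP=8: {mla_tp8:.0f}x copies of the KV cache")
print(f"8 KV heads, TP=8: {gqa_tp8:.0f}x copies of the KV cache")
```

Under this model, a single KV head on 8 tensor-parallel ranks is stored 8 times, whereas with DP attention each rank would only hold the KV cache for the requests it serves itself.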