Bowen Wang

26 comments by Bowen Wang

> A couple of things:
>
> 1. Can we extract Hierarchical Load Balancing and Global Load Balancing to a config flag?
> 2. The DeepSeek-V3 technical paper (https://arxiv.org/pdf/2412.19437) introduced...
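For readers unfamiliar with the two policies mentioned above: "global" balancing places experts on GPUs purely by load, while the hierarchical variant first splits experts across nodes and then balances within each node. A minimal sketch of the global case, using greedy least-loaded packing (illustrative only, not the paper's exact algorithm or vLLM's implementation):

```python
import heapq

def pack_experts(loads, num_gpus):
    """Greedy longest-processing-time packing: place each expert, from
    hottest to coldest, on the currently least-loaded GPU. Illustrates
    'global' load balancing; a hierarchical variant would apply the same
    idea per node. Hypothetical helper, not vLLM's actual API."""
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (total load, gpu id)
    heapq.heapify(heap)
    placement = {}
    for expert in sorted(range(len(loads)), key=lambda e: -loads[e]):
        gpu_load, gpu = heapq.heappop(heap)
        placement[expert] = gpu
        heapq.heappush(heap, (gpu_load + loads[expert], gpu))
    return placement

# Four experts with loads [5, 4, 2, 1] on two GPUs end up split 5+1 vs 4+2
place = pack_experts([5, 4, 2, 1], 2)
```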

> Will the load statistics here bring additional performance consumption? Is there a better optimization method?

I tested the PyTorch implementation, and the overhead is only about 1/5 of the...
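The "load statistics" discussed above are essentially per-expert token counts accumulated during routing. A pure-Python sketch of that bookkeeping (the real implementation accumulates these counts on-GPU per layer; this toy version only shows the idea):

```python
from collections import Counter

def expert_load_stats(token_expert_ids):
    """Count how many tokens were routed to each expert id.

    Minimal sketch of the statistics gathering the comment refers to;
    not vLLM's actual implementation, which works on GPU tensors.
    """
    return Counter(token_expert_ids)

# Toy routing trace: each entry is the expert id one token was sent to
loads = expert_load_stats([0, 2, 2, 1, 2, 0])
# expert 2 received the most tokens in this trace
```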

> Hello! That's great work, indeed! Thank you for this! There are issues in your implementation, however:
>
> 1. While moving the expert load statistics gathering algorithm from...

> Hi @abmfy 👋,
>
> First, thank you for your clever algorithm design and continuous contributions! 🙌 I have two questions about the implementation:
>
> 1️⃣ EPLB Rearrangement...

> In my test, it takes seconds to run the EPLB algorithm. When each GPUModelRunner runs into EplbState.rearrange, why not let each compute several layers by calling rebalance_experts and all...
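The suggestion above — each rank computing the rebalance for only a subset of layers, then all-gathering the results — amounts to partitioning the MoE layers across EP ranks. A minimal sketch of such a partition (function name and round-robin scheme are hypothetical, not vLLM's actual API):

```python
def layers_for_rank(rank: int, world_size: int, num_layers: int) -> list[int]:
    """Round-robin split of MoE layers across EP ranks, so each rank runs
    the rebalance computation for only its share of layers before an
    all-gather merges the new expert placements. Illustrative helper only."""
    return [layer for layer in range(num_layers) if layer % world_size == rank]

# 4 ranks, 10 layers: every layer is computed by exactly one rank
parts = [layers_for_rank(r, 4, 10) for r in range(4)]
assert sorted(layer for p in parts for layer in p) == list(range(10))
```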

> As for running several layers of EPLB on each rank, your implementation already works very well.
>
> In my test, the EPLB algorithm indeed generates duplicate expert ids on...
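For context on the duplicate expert ids mentioned above: EPLB deliberately replicates heavily loaded experts across physical slots, so the same logical expert id is expected to appear more than once. A minimal greedy sketch of replica assignment (illustrative only, not the paper's exact algorithm):

```python
import heapq

def assign_replicas(loads, num_slots):
    """Greedily give each spare physical slot to the expert whose
    per-replica load is currently highest, so hot experts end up
    duplicated across slots. Hypothetical helper, for illustration."""
    assert num_slots >= len(loads)
    replicas = [1] * len(loads)  # every logical expert gets one replica
    # max-heap keyed by load per replica (negated for heapq's min-heap)
    heap = [(-load, i) for i, load in enumerate(loads)]
    heapq.heapify(heap)
    for _ in range(num_slots - len(loads)):
        _, i = heapq.heappop(heap)
        replicas[i] += 1
        heapq.heappush(heap, (-loads[i] / replicas[i], i))
    return replicas

# Expert 0 is four times hotter than the rest and receives both spare slots
assert assign_replicas([8, 2, 2, 2], 6) == [3, 1, 1, 1]
```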

> Also, maybe not in this PR, but it'd be nice if we can group the eplb-related configs (or ep-related ones) into a separate config. We did it for `compilation_config`...
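Grouping the scattered EPLB options, as suggested above, might look like the following sketch, mirroring how `compilation_config` bundles related options (all field names here are hypothetical, not vLLM's actual config):

```python
from dataclasses import dataclass

@dataclass
class EplbConfig:
    """Hypothetical grouped config for EPLB-related options; field
    names are illustrative and may differ from vLLM's actual flags."""
    enable_eplb: bool = False
    window_size: int = 1000          # steps of load stats to accumulate
    rearrange_interval: int = 3000   # steps between expert rearrangements
    num_redundant_experts: int = 0   # extra physical slots for replicas

# Everything EPLB-related travels as one object instead of loose flags
cfg = EplbConfig(enable_eplb=True, num_redundant_experts=16)
```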

> @abmfy Oh actually, before we merge this PR, can we have a (unit) test?

@WoosukKwon Sure, I've added two unit tests, `eplb-algorithm-test` and `eplb-execution-test`, and they're passing; the failing...

> 🎉 So happy to see this PR finally merged after going through so many challenges — big round of applause for the researcher's persistence and dedication! @abmfy 👏👏👏
> ...