Yukun He
Hi @zongfeijing. I am currently refactoring the autotuning system for fused_moe (see this pull request: https://github.com/NVIDIA/TensorRT-LLM/pull/3151). I notice that you have many changes on the...
To simplify the nested tuning process, we want: * The inner op is not required to implement forward and get_valid_tactics (whether it is a tunable one...
> Sure, I will try it. Thanks a lot for the effort. I have just pushed another commit to clean the code and make UT work. Because this is the...
Hi @Wong4j. Thanks a lot for the effort! I just moved the common code changes in AutoTuner to a standalone PR #9348 because it might be required by other tunable...
It looks like this bug also reflects some other issues with the distribution across ranks: https://nvbugspro.nvidia.com/bug/5680133. Maybe you will have some ideas or comments on this, @rosenrodt. Thanks a lot...
/bot run --disable-fail-fast
/bot run --disable-fail-fast
/bot skip --comment "Pipeline has already been cleaned and only change the pre-commit configs."
/bot run --disable-fail-fast