PGFLMG
I see. Does this modification help some activation kernels use PDL? @Edenzzzz
> btw why is `CMAKE_BUILD_PARALLEL_LEVEL` not used for `make build`? It takes a very long time

It needs ninja.
Please reply to the Gemini review.
I triggered the CI.
Looks good. Have you tested it on H100/B200? If not, I can help you add some H100/B200 tests.
H100 results, looks pretty good:

```
rmsnorm-performance(head_dim=128):
   head_num  token_num    SGLang  Turbomind
0      16.0        1.0  2.690041   2.643974
1      16.0        2.0  3.132268   2.748590
2      16.0        4.0  3.181508   2.758456
3      16.0        8.0  3.231008...
```
That's true.
Have you ever prepared hidden states for offline training?
cc: @yuanlehome
> Please post the documentation link~

Sure, I'll post an RFC: https://github.com/PaddlePaddle/community/pull/1026