PGFLMG

Results 61 comments of PGFLMG

I see. Does this modification help some activation kernels use PDL? @Edenzzzz

> btw why is `CMAKE_BUILD_PARALLEL_LEVEL` not used for `make build`? It takes a very long time

It needs Ninja.

Looks good. Have you tested it on H100/B200? If not, I can help you add some H100/B200 tests.

H100 results look pretty good:

```
rmsnorm-performance(head_dim=128):
   head_num  token_num    SGLang  Turbomind
0      16.0        1.0  2.690041   2.643974
1      16.0        2.0  3.132268   2.748590
2      16.0        4.0  3.181508   2.758456
3      16.0        8.0  3.231008...
```

Have you ever prepared hidden states for offline training?

> Please post the doc link~

Sure, here is an RFC: https://github.com/PaddlePaddle/community/pull/1026