lukec

Results 3 issues of lukec

I found that there is a kernel for writing subsequent optimizations in rmsnorm, and I tried to write a half-precision kernel for rms. Below is the comparison data, I tested...

``` def test_simple(): code = """ #[version = "0.0.5"] def @simple[A, B](%fdd/ddsa: fn(A) -> B, %xs: A) -> B { %fdd/ddsa(%xs) } def @main(%l: Tensor[(5, 5), float32]) -> Tensor[(5, 5),...

type: bug
needs-triage

## Motivation Expert Parallelism (EP) support for DeepSeek V3/R1. ## Modifications * The group GEMM operator supports FP8 * Supports DeepSeek V3 parameter loading. ## Performance The performance improved by...