lukec

Results 8 issues of lukec

I found that there is a kernel for writing subsequent optimizations in rmsnorm, and I tried to write a half-precision kernel for rms. Below is the comparison data; I tested...

```
def test_simple():
    code = """
#[version = "0.0.5"]
def @simple[A, B](%fdd/ddsa: fn(A) -> B, %xs: A) -> B {
  %fdd/ddsa(%xs)
}
def @main(%l: Tensor[(5, 5), float32]) -> Tensor[(5, 5),...
```

type: bug
needs-triage

## Motivation Expert Parallelism (EP) support for DeepSeek V3/R1. ## Modifications * The group GEMM operator supports FP8 * Supports DeepSeek V3 parameter loading. ## Performance The performance improved by...

## Motivation Support DeepEP for Qwen3. For now, we've simply copied the DeepEP code from DS, and the accuracy test has passed. ## TODO * Test bf16 compatibility ## Test Command...

enhancement
help wanted
high priority

### Checklist - [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose Otherwise, it will be closed. -...

### Checklist - [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose Otherwise, it will be closed. -...