jon-chuang
https://github.com/triton-lang/triton/blob/f4c48a9233957903e30474bae6443bf3d3a79bf7/python/triton/runtime/autotuner.py#L346 Should be `args['x_size']`, i.e. index by argument name, not by argument position.
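Why positional indexing breaks can be shown without a GPU. The sketch below assumes, as the issue implies, that each heuristic callable receives a dict mapping kernel parameter names to argument values. `run_heuristics` and `next_power_of_2` are hypothetical pure-Python stand-ins written for illustration; they are not Triton's implementation.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


# Hypothetical stand-in (not Triton's code): build a dict keyed by
# parameter *name*, then evaluate each heuristic on that dict.
def run_heuristics(arg_names, args, values):
    named = dict(zip(arg_names, args))
    return {key: fn(named) for key, fn in values.items()}


arg_names = ["x_ptr", "x_size"]
args = (0x1000, 1000)  # placeholder pointer value and a size

# Correct: look the argument up by its parameter name.
good = {"BLOCK_SIZE": lambda a: next_power_of_2(a["x_size"])}
print(run_heuristics(arg_names, args, good))  # {'BLOCK_SIZE': 1024}

# Incorrect: the keys are names, not positions, so a[1] raises KeyError.
bad = {"BLOCK_SIZE": lambda a: next_power_of_2(a[1])}
try:
    run_heuristics(arg_names, args, bad)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 1
```

Under this assumption, `args[1]` only works if the dict happens to contain an integer key `1`, which a name-keyed dict never does.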
### Bug description

See: https://github.com/modularml/mojo/pull/3250#issuecomment-2229135283

This is a minor issue, but it invites controversy. It would be good to fix the root cause; it also indicates a perf problem with...
### 🐛 Describe the bug

https://github.com/linkedin/Liger-Kernel/blob/e249eee723978bf8610ff1ea2297d048a2417e20/test/transformers/test_swiglu.py#L46
https://github.com/linkedin/Liger-Kernel/blob/e249eee723978bf8610ff1ea2297d048a2417e20/test/transformers/test_geglu.py#L38

The tolerances at these lines, 1e0 for fp32 and 1e4 for bf16, seem a little excessive. If the kernels don't cause models to diverge, this test ought to pass...
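To see how loose these values are, the sketch below applies the elementwise criterion `|actual - expected| <= atol + rtol * |expected|` (the criterion `torch.allclose` documents). It assumes the quoted values act as absolute tolerances and sets `rtol` to 0 to isolate their effect; the real tests may pair them with a relative tolerance. The helper and the sample values are hypothetical.

```python
def allclose(actual, expected, atol, rtol=0.0):
    """Elementwise |actual - expected| <= atol + rtol * |expected|."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


expected = [0.5, -1.2, 3.0]          # hypothetical reference activations
off_by_0_9 = [x + 0.9 for x in expected]
garbage = [1000.0, -5000.0, 42.0]   # clearly wrong output

print(allclose(off_by_0_9, expected, atol=1e0))   # True: 0.9 error accepted
print(allclose(garbage, expected, atol=1e4))      # True: nonsense accepted
print(allclose(off_by_0_9, expected, atol=1e-2))  # False with a tight atol
```

For activations of order 1, an absolute tolerance of 1 already admits errors as large as the values themselves, and 1e4 admits almost any output.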
### 🚀 The feature, motivation and pitch

From the new FlashInfer release: https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.1.4

cc @comaniac

### Additional context

Follow-up to: https://github.com/vllm-project/vllm/pull/7208, https://github.com/vllm-project/vllm/pull/7185