Jian Chen

Results: 4 issues by Jian Chen

```python
import torch
import flashinfer

rope = flashinfer.apply_rope_inplace

torch.library.define(
    "mylib::target_rope",
    "(Tensor(a!) q, Tensor(a!) k, Tensor indptr, Tensor offsets) -> None",
)

@torch.library.impl("mylib::target_rope", "cuda")
def target_rope(q, k, indptr, offsets):
    rope(q, k, ...
```

Hi, in previous issues you wrote that you planned to integrate flashinfer into inference backends like gpt-fast. That would be very interesting! May I ask whether we can integrate...

flash-attn version: 2.6.3
torch version: 2.5.0 nightly

```python
dec_len = 1
batch_size = 32
context_len = 16000
print(batch_size, dec_len, context_len)
with torch.device("cuda"):
    q = torch.randn((batch_size, dec_len, 32, 128), dtype=torch.bfloat16)
    k_cache = ...
```
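For context, a back-of-envelope calculation of the KV-cache footprint implied by the shapes above (the 32 heads and head_dim 128 are taken from the `q` tensor and assumed to carry over to the cache; bf16 is 2 bytes per element):

```python
# KV-cache size for: batch 32, context 16000, 32 heads, head_dim 128, bf16.
batch, ctx, heads, head_dim, bytes_per_el = 32, 16000, 32, 128, 2
one_tensor = batch * ctx * heads * head_dim * bytes_per_el
kv_total = 2 * one_tensor  # K cache plus V cache
gib = kv_total / 2**30
print(gib)  # 7.8125 GiB
```

At roughly 8 GiB of cache read per decode step, this workload is dominated by memory bandwidth rather than compute, which is the regime these decode kernels target.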

### Checklist

- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest...