DefTruth

Results 256 comments of DefTruth

seems we have to disable chunked-prefill on V0 mode, V1 work fine with chunked-prefill, but V0 failed. ```bash ERROR 03-31 16:21:41 [engine.py:160] ValueError('Attempted to assign 5460 = 5460 multimodal tokens...

still have this warning message for torch 2.9.0: ```bash Skipping import of cpp extensions due to incompatible torch version 2.9.0+cu128 for torchao version 0.14.0 Please see GitHub issue #2919 for...

> How does this fare to unified attention? I believe UAA would be a better implementation of Ulysses attention for any sequence length. It offers zero padding, near-zero communication overhead,...

> Sounds like a good option to me. [@DefTruth](https://github.com/DefTruth) would you like to work on adding it? My pleasure. However, I've been quite busy lately—I'll submit the implementation of UAA...