DragonFive — 2 issues
Thank you for your source code for attention-transfer, but I am not familiar with PyTorch. I do not understand the interpolation's implementation. How does it work? How is the interpolation done...
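The interpolation being asked about can be illustrated with a minimal sketch. This is not the repository's actual code; it is one common way attention-transfer losses handle mismatched feature-map sizes in PyTorch, and the helper names `attention_map` and `at_loss` are hypothetical:

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    # feat: (N, C, H, W) activations.
    # A typical attention map is the channel-wise mean of squared activations.
    return feat.pow(2).mean(dim=1, keepdim=True)  # (N, 1, H, W)

def at_loss(student_feat, teacher_feat):
    s = attention_map(student_feat)
    t = attention_map(teacher_feat)
    # If the spatial sizes differ, bilinearly interpolate the student's map
    # up (or down) to the teacher's spatial size before comparing.
    if s.shape[-2:] != t.shape[-2:]:
        s = F.interpolate(s, size=t.shape[-2:], mode="bilinear",
                          align_corners=False)
    # Flatten and L2-normalise each map, then take a mean squared difference.
    s = F.normalize(s.flatten(1), dim=1)
    t = F.normalize(t.flatten(1), dim=1)
    return (s - t).pow(2).mean()

# Example: student and teacher features with different channel counts
# and different spatial resolutions.
student = torch.randn(2, 64, 8, 8)
teacher = torch.randn(2, 256, 16, 16)
loss = at_loss(student, teacher)
```

`F.interpolate` resamples the (N, 1, H, W) map to the target size, so the two attention maps can be compared element-wise even when the networks' intermediate layers have different resolutions.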
FIX #13370 (*link existing issues this PR will resolve*). In vllm/config.py it forces chunked prefill and prefix caching to be disabled, but that happens too late: max_num_batched_tokens will...