DragonFive — 2 issues
Thank you for your source code for attention-transfer, but I am not familiar with PyTorch. I do not understand the interpolation's implementation. How does it work? How is the interpolation done...
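The interpolation being asked about can be illustrated with a minimal sketch. This is not the repository's actual code; it is one common way attention-transfer losses handle mismatched feature-map sizes in PyTorch, and the helper names `attention_map` and `at_loss` are hypothetical:

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    # feat: (N, C, H, W) activations.
    # A typical attention map is the channel-wise mean of squared activations.
    return feat.pow(2).mean(dim=1, keepdim=True)  # (N, 1, H, W)

def at_loss(student_feat, teacher_feat):
    s = attention_map(student_feat)
    t = attention_map(teacher_feat)
    # If the spatial sizes differ, bilinearly interpolate the student's map
    # up (or down) to the teacher's spatial size before comparing.
    if s.shape[-2:] != t.shape[-2:]:
        s = F.interpolate(s, size=t.shape[-2:], mode="bilinear",
                          align_corners=False)
    # Flatten and L2-normalise each map, then take a mean squared difference.
    s = F.normalize(s.flatten(1), dim=1)
    t = F.normalize(t.flatten(1), dim=1)
    return (s - t).pow(2).mean()

# Example: student and teacher features with different channel counts
# and different spatial resolutions.
student = torch.randn(2, 64, 8, 8)
teacher = torch.randn(2, 256, 16, 16)
loss = at_loss(student, teacher)
```

`F.interpolate` resamples the (N, 1, H, W) map to the target size, so the two attention maps can be compared element-wise even when the networks' intermediate layers have different resolutions.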
FIX #13370 (*link existing issues this PR will resolve*). In vllm/config.py it forces chunked prefill and prefix caching to be disabled, but that happens too late: max_num_batched_tokens will...