sys_reading
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
https://arxiv.org/pdf/2401.02669.pdf