Will the KV cache become invalid?

Open · oslijunw opened this issue 1 year ago · 0 comments

In a multi-threaded setting where the GPU server's resources are insufficient, can KV cache preemption occur? For example, suppose two long conversations are running at the same time. If, halfway through, conversation a preempts conversation b, is conversation b's KV cache lost, so that b ends up using conversation a's KV cache instead? Verifying this requires GPU computation, and I don't have enough GPU resources to do it myself.

oslijunw · Apr 16 '24 02:04
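The scenario in the question can be modeled with a toy paged KV-cache allocator. This is a minimal sketch under stated assumptions and is not FastChat's code. `Sequence`, `ToyBlockAllocator`, and `schedule` are hypothetical names. The sketch assumes recompute-style preemption, one of the modes offered by paged serving engines such as vLLM: a preempted conversation's cache blocks are freed, and its cache is rebuilt later from its saved tokens. Under that model, every block has exactly one owner. Conversation b's cache is therefore discarded and never swapped for conversation a's.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sequence:
    """One conversation's decoding state (hypothetical toy model)."""
    seq_id: str
    tokens: list[int]                      # token history, kept across preemption
    block_ids: list[int] = field(default_factory=list)


class ToyBlockAllocator:
    """Hypothetical fixed pool of KV-cache blocks, each owned by one sequence."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free = list(range(num_blocks))
        self.owner: dict[int, str] = {}    # block id -> owning sequence id

    def blocks_needed(self, seq: Sequence) -> int:
        # Ceiling division: number of blocks to hold all of seq's tokens.
        return -(-len(seq.tokens) // self.block_size)

    def grow(self, seq: Sequence) -> bool:
        """Atomically allocate the blocks seq is missing; False if the pool is too small."""
        missing = self.blocks_needed(seq) - len(seq.block_ids)
        if missing > len(self.free):
            return False                   # caller must preempt someone first
        for _ in range(missing):
            block = self.free.pop()
            self.owner[block] = seq.seq_id
            seq.block_ids.append(block)
        return True

    def preempt(self, seq: Sequence) -> None:
        """Drop seq's KV cache; its tokens remain so the cache can be recomputed."""
        for block in seq.block_ids:
            del self.owner[block]
            self.free.append(block)
        seq.block_ids.clear()


def schedule(alloc: ToyBlockAllocator, running: list[Sequence], new: Sequence) -> list[Sequence]:
    """Admit `new`, preempting the most recently admitted sequences until it fits."""
    preempted = []
    while not alloc.grow(new):
        if not running:
            raise MemoryError("request does not fit even in an empty pool")
        victim = running.pop()
        alloc.preempt(victim)
        preempted.append(victim)
    running.append(new)
    return preempted


# Two long conversations competing for a 4-block pool (16 tokens per block).
alloc = ToyBlockAllocator(num_blocks=4, block_size=16)
b = Sequence("b", tokens=list(range(30)))  # needs 2 blocks
a = Sequence("a", tokens=list(range(40)))  # needs 3 blocks
running: list[Sequence] = []
schedule(alloc, running, b)
preempted = schedule(alloc, running, a)
print([s.seq_id for s in preempted])  # ['b']
print(b.block_ids, len(b.tokens))     # [] 30
```

In this model, the blocks b held are reassigned to a, but b no longer references them. b keeps its 30 tokens and, once blocks free up, must rebuild its cache by recomputation. The cost of preemption is extra compute, not a corrupted or shared cache.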