v6d
Improve the benchmark test of the vineyard LLM KV cache
What do these changes do?
Running the benchmark test produces the following results:
Token list size is 17792
Total Update time is 2.22029s
Total Query time is 0.646123s
Average update time is 8013.38token/s
Average query time is 27536.5token/s
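As a quick consistency check on the numbers above, each "Average" figure can be reproduced by dividing the token list size by the corresponding total time. The short Python sketch below does that arithmetic; the variable and function names are illustrative only and are not taken from the benchmark code. Small differences in the last digit are expected because the reported times are themselves rounded.

```python
# Values copied from the benchmark output above.
token_list_size = 17792
total_update_s = 2.22029
total_query_s = 0.646123

def throughput(tokens: int, seconds: float) -> float:
    """Tokens processed per second over the whole run (not per-token latency)."""
    return tokens / seconds

print(f"update: {throughput(token_list_size, total_update_s):.6g} token/s")
print(f"query:  {throughput(token_list_size, total_query_s):.6g} token/s")
```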
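The query time includes both (1) querying the KV tensor pointer from vineyard and (2) copying (memcpy) the data from that pointer into the user's buffer. Note that the "Average update time" and "Average query time" lines actually report throughput in token/s (token list size divided by the total time), not per-token latency.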
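To make the composition of the query time concrete, here is a minimal Python sketch of a timing loop in which a single timer covers both steps: the pointer lookup and the copy into the user's buffer. It assumes a hypothetical in-memory dict built by `build_fake_store` in place of vineyard, and a made-up fixed `TENSOR_BYTES` per token; neither is the vineyard API nor the actual benchmark code.

```python
import time

# Hypothetical per-token KV tensor size in bytes (an assumption for illustration).
TENSOR_BYTES = 1024

def build_fake_store(num_tokens: int) -> dict[int, bytes]:
    """Hypothetical stand-in for the vineyard-backed KV cache: token id -> tensor bytes."""
    return {t: bytes([t % 256]) * TENSOR_BYTES for t in range(num_tokens)}

def timed_query(store: dict[int, bytes], tokens: list[int], user_buffer: bytearray) -> float:
    """Return the seconds spent on lookup + copy for all tokens."""
    view = memoryview(user_buffer)
    start = time.perf_counter()
    for i, t in enumerate(tokens):
        src = store[t]                      # step 1: fetch the tensor "pointer"
        off = i * TENSOR_BYTES
        view[off:off + TENSOR_BYTES] = src  # step 2: memcpy into the user's buffer
    return time.perf_counter() - start

if __name__ == "__main__":
    tokens = list(range(17792))
    store = build_fake_store(len(tokens))
    buf = bytearray(len(tokens) * TENSOR_BYTES)
    elapsed = timed_query(store, tokens, buf)
    print(f"Total Query time is {elapsed:.6g}s")
    print(f"Average query time is {len(tokens) / elapsed:.6g}token/s")
```

Because the timer wraps both the lookup and the copy, a larger per-token tensor size raises the measured query time even when the lookup cost itself is unchanged.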
/cc @sighingnow, this issue/PR has had no activity for a long time. Please help review its status and assign people to work on it.