`useCache` on completion not working correctly
In the main chat example, the cache does not appear to work properly: on every second message the cache is reset (nKeep=0).
I made a very simple reproduction that sends multiple calls to `createCompletion` and logs the token sequence in `computeNonCachedTokens`. The issue persists even though each sequence extends the previous one exactly, so the shared prefix should be cached.
Example log:
seq (14) [2, 54593, 786, 496, 2822, 3925, 236761, 5185, 236789, 236745, 1161, 1757, 19005, 236761]
Cache nKeep=0
seq (24) [2, 54593, 786, 496, 2822, 3925, 236761, 5185, 236789, 236745, 1161, 1757, 19005, 236761, 108, 818, 6927, 691, 496, 13501, 236764, 214709, 89830, 236761]
Cache nKeep=24
seq (34) [2, 54593, 786, 496, 2822, 3925, 236761, 5185, 236789, 236745, 1161, 1757, 19005, 236761, 108, 818, 6927, 691, 496, 13501, 236764, 214709, 89830, 236761, 1030, 236789, 236751, 496, 3925, 1003, 496, 3184, 3875, 7489]
Cache nKeep=0
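To make the expectation concrete, here is a minimal sketch of the prefix comparison I would expect to decide how many tokens can be kept. `commonPrefixLength` is a hypothetical helper, not wllama's actual code, and it assumes the cache holds exactly the previously submitted sequence (the real cache may also hold generated tokens, so the exact numbers can differ from the log above). The point is only that the third call shares its first 24 tokens with the second, so nKeep should not drop to 0.

```typescript
// Hypothetical helper (not part of wllama): counts how many leading tokens
// of `next` are identical to the tokens already held in `cached`.
function commonPrefixLength(cached: number[], next: number[]): number {
  const limit = Math.min(cached.length, next.length);
  let i = 0;
  while (i < limit && cached[i] === next[i]) {
    i++;
  }
  return i;
}

// Token sequences copied from the log above.
const first: number[] = [
  2, 54593, 786, 496, 2822, 3925, 236761, 5185, 236789, 236745, 1161, 1757,
  19005, 236761,
];
const second: number[] = [
  ...first, 108, 818, 6927, 691, 496, 13501, 236764, 214709, 89830, 236761,
];
const third: number[] = [
  ...second, 1030, 236789, 236751, 496, 3925, 1003, 496, 3184, 3875, 7489,
];

// Assumption: the cache contains exactly the previous call's sequence.
console.log(commonPrefixLength([], first)); // 0 (nothing cached yet)
console.log(commonPrefixLength(first, second)); // 14
console.log(commonPrefixLength(second, third)); // 24
```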
Simple repro (see console): https://svelte-local-ai.khromov.se/debug
Code: https://github.com/khromov/sveltekit-local-ai/blob/main/src/routes/debug/%2Bpage.svelte
This patch fixes the issue: https://github.com/ngxson/wllama/compare/master...khromov:wllama:cache-reset-fix-2?expand=1
Thanks, I'll cherry-pick your fix