wllama

`useCache` on completion not working correctly

Open khromov opened this issue 7 months ago • 2 comments

In the main chat example, the cache does not appear to work properly: on every second message the cache is reset (nKeep=0).


I made a very simple reproduction that sends multiple calls to createCompletion and logs the token sequence in computeNonCachedTokens. The issue persists even though each token sequence extends the previous one, so the shared prefix should be cached.

Example log:

seq (14) [2, 54593, 786, 496, 2822, 3925, 236761, 5185, 236789, 236745, 1161, 1757, 19005, 236761]
Cache nKeep=0

seq (24) [2, 54593, 786, 496, 2822, 3925, 236761, 5185, 236789, 236745, 1161, 1757, 19005, 236761, 108, 818, 6927, 691, 496, 13501, 236764, 214709, 89830, 236761]
Cache nKeep=24

seq (34) [2, 54593, 786, 496, 2822, 3925, 236761, 5185, 236789, 236745, 1161, 1757, 19005, 236761, 108, 818, 6927, 691, 496, 13501, 236764, 214709, 89830, 236761, 1030, 236789, 236751, 496, 3925, 1003, 496, 3184, 3875, 7489]
Cache nKeep=0
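
To make the expectation concrete, here is a minimal sketch of the prefix comparison a token cache would typically perform. `commonPrefixLength` is a hypothetical helper written for illustration, not wllama's actual code, and it assumes the cache still holds at least the previous sequence when the next call arrives. The token arrays are copied from the log above.

```typescript
// Hypothetical helper (not part of wllama): count how many leading tokens
// of the incoming sequence already match the cached sequence.
function commonPrefixLength(cached: number[], next: number[]): number {
  const limit = Math.min(cached.length, next.length);
  let n = 0;
  while (n < limit && cached[n] === next[n]) n++;
  return n;
}

// Token sequences from the log above (14, 24 and 34 tokens).
const seq1: number[] = [
  2, 54593, 786, 496, 2822, 3925, 236761, 5185, 236789, 236745, 1161, 1757,
  19005, 236761,
];
const seq2: number[] = [
  ...seq1, 108, 818, 6927, 691, 496, 13501, 236764, 214709, 89830, 236761,
];
const seq3: number[] = [
  ...seq2, 1030, 236789, 236751, 496, 3925, 1003, 496, 3184, 3875, 7489,
];

// Assumption: the cache contains at least the previous sequence.
const reusableOnSecondCall = commonPrefixLength(seq1, seq2);
const reusableOnThirdCall = commonPrefixLength(seq2, seq3);

console.log({ reusableOnSecondCall, reusableOnThirdCall });
```

Under that assumption, the third sequence starts with all 24 tokens of the second one, so a prefix comparison would find a reusable prefix of at least 24 tokens rather than the nKeep=0 reported in the log.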

Simple repro (see console): https://svelte-local-ai.khromov.se/debug

Code: https://github.com/khromov/sveltekit-local-ai/blob/main/src/routes/debug/%2Bpage.svelte

khromov, May 16 '25 23:05

This patch fixes the issue: https://github.com/ngxson/wllama/compare/master...khromov:wllama:cache-reset-fix-2?expand=1

khromov, Aug 16 '25 21:08

Thanks, I'll cherry-pick your fix.

ngxson, Aug 17 '25 11:08