CTranslate2
Any way to manually clear the static prompt cache for generator.generate_tokens?
Is there any way to manually clear the static prompt cache used by generator.generate_tokens? We are running an algorithm where caching saves a lot of computation. However, in our setting we have several static prompts instead of one. When we pass each of them as the static prompt, GPU memory usage keeps going up, presumably because the cached state for every prompt is retained. Can we manually clear the static prompt cache here?
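For illustration, the setup described above might look roughly like the sketch below. It is not the exact code in question: the helper name `generate_with_static_prompts` and its arguments are hypothetical, and `static_prompt` and `cache_static_prompt` are the keyword arguments of `generate_tokens` as I understand them from the CTranslate2 documentation. The sketch only shows how several static prompts end up being cached; it does not clear anything.

```python
from typing import Any, Iterator, List, Sequence, Tuple


def generate_with_static_prompts(
    generator: Any,
    static_prompts: Sequence[List[str]],
    suffix: List[str],
    max_length: int = 32,
    cache: bool = True,
) -> Iterator[Tuple[int, Any]]:
    """Yield (static_prompt_index, step_result) for each static prompt in turn.

    `generator` is assumed to behave like ctranslate2.Generator, e.g.
    ctranslate2.Generator("model_dir", device="cuda"), where "model_dir"
    is a placeholder path.
    """
    for index, static_prompt in enumerate(static_prompts):
        # With cache=True, the state for every distinct static prompt is
        # assumed to be kept by the generator, which matches the memory
        # growth described above. Passing cache=False is assumed to skip
        # caching, at the cost of recomputing the prompt each time.
        steps = generator.generate_tokens(
            suffix,
            max_length=max_length,
            static_prompt=static_prompt,
            cache_static_prompt=cache,
        )
        for step in steps:
            yield index, step
```

If nothing else is available, calling this with `cache=False` for the less frequently used static prompts would presumably limit the growth, but it gives up the speedup for those prompts, so an explicit way to clear the cache would still be preferable.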