lmql
Provide an option to configure LMQL's tokenizer caching behaviour.
I would like to disable caching when loading a tokenizer (because I'm working in a context in which I can't write on the file system).
I see that there is a "NO_CACHE" environment variable, but it seems to be used only in cache_file_exists in caching.py (line 40). Whatever the value of this environment variable, the tokenizer is still cached in tokenizer.py (lines 322-323 and 348-349). Would it make sense, and would it be possible, to check whether the "NO_CACHE" environment variable is set before caching the tokenizer?
PS: Alternatively, in my context, I could also work around the problem if the cache directory could be an absolute path set through an environment variable.
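To make the request concrete, here is a minimal sketch of the kind of guard being asked for. It is not LMQL's code: `caching_disabled` and `write_cache` are hypothetical helpers, and treating any non-empty value of NO_CACHE as "disabled" is an assumption.

```python
import os
import pickle


def caching_disabled() -> bool:
    """Return True if NO_CACHE is set to any non-empty value (assumed convention)."""
    return bool(os.environ.get("NO_CACHE"))


def write_cache(path: str, data) -> bool:
    """Hypothetical helper: pickle `data` to `path` unless caching is disabled.

    Returns True if a file was written, False if the write was skipped.
    """
    if caching_disabled():
        return False
    parent = os.path.dirname(path)
    if parent:
        # Create the cache folder on demand.
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return True
```

With a guard like this in front of each write, a read-only file system would never be touched when NO_CACHE is set.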
Hi Vivien. Your request sounds reasonable, and I can see how the current setup may cause issues in some environments.
NO_CACHE is currently only meant to force the runtime to not use the existing cache, not to prevent it from writing out a new one.
One alternative that should work is to instead set HOME, which we use as a base for deriving the ~/.cache/lmql directory.
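A minimal sketch of the HOME workaround from Python, assuming a POSIX system and that HOME is changed before lmql is imported. The temporary directory is only an illustrative choice of writable location, and how LMQL resolves the path internally is not shown here.

```python
import os
import tempfile

# Choose any writable directory; a fresh temporary directory is one option.
writable_home = tempfile.mkdtemp(prefix="lmql-home-")
os.environ["HOME"] = writable_home

# On POSIX, "~" is expanded from HOME, so a path derived from it
# now resolves under the writable directory.
cache_dir = os.path.expanduser("~/.cache/lmql")
print(cache_dir)

# Import lmql only after HOME has been set, e.g.:
# import lmql
```

The same effect can be had by exporting HOME in the shell or container configuration before starting the process.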
Beyond this, I don't think it would be a big change to support more options. I can see several things making sense here:
- Prevent all forms of caching
- Use an in-memory cache, e.g. for long-running processes that can re-use caches across multiple queries
- Allow changing the LMQL cache folder specifically (not just the entire HOME)
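As a rough illustration of how these three options could fit together, here is a sketch under stated assumptions. `ExampleCache` and the environment variables EXAMPLE_CACHE_MODE and EXAMPLE_CACHE_DIR are hypothetical names made up for this example, not existing LMQL settings.

```python
import os
from pathlib import Path
from typing import Dict, Optional

# Hypothetical environment variables, not part of LMQL.
MODE_VAR = "EXAMPLE_CACHE_MODE"  # "disk" (default), "memory", or "off"
DIR_VAR = "EXAMPLE_CACHE_DIR"    # optional path overriding the cache folder


class ExampleCache:
    """Byte cache that can be disabled, kept in memory, or stored on disk."""

    def __init__(self) -> None:
        self.mode = os.environ.get(MODE_VAR, "disk")
        if self.mode not in ("disk", "memory", "off"):
            raise ValueError(f"unknown cache mode: {self.mode!r}")
        self._memory: Dict[str, bytes] = {}
        # Fall back to the HOME-derived folder when no override is given.
        default_dir = Path.home() / ".cache" / "lmql"
        self.directory = Path(os.environ.get(DIR_VAR, str(default_dir)))

    def get(self, key: str) -> Optional[bytes]:
        if self.mode == "off":
            return None
        if self.mode == "memory":
            return self._memory.get(key)
        path = self.directory / key
        return path.read_bytes() if path.exists() else None

    def put(self, key: str, value: bytes) -> None:
        if self.mode == "off":
            return  # never touch memory or disk
        if self.mode == "memory":
            self._memory[key] = value
            return
        self.directory.mkdir(parents=True, exist_ok=True)
        (self.directory / key).write_bytes(value)
```

In "memory" mode a long-running process would reuse entries across queries for as long as it keeps the same `ExampleCache` instance, while "off" avoids both reads and writes.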
Let me know if setting HOME works for you and what you think of these potential improvements.
Hi Luca. Many thanks for your reply. It works for me by just modifying HOME as you proposed. The options you suggest seem very reasonable but I don't have a clear enough view of the context of LMQL users to have a relevant opinion on them.
I don't need anything else, but I'll leave the issue open in case you want to use it to track the enhancements you suggested.