lmql Provide an option to configure LMQL's tokenizer caching behaviour.

Open vivien000 opened this issue 1 year ago • 2 comments

I would like to disable caching when loading a tokenizer, because I'm working in an environment where I can't write to the file system.

I see that there is a "NO_CACHE" environment variable, but it seems to be used only in cache_file_exists in caching.py (line 40). Regardless of the value of this environment variable, the tokenizer is still cached in tokenizer.py (lines 322-323 and 348-349). Would it make sense, and would it be possible, to check whether the "NO_CACHE" environment variable is set before caching the tokenizer?

PS: alternatively, in my context, I could also work around the problem above if the cache directory could be an absolute path set through an environment variable.

Oct 09 '23 17:10 vivien000

Hi Vivien. Your request sounds reasonable, I can see how the current setup may cause issues in some environments.

NO_CACHE is currently only meant to force the runtime to not use the existing cache, not to prevent it from writing out a new one.

One alternative that should work is to instead set HOME, which we use as a base for deriving the ~/.cache/lmql directory.
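A minimal sketch of this workaround, assuming a POSIX system (where Python resolves `~` from `HOME`) and assuming that LMQL derives its cache path from `~/.cache/lmql` as described above. The temporary directory is only a stand-in for whatever writable location is available, and the `lmql` import is left commented out so the snippet runs on its own:

```python
import os
import tempfile

# Assumption: some writable directory exists; a fresh temp dir stands in for it here.
writable_home = tempfile.mkdtemp(prefix="lmql-home-")

# Redirect HOME before LMQL is imported, so that "~" resolves to the writable location.
os.environ["HOME"] = writable_home

# On POSIX, expanduser reads HOME, so this shows where ~/.cache/lmql would now land.
cache_dir = os.path.expanduser("~/.cache/lmql")
print(cache_dir)

# import lmql  # import only after HOME has been set
```

Note that changing `HOME` affects every library in the process that relies on it, not just LMQL.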

Beyond this, I don't think it's a big change to support more options, e.g. I can see several things making sense here:

Prevent all forms of caching
Use an in-memory cache, e.g. for long-running processes that can re-use caches across multiple queries
Allow changing the LMQL cache folder specifically (not just the entire HOME)
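To make the three options concrete, here is a rough sketch of how such a switch could look. Everything in it is hypothetical and not part of LMQL: the environment variables `LMQL_CACHE_MODE` and `LMQL_CACHE_DIR`, the helper functions, and the pickle-based file layout are illustrative assumptions only.

```python
import os
import pickle
import tempfile
from pathlib import Path

# All names below are hypothetical and are NOT part of LMQL.
_MEMORY_CACHE = {}  # process-wide cache used in "memory" mode


def cache_mode(env=os.environ):
    """Return 'none', 'memory' or 'disk' from the hypothetical LMQL_CACHE_MODE."""
    mode = env.get("LMQL_CACHE_MODE", "disk").lower()
    if mode not in ("none", "memory", "disk"):
        raise ValueError(f"unknown cache mode: {mode!r}")
    return mode


def cache_dir(env=os.environ):
    """Use the hypothetical LMQL_CACHE_DIR if set, else fall back to ~/.cache/lmql."""
    override = env.get("LMQL_CACHE_DIR")
    return Path(override) if override else Path(os.path.expanduser("~/.cache/lmql"))


def store(key, value, env=os.environ):
    """Write a value to the configured cache; a no-op when caching is disabled."""
    mode = cache_mode(env)
    if mode == "none":
        return
    if mode == "memory":
        _MEMORY_CACHE[key] = value
        return
    directory = cache_dir(env)
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / f"{key}.pkl", "wb") as f:
        pickle.dump(value, f)


def load(key, env=os.environ):
    """Return the cached value, or None if it is missing or caching is disabled."""
    mode = cache_mode(env)
    if mode == "none":
        return None
    if mode == "memory":
        return _MEMORY_CACHE.get(key)
    path = cache_dir(env) / f"{key}.pkl"
    if not path.exists():
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage: a throwaway directory stands in for a writable cache location.
demo_env = {"LMQL_CACHE_MODE": "disk", "LMQL_CACHE_DIR": tempfile.mkdtemp()}
store("demo-tokenizer", {"vocab_size": 3}, env=demo_env)
print(load("demo-tokenizer", env=demo_env))
```

In this sketch, "none" prevents both reads and writes, which differs from the current NO_CACHE behaviour of only skipping reads of an existing cache.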

Let me know if setting HOME works for you and what you think of these potential improvements.

Oct 09 '23 17:10 lbeurerkellner

Hi Luca. Many thanks for your reply. Modifying HOME as you proposed works for me. The options you suggest seem very reasonable, but I don't have a clear enough view of how LMQL users work to have a relevant opinion on them.

I don't need anything else, but I'll leave the issue open in case you want to use it for tracking the enhancements you suggested.

Oct 10 '23 10:10 vivien000
