
Multiple concurrent writers (and readers) with shared NFS mount

Open matthewcummings opened this issue 3 years ago • 6 comments

I'm seeing an issue with cachier where I'm getting Bad File Descriptor errors. I just want to make sure I'm not completely abusing it here: I have multiple writers (and readers) accessing a shared cache directory via NFS.

Am I way off the mark here for thinking this would ever work correctly?

matthewcummings avatar Dec 05 '21 20:12 matthewcummings

Hmmm. Honestly no idea.

Shouldn't NFS technically make the software layer "feel" as if it's just accessing the file system? I really lack the deep OS/kernel/filesystem knowledge required to answer this.

Each cachier wrapper gets its own cache file (when using the pickle core) and just acquires and releases a lock to read from or write to that file. Since cachier also holds the cache in memory, you can have it not reload the cache from file on every call (with @cachier(pickle_reload=False)). That doesn't affect the freshness of results if you run single-threaded, but it can mean different threads get stale results if you run multi-threaded.
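To make the staleness trade-off concrete, here is a minimal standard-library sketch. It is not cachier's implementation: the class `FileBackedCache` and its `reload` flag are hypothetical names that only mimic the idea behind pickle_reload, and the sketch does no locking at all.

```python
import os
import pickle
import tempfile


class FileBackedCache:
    """Toy cache (hypothetical): a dict persisted to one pickle file,
    plus an in-memory copy of that dict."""

    def __init__(self, path, reload=True):
        self.path = path
        self.reload = reload  # loosely analogous to pickle_reload
        self._memory = self._load()

    def _load(self):
        # A missing file is treated as an empty cache.
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "rb") as fh:
            return pickle.load(fh)

    def get(self, key):
        # With reload=True, re-read the file on every lookup.
        # With reload=False, trust whatever was loaded at construction.
        if self.reload:
            self._memory = self._load()
        return self._memory.get(key)

    def set(self, key, value):
        # Re-read first so entries written by others are not overwritten.
        self._memory = self._load()
        self._memory[key] = value
        with open(self.path, "wb") as fh:
            pickle.dump(self._memory, fh)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.pkl")
        writer = FileBackedCache(path)
        fresh_reader = FileBackedCache(path, reload=True)
        stale_reader = FileBackedCache(path, reload=False)
        writer.set("x", 1)
        # The reloading reader sees the new entry; the other one does not.
        return fresh_reader.get("x"), stale_reader.get("x")
```

Here the three objects stand in for separate threads or processes sharing one cache file. The reader that skips reloading never sees the entry written after it started, which is the stale-result case described above.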

Also, the separate_files=True option makes the cache use a separate file for each argument set, so each function uses numerous files. It's better for larger results, and it might help in your case, if different readers and writers do indeed call the same function with different argument sets. Might.
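A rough sketch of the one-file-per-argument-set idea, again using only the standard library. This is not how cachier implements separate_files: `entry_path` and `cached_call` are hypothetical helpers, and writing to a temporary file followed by `os.replace` is a technique I swapped in for illustration. Whether that rename behaves atomically on a given NFS mount is an assumption you would need to verify.

```python
import hashlib
import os
import pickle
import tempfile


def entry_path(cache_dir, func_name, args):
    # One file per (function, argument set): hash the pickled arguments.
    digest = hashlib.sha256(pickle.dumps(args)).hexdigest()
    return os.path.join(cache_dir, f"{func_name}_{digest}.pkl")


def cached_call(cache_dir, func, *args):
    path = entry_path(cache_dir, func.__name__, args)
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)  # cache hit
    except FileNotFoundError:
        pass  # cache miss: compute and store below

    result = func(*args)
    # Write to a temporary file in the same directory, then rename it into
    # place, so a reader never opens a half-written entry.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as fh:
        pickle.dump(result, fh)
    os.replace(tmp_path, path)
    return result
```

With this layout, callers using different argument sets never touch the same file, so they only contend when they ask for the same entry.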

But then, Bad File Descriptor doesn't sound like it has to do with locking. On the other hand, I don't know. :)

Do you get it sporadically, or consistently?

shaypal5 avatar Dec 06 '21 09:12 shaypal5

Consistently. It could be a bug in my code too. Mostly I wanted to confirm that this should work. Thank you.

matthewcummings avatar Dec 07 '21 16:12 matthewcummings

Cool. Please let me know how it goes. It will help me help you if I understand the use case better; e.g. do all readers and writers use the same function or not, etc.

shaypal5 avatar Dec 08 '21 11:12 shaypal5

Will do, thank you.

matthewcummings avatar Dec 08 '21 18:12 matthewcummings

Any news on that? @matthewcummings

shaypal5 avatar Jan 09 '22 10:01 shaypal5

Note to all interested parties: There is a possibly related issue here: https://github.com/python-cachier/cachier/issues/128

And if portalocker is indeed the culprit then this might be the relevant issue on their repository: https://github.com/wolph/portalocker/issues/92

shaypal5 avatar Dec 05 '23 19:12 shaypal5