pyopencl
invoker lock-file conflict on nfs cluster
When running pyopencl on a cluster with an NFS filesystem, a lock file created in my home directory by one node prevents the other nodes from making progress. I've pasted a stack trace below.
At first I thought I could fix the problem by supplying the "cache_dir" argument when creating the pyopencl context, pointing it somewhere in /tmp, which isn't on NFS. However, those lock files aren't the problem: the problem is that invoker.py defines "invoker_cache" as a PersistentDict using the default lock file location, which in my case is inside my home directory on NFS.
As a workaround, I've modified invoker.py on my system so the definition reads:
    invoker_cache = PersistentDict("pyopencl-invoker-cache-v1",
            key_builder=NumpyTypesKeyBuilder(),
            container_dir='/tmp/cl/invoker')
Perhaps in future versions of pyopencl you could make the container_dir configurable?
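To make the request concrete, here is a minimal sketch of how the directory could be chosen at runtime instead of being hard-coded. The helper name `invoker_cache_dir` and the environment variable `MY_PYOPENCL_INVOKER_CACHE_DIR` are hypothetical; neither pyopencl nor pytools reads them. The fallback mirrors the /tmp path from the workaround above.

```python
import os
import tempfile


def invoker_cache_dir(env_var="MY_PYOPENCL_INVOKER_CACHE_DIR"):
    """Return a directory for the invoker cache, creating it if needed.

    env_var is a hypothetical name used only for this sketch.
    """
    base = os.environ.get(env_var)
    if not base:
        # Fall back to the system temp directory, which is usually
        # node-local rather than on NFS.
        base = os.path.join(tempfile.gettempdir(), "cl", "invoker")
    try:
        os.makedirs(base)
    except OSError:
        # Works on Python 2 and 3: ignore "already exists", re-raise the rest.
        if not os.path.isdir(base):
            raise
    return base


# In invoker.py this would replace the hard-coded path, for example:
# invoker_cache = PersistentDict("pyopencl-invoker-cache-v1",
#         key_builder=NumpyTypesKeyBuilder(),
#         container_dir=invoker_cache_dir())
```

With something like this, each node could point the cache at local storage without editing the installed package.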
stack-trace:
  File "/usr/home/p/605/tuf33565/anaconda2/lib/python2.7/site-packages/pyopencl/__init__.py", line 320, in __getattr__
    knl = Kernel(self, attr)
  File "/usr/home/p/605/tuf33565/anaconda2/lib/python2.7/site-packages/pyopencl/cffi_cl.py", line 1690, in __init__
    self._setup(program)
  File "/usr/home/p/605/tuf33565/anaconda2/lib/python2.7/site-packages/pyopencl/cffi_cl.py", line 1700, in _setup
    work_around_arg_count_bug=None)
  File "/usr/home/p/605/tuf33565/anaconda2/lib/python2.7/site-packages/pyopencl/invoker.py", line 388, in generate_enqueue_and_set_args
    result = invoker_cache[cache_key]
  File "/usr/home/p/605/tuf33565/.local/lib/python2.7/site-packages/pytools/persistent_dict.py", line 472, in __getitem__
    return self.fetch(key)
  File "/usr/home/p/605/tuf33565/.local/lib/python2.7/site-packages/pytools/persistent_dict.py", line 700, in fetch
    LockManager(cleanup_m, self._lock_file(hexdigest_key))
  File "/usr/home/p/605/tuf33565/.local/lib/python2.7/site-packages/pytools/persistent_dict.py", line 128, in __init__
    "--something is wrong" % self.lock_file)
RuntimeError: waited more than three minutes on the lock file '/usr/home/p/605/tuf33565/.cache/pytools/pdict-v2-pyopencl-invoker-cache-v1-py2.7.13.final.0/75d86f4c7e7bed5781efc15198f91210c98d69a44f2a8fa928503c1cf560d256.lock'--something is wrong
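The failure mode in the trace can be illustrated with a simplified lock-file sketch. This is not pytools's actual LockManager; it assumes an exclusive-create lock file that is polled until a timeout, and the three-minute wait is shortened to half a second so the example finishes quickly. The names `acquire_lock` and `release_lock` are made up for illustration.

```python
import errno
import os
import tempfile
import time


def acquire_lock(lock_file, timeout=0.5, poll_interval=0.1):
    """Create lock_file exclusively; raise if it still exists after timeout."""
    deadline = time.time() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            return os.open(lock_file, os.O_CREAT | os.O_WRONLY | os.O_EXCL)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        if time.time() >= deadline:
            raise RuntimeError(
                "waited more than %.1f seconds on the lock file '%s'"
                % (timeout, lock_file))
        time.sleep(poll_interval)


def release_lock(fd, lock_file):
    """Close the descriptor and remove the lock file."""
    os.close(fd)
    os.unlink(lock_file)


lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
fd = acquire_lock(lock_path)      # first process takes the lock
try:
    acquire_lock(lock_path)       # second attempt times out while the file exists
except RuntimeError as e:
    print(e)
release_lock(fd, lock_path)
```

Because the lock is just a file in a shared directory, any process on any node that sees that directory will wait on it, which is why moving the directory off NFS sidesteps the conflict.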
Thanks for the report! I'm currently chasing a deadline (Sunday)--I'll worry about this next week, likely by deriving all cache dirs (binary and invoker) from the one passed to the context. I'd also be very open to receiving a patch. :)
No hurry at all - I've fixed it on my system so I'm happy, just wanted to let you know about the idea.
I'm also pretty busy but a patch may be incoming some day :)