threadpoolctl
MemoryError after exceeding OpenMP/OpenBLAS thread limit
Hello. I use threadpoolctl 2.2.0, which works well most of the time. However, after the OpenMP or OpenBLAS thread limit was exceeded, threadpoolctl seems to have broken. It does not recover even after the processes that exceeded the thread limit have been killed, nor after waiting quite some time. The full error message from a simple example is shown below. Is there any way to reset threadpoolctl so that it works again without rebooting the computer?
Python 3.9.5 (default, Jun 4 2021, 12:28:51)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.24.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from threadpoolctl import threadpool_limits
...: with threadpool_limits(limits=1):
...: a=1
...:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-1-2121fc2c928d> in <module>
1 from threadpoolctl import threadpool_limits
----> 2 with threadpool_limits(limits=1):
3 a=1
4
~/.local/lib/python3.9/site-packages/threadpoolctl.py in __init__(self, limits, user_api)
169 self._check_params(limits, user_api)
170
--> 171 self._original_info = self._set_threadpool_limits()
172
173 def __enter__(self):
~/.local/lib/python3.9/site-packages/threadpoolctl.py in _set_threadpool_limits(self)
266 return None
267
--> 268 modules = _ThreadpoolInfo(prefixes=self._prefixes,
269 user_api=self._user_api)
270 for module in modules:
~/.local/lib/python3.9/site-packages/threadpoolctl.py in __init__(self, user_api, prefixes, modules)
338
339 self.modules = []
--> 340 self._load_modules()
341 self._warn_if_incompatible_openmp()
342 else:
~/.local/lib/python3.9/site-packages/threadpoolctl.py in _load_modules(self)
373 self._find_modules_with_enum_process_module_ex()
374 else:
--> 375 self._find_modules_with_dl_iterate_phdr()
376
377 def _find_modules_with_dl_iterate_phdr(self):
~/.local/lib/python3.9/site-packages/threadpoolctl.py in _find_modules_with_dl_iterate_phdr(self)
404 ctypes.c_int, # Return type
405 ctypes.POINTER(_dl_phdr_info), ctypes.c_size_t, ctypes.c_char_p)
--> 406 c_match_module_callback = c_func_signature(match_module_callback)
407
408 data = ctypes.c_char_p(b"")
MemoryError:
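The traceback ends where a Python function is wrapped as a ctypes callback. To check whether that step fails on its own, outside threadpoolctl, something like the following minimal sketch might help. It assumes that `c_func_signature` is built with `ctypes.CFUNCTYPE` (that line is not shown in the traceback), and it uses `ctypes.c_void_p` as a stand-in for `ctypes.POINTER(_dl_phdr_info)`. The names `can_create_ctypes_callback` and `dummy_callback` are hypothetical.

```python
import ctypes


def can_create_ctypes_callback():
    """Try to wrap a Python function as a C callback.

    Returns (True, None) on success, or (False, exc) if ctypes
    raises MemoryError while creating the callback.
    """
    # Assumed shape of the signature from the traceback, with c_void_p
    # standing in for ctypes.POINTER(_dl_phdr_info).
    signature = ctypes.CFUNCTYPE(
        ctypes.c_int,  # Return type
        ctypes.c_void_p, ctypes.c_size_t, ctypes.c_char_p)

    def dummy_callback(info, size, data):
        # Hypothetical placeholder; does nothing and reports success.
        return 0

    try:
        callback = signature(dummy_callback)
    except MemoryError as exc:
        return False, exc

    # Call it once to confirm that the wrapper is usable.
    callback(None, 0, b"")
    return True, None


ok, error = can_create_ctypes_callback()
print("ctypes callback creation works:", ok)
```

If this also fails with a `MemoryError` in the broken state, the problem would lie in ctypes or the system rather than in threadpoolctl itself. If it succeeds, this check says nothing either way.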
Hi @lingfeiwang, I'm not sure I understand how you triggered that. Could you describe in more detail the steps that led to this broken state?
I did not expect this to happen at all, so I recorded neither the steps to reproduce the error nor the error log from OpenMP or OpenBLAS. Briefly, I ran some computation in too many parallel processes, each of which used OpenMP or OpenBLAS, possibly through numpy/scipy. Together they exceeded some limit, maybe one set by the kernel, and the related error lines were reported. I then killed those processes and everything seemed to have recovered, except threadpoolctl, as I discovered later.
I understand this is very uninformative, but trying to reproduce it on a shared computing server would be damaging. I don't know how rarely this error occurs, but I guess computing servers are constantly tortured all over the planet. For me, a reboot solved the issue, but someone else might follow up on this thread with more details another day.
Thanks for the feedback. It might indeed be a bug in the Linux kernel or in the OpenMP runtime, relying on an incorrectly updated stateful attribute of the system. If that ever happens again, it would be interesting to start a post-mortem pdb session to introspect the values of the match_module_callback signature. I do not understand how a MemoryError can possibly be raised on this line...
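For anyone who hits this state again, the introspection suggested above could be sketched roughly as follows. This is only an illustration: `failing_step` is a hypothetical stand-in for the real `threadpool_limits(limits=1)` call, and `innermost_frame_locals` is a helper name made up for this example. It walks the traceback to the innermost Python frame and copies its local variables, which is what one would otherwise inspect by hand in pdb.

```python
def innermost_frame_locals(exc):
    """Return (function_name, locals) of the innermost frame of exc's traceback."""
    tb = exc.__traceback__
    if tb is None:
        return None, {}
    # Follow the chain down to the frame where the exception was raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame = tb.tb_frame
    return frame.f_code.co_name, dict(frame.f_locals)


def failing_step():
    # Hypothetical stand-in for the threadpoolctl call that raised MemoryError.
    c_func_signature = "placeholder signature"
    raise MemoryError()


try:
    failing_step()
except MemoryError as exc:
    name, local_vars = innermost_frame_locals(exc)
    print("raised in:", name)
    for key, value in local_vars.items():
        print(f"  {key} = {value!r}")
```

In a real session, `failing_step()` would be replaced by the `with threadpool_limits(limits=1):` block. For an interactive session, `pdb.post_mortem(exc.__traceback__)` inside the `except` block, or the `%debug` magic in IPython right after the error, opens the debugger at the same frame.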