RuntimeError: cannot cache function 'square': no locator available for file
When I try to import a Numba-cached function from a package installed as a Python egg, I get the following error saying that no cache file locator is available:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/home/username/.local/lib/python3.6/site-packages/mytestpackage-1.2.3-py3.6.egg/mytestpackage/mymodule.py", line 3, in <module>
  File "/home/username/.local/lib/python3.6/site-packages/numba/decorators.py", line 194, in wrapper
    disp.enable_caching()
  File "/home/username/.local/lib/python3.6/site-packages/numba/dispatcher.py", line 679, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/home/username/.local/lib/python3.6/site-packages/numba/caching.py", line 614, in __init__
    self._impl = self._impl_class(py_func)
  File "/home/username/.local/lib/python3.6/site-packages/numba/caching.py", line 349, in __init__
    "for file %r" % (qualname, source_path))
RuntimeError: cannot cache function 'square': no locator available for file '/home/username/.local/lib/python3.6/site-packages/mytestpackage-1.2.3-py3.6.egg/mytestpackage/mymodule.py'
The bug occurs at least with numba versions 0.45.1 and 0.46.1.
The bug appears to be operating system-agnostic.
Setting a user cache directory with export NUMBA_CACHE_DIR=/tmp does not help.
I believe that the various caching file locators fail because they assume that the python files are located in plain directories, when they can in fact be in an egg, for example here:
def from_function(cls, py_func, py_file):
    if not (os.path.exists(py_file) or getattr(sys, 'frozen', False)):
        # Perhaps a placeholder (e.g. "<ipython-XXX>")
        # stop function exit if frozen, since it uses a temp placeholder
        return
    self = cls(py_func, py_file)
    try:
        self.ensure_cache_path()
    except OSError:
        # Cannot ensure the cache directory exists or is writable
        return
    return self
To produce a minimal package and reproduce this bug, run this code in a shell:
echo "UEsDBBQDAAAAADd9fU8AAAAAAAAAAAAAAAAOAAAAbXl0ZXN0cGFja2FnZS9QSwMEFAMAAAAAvXx9TwAAAAAAAAAAAAAAABwAAABteXRlc3RwYWNrYWdlL215dGVzdHBhY2thZ2UvUEsDBAoDAAAAAMVqfU9D34g0AgAAAAIAAAAnAAAAbXl0ZXN0cGFja2FnZS9teXRlc3RwYWNrYWdlL19faW5pdF9fLnB5IApQSwMEFAMAAAgAuXx9TxKRbY9SAAAAUwAAACcAAABteXRlc3RwYWNrYWdlL215dGVzdHBhY2thZ2UvbXltb2R1bGUucHkdxbEKgCAQAND9vuJo0ohmEYI+oi0azE4y0OrywM8vesuL6Tq5YJa0OoDxv89HLKoJRgUz20U3HXrndxomFtKwUcDnFsekqraAH6YinLFiixVeUEsDBBQDAAAIALB8fU9A2V6/YwAAAI0AAAAWAAAAbXl0ZXN0cGFja2FnZS9zZXR1cC5weVWMSwqFMAxF51lF6EihCO857lqkaJSibUoTBXev+Bl4h+dcToiZiyILjIUjCumalXkRvM1NLI4hDV32/ewnEoCLVoDnko/kTNyVRJ+DsZfZqEjg5Myv+TftA9+G+xSr2kINB1BLAQI/AxQDAAAAADd9fU8AAAAAAAAAAAAAAAAOACQAAAAAAAAAEIDtQQAAAABteXRlc3RwYWNrYWdlLwoAIAAAAAAAAQAYAICaOR/DptUBgPSbIcOm1QGAmjkfw6bVAVBLAQI/AxQDAAAAAL18fU8AAAAAAAAAAAAAAAAcACQAAAAAAAAAEIDtQSwAAABteXRlc3RwYWNrYWdlL215dGVzdHBhY2thZ2UvCgAgAAAAAAABABgAgJBTl8Km1QEAizQiw6bVAYCQU5fCptUBUEsBAj8DCgMAAAAAxWp9T0PfiDQCAAAAAgAAACcAJAAAAAAAAAAggKSBZgAAAG15dGVzdHBhY2thZ2UvbXl0ZXN0cGFja2FnZS9fX2luaXRfXy5weQoAIAAAAAAAAQAYAIBuvZ6vptUBAIs0IsOm1QGAbr2er6bVAVBLAQI/AxQDAAAIALl8fU8SkW2PUgAAAFMAAAAnACQAAAAAAAAAIICkga0AAABteXRlc3RwYWNrYWdlL215dGVzdHBhY2thZ2UvbXltb2R1bGUucHkKACAAAAAAAAEAGACA3I6SwqbVAQCLNCLDptUBgNyOksKm1QFQSwECPwMUAwAACACwfH1PQNlev2MAAACNAAAAFgAkAAAAAAAAACCApIFEAQAAbXl0ZXN0cGFja2FnZS9zZXR1cC5weQoAIAAAAAAAAQAYAADebIjCptUBgJVG28Km1QEA3myIwqbVAVBLBQYAAAAABQAFACgCAADbAQAAAAA=" | base64 -d > mytestpackage.zip
unzip mytestpackage.zip
cd mytestpackage
python setup.py install --user
cd /tmp
python -c "import mytestpackage.mymodule"
Alternatively, if you do not trust this base64-encoded zip file, you can create the package manually. The directory structure is:
mytestpackage
├── mytestpackage
│   ├── __init__.py
│   └── mymodule.py
└── setup.py
The content of the file mymodule.py is:
import numba

@numba.njit("f8(f8)", cache=True)
def square(x):
    return x * x
The content of the file setup.py is:
from setuptools import setup, find_packages

setup(
    name="mytestpackage",
    version="1.2.3",
    packages=find_packages(),
)
The file __init__.py is empty.
Often it might be preferable to compile the functions ahead of time during setup with numba.pycc, provided it supports the features you need (for example, it does not support parallel=True yet).
Another workaround is to install the package with pip install ., which will not create an egg, instead of python setup.py install.
Another workaround is to add zip_safe=False to the setup.py file of the packages that use numba caching.
Indeed, the issue seems to be this line: https://github.com/numba/numba/blob/0db8a2bcd0f53c0d0ad8a798432fb3f37f14af27/numba/core/caching.py#L188
os.path.exists returns False for a path inside an egg/zip.
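To make that concrete, here is a small sketch using only the standard library. It builds a throwaway archive (the name demo-1.0-py3.egg and its contents are made up for illustration) and checks a path of the same shape as the one in the traceback:

```python
import os
import tempfile
import zipfile


def path_exists_inside_zip(workdir):
    """Build a tiny archive and test a path that points inside it."""
    # Hypothetical archive name; any zip file behaves the same way.
    archive = os.path.join(workdir, "demo-1.0-py3.egg")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo/mymodule.py", "def square(x):\n    return x * x\n")
    # Same shape as the path in the traceback:
    # <archive file>/<package>/<module>.py
    inner_path = os.path.join(archive, "demo", "mymodule.py")
    return os.path.exists(archive), os.path.exists(inner_path)


with tempfile.TemporaryDirectory() as workdir:
    archive_exists, inner_exists = path_exists_inside_zip(workdir)
    print("archive on disk:", archive_exists)            # True
    print("module path inside archive:", inner_exists)   # False
```

The archive itself is a regular file, so the filesystem has no entry for anything "below" it, even though the import system can load modules from it.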
My use case is including an egg with a PySpark job, which will be copied to executors running my numba code. I don't have the option to pip install my package on these executor nodes.
I was considering monkey patching numba and inserting my own custom locator here: https://github.com/numba/numba/blob/0db8a2bcd0f53c0d0ad8a798432fb3f37f14af27/numba/core/caching.py#L328
Ultimately I decided on some magic inside my own package, where I infer if my package was imported as a zip/egg, and if so I unzip the package, modify sys.path, and reload my package.
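For anyone wanting to try the same approach, a rough sketch of the detection and extraction steps could look like the following. This is not the code from my package; the function names are made up, and it assumes the archive is trusted, since extractall does not guard against malicious member names.

```python
import os
import sys
import tempfile
import zipfile


def find_enclosing_archive(path):
    """Return the zip/egg file that `path` points into, or None."""
    current = os.path.abspath(path)
    while True:
        # A path inside an egg has a regular zip file as one of its ancestors.
        if os.path.isfile(current) and zipfile.is_zipfile(current):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def extract_archive_for_import(module_file, target_dir=None):
    """If `module_file` lives in an archive, unpack it and put it on sys.path.

    Returns the extraction directory, or None if nothing had to be done.
    """
    archive = find_enclosing_archive(module_file)
    if archive is None:
        return None
    if target_dir is None:
        target_dir = tempfile.mkdtemp(prefix="unzipped_pkg_")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    # Put the extracted copy ahead of the zipped one.
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir
```

In a package this would presumably be called with __file__ from __init__.py. The already-imported zipped modules would still have to be dropped from sys.modules and imported again, which is the fiddly part and is not shown here.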
For most users working with Docker images, setting ENV NUMBA_CACHE_DIR=/tmp should help (even though it didn't work for the OP).
Just to confirm, because this never got a response from a maintainer before, this is still an issue with numba main. Following the OP's steps above (which were clear and concise, many thanks @99991), I get:
$ python -c "import mytestpackage.mymodule"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/gmarkall/mambaforge/envs/numbadev/lib/python3.10/site-packages/mytestpackage-1.2.3-py3.10.egg/mytestpackage/mymodule.py", line 4, in <module>
  File "/home/gmarkall/numbadev/numba/numba/core/decorators.py", line 212, in wrapper
    disp.enable_caching()
  File "/home/gmarkall/numbadev/numba/numba/core/dispatcher.py", line 863, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/home/gmarkall/numbadev/numba/numba/core/caching.py", line 601, in __init__
    self._impl = self._impl_class(py_func)
  File "/home/gmarkall/numbadev/numba/numba/core/caching.py", line 337, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function 'square': no locator available for file '/home/gmarkall/mambaforge/envs/numbadev/lib/python3.10/site-packages/mytestpackage-1.2.3-py3.10.egg/mytestpackage/mymodule.py'
I'm going to add this to the 0.57RC milestone as I think it's very tight to get anything into 0.56, but I do think it's important to look at resolving this.
Hi @gmarkall, when is the numba 0.57 release expected to come out? I'm facing the same issue in one of my projects and want to plan my work around the release timeline.
Also, please let me know if there is a workaround/alternate solution to this issue. Many thanks!
I'd guess we're at least 6 months out from 0.57.
Workarounds from this issue seem to be:
- Add zip_safe=False to the setup.py of the packages using Numba caching: https://github.com/numba/numba/issues/4908#issuecomment-571537473
- Monkey patching Numba to insert a custom locator: https://github.com/numba/numba/issues/4908#issuecomment-719063147
- Detect in the package if it's imported as a zip, extract it, then modify sys.path: https://github.com/numba/numba/issues/4908#issuecomment-719063147
If you go with the monkey patch route, I would imagine the locator class that gets monkey patched into the list of cache locators would probably be suitable to add to Numba permanently as a PR - if you go with this, please do let me know and we can work towards a PR.
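As a starting point for discussion, such a locator might look roughly like the sketch below. It is untested against Numba itself and deliberately does not import it. The class name, the helper, and the choice of cache directory are all hypothetical; the method names mirror the from_function snippet quoted earlier plus what the cache code appears to call on a locator (get_cache_path, get_source_stamp, get_disambiguator), so please treat those as assumptions to verify against caching.py.

```python
import hashlib
import os
import tempfile
import zipfile


def _enclosing_archive(path):
    """Return the zip/egg file that `path` points into, or None."""
    current = os.path.abspath(path)
    while True:
        if os.path.isfile(current) and zipfile.is_zipfile(current):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            return None
        current = parent


class ZipAwareCacheLocator:
    """Hypothetical locator for source files that live inside a zip/egg."""

    def __init__(self, py_func, py_file, archive, cache_root):
        self._archive = archive
        self._lineno = py_func.__code__.co_firstlineno
        # One cache subdirectory per source file, keyed by its full path.
        digest = hashlib.sha1(py_file.encode("utf-8")).hexdigest()
        self._cache_path = os.path.join(cache_root, "zip_" + digest)

    def ensure_cache_path(self):
        os.makedirs(self._cache_path, exist_ok=True)
        # Raises OSError if the directory is not writable.
        tempfile.TemporaryFile(dir=self._cache_path).close()

    def get_cache_path(self):
        return self._cache_path

    def get_source_stamp(self):
        # The archive stands in for the source file, which has no stat().
        st = os.stat(self._archive)
        return st.st_mtime, st.st_size

    def get_disambiguator(self):
        return str(self._lineno)

    @classmethod
    def from_function(cls, py_func, py_file):
        archive = _enclosing_archive(py_file)
        if archive is None:
            return None  # not in an archive; let other locators try
        # Assumption: cache under NUMBA_CACHE_DIR if set, else the temp dir.
        cache_root = os.environ.get("NUMBA_CACHE_DIR") or os.path.join(
            tempfile.gettempdir(), "numba_zip_cache")
        self = cls(py_func, py_file, archive, cache_root)
        try:
            self.ensure_cache_path()
        except OSError:
            return None
        return self

# Registration would rely on a private attribute, assumed here to be the
# locator list at the caching.py link above, e.g.:
#   numba.core.caching._CacheImpl._locator_classes.insert(0, ZipAwareCacheLocator)
```

The source stamp is taken from the archive rather than the module, so any change to the egg would invalidate the cache for every function in it. That seems like the conservative choice, but it is a design decision open to review.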
The same Numba caching issue seems to occur when using the timezonefinder package under PySpark. Versions: Python 3.6, Spark 2.4.8, timezonefinder 5.2.0.
Issue reference: https://github.com/jannikmi/timezonefinder/issues/206#issue-1849221394
I am having this problem when this Docker container (https://github.com/BioContainers/containers/edit/master/cellpose/2.2.2/Dockerfile), which uses ENV NUMBA_CACHE_DIR=/tmp, is converted to a Singularity container. It gives the following error message in a GitHub CI run when trying to run code within that Singularity container. Strangely, this does not happen when I run the code in Singularity within a Gitpod code environment...
Does anyone have any suggestions on how to fix the Docker container so that it also works in hosted Singularity tests?
For people coming from Google search:
- You can add SINGULARITYENV_NUMBA_CACHE_DIR="$TMPDIR" to your singularity call
- On Nextflow you can also use singularity.runOptions = '--no-mount tmp --writable-tmpfs'
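If you cannot change how the container is launched, another option for this container case (it does not fix the egg problem from the original report) might be to pick a writable cache directory from Python. The sketch below uses only the standard library; the function names are made up, and it assumes Numba reads NUMBA_CACHE_DIR when it is imported, so it would have to run before import numba.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first candidate directory that can be created and written."""
    for candidate in candidates:
        if not candidate:
            continue  # skip unset environment variables
        try:
            os.makedirs(candidate, exist_ok=True)
            # Creating a temporary file is a direct check for write access.
            tempfile.TemporaryFile(dir=candidate).close()
        except OSError:
            continue
        return candidate
    return None


def configure_numba_cache_dir(candidates):
    """Point NUMBA_CACHE_DIR at the first writable candidate, if any."""
    chosen = first_writable_dir(candidates)
    if chosen is not None:
        os.environ["NUMBA_CACHE_DIR"] = chosen
    return chosen
```

A call such as configure_numba_cache_dir([os.environ.get("NUMBA_CACHE_DIR"), os.environ.get("TMPDIR"), tempfile.gettempdir()]) would keep an existing setting when it is usable and fall back to the temp directory otherwise.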