RuntimeError: cannot cache function 'square': no locator available for file

Open 99991 opened this issue 6 years ago • 9 comments

When trying to import a numba-cached function from a python egg package, I get the following error message about no cache file locator being found:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/home/username/.local/lib/python3.6/site-packages/mytestpackage-1.2.3-py3.6.egg/mytestpackage/mymodule.py", line 3, in <module>
  File "/home/username/.local/lib/python3.6/site-packages/numba/decorators.py", line 194, in wrapper
    disp.enable_caching()
  File "/home/username/.local/lib/python3.6/site-packages/numba/dispatcher.py", line 679, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/home/username/.local/lib/python3.6/site-packages/numba/caching.py", line 614, in __init__
    self._impl = self._impl_class(py_func)
  File "/home/username/.local/lib/python3.6/site-packages/numba/caching.py", line 349, in __init__
    "for file %r" % (qualname, source_path))
RuntimeError: cannot cache function 'square': no locator available for file '/home/username/.local/lib/python3.6/site-packages/mytestpackage-1.2.3-py3.6.egg/mytestpackage/mymodule.py'

The bug occurs with at least numba versions 0.45.1 and 0.46.1 and appears to be independent of the operating system. Setting a user cache directory with export NUMBA_CACHE_DIR=/tmp does not help. I believe the cache file locators fail because they assume that Python source files live in plain directories, when they can in fact be inside an egg. For example, here:

def from_function(cls, py_func, py_file):
    if not (os.path.exists(py_file) or getattr(sys, 'frozen', False)):
        # Perhaps a placeholder (e.g. "<ipython-XXX>")
        # stop function exit if frozen, since it uses a temp placeholder
        return
    self = cls(py_func, py_file)
    try:
        self.ensure_cache_path()
    except OSError:
        # Cannot ensure the cache directory exists or is writable
        return
    return self

To produce a minimal package and reproduce this bug, run this code in a shell:

echo "UEsDBBQDAAAAADd9fU8AAAAAAAAAAAAAAAAOAAAAbXl0ZXN0cGFja2FnZS9QSwMEFAMAAAAAvXx9TwAAAAAAAAAAAAAAABwAAABteXRlc3RwYWNrYWdlL215dGVzdHBhY2thZ2UvUEsDBAoDAAAAAMVqfU9D34g0AgAAAAIAAAAnAAAAbXl0ZXN0cGFja2FnZS9teXRlc3RwYWNrYWdlL19faW5pdF9fLnB5IApQSwMEFAMAAAgAuXx9TxKRbY9SAAAAUwAAACcAAABteXRlc3RwYWNrYWdlL215dGVzdHBhY2thZ2UvbXltb2R1bGUucHkdxbEKgCAQAND9vuJo0ohmEYI+oi0azE4y0OrywM8vesuL6Tq5YJa0OoDxv89HLKoJRgUz20U3HXrndxomFtKwUcDnFsekqraAH6YinLFiixVeUEsDBBQDAAAIALB8fU9A2V6/YwAAAI0AAAAWAAAAbXl0ZXN0cGFja2FnZS9zZXR1cC5weVWMSwqFMAxF51lF6EihCO857lqkaJSibUoTBXev+Bl4h+dcToiZiyILjIUjCumalXkRvM1NLI4hDV32/ewnEoCLVoDnko/kTNyVRJ+DsZfZqEjg5Myv+TftA9+G+xSr2kINB1BLAQI/AxQDAAAAADd9fU8AAAAAAAAAAAAAAAAOACQAAAAAAAAAEIDtQQAAAABteXRlc3RwYWNrYWdlLwoAIAAAAAAAAQAYAICaOR/DptUBgPSbIcOm1QGAmjkfw6bVAVBLAQI/AxQDAAAAAL18fU8AAAAAAAAAAAAAAAAcACQAAAAAAAAAEIDtQSwAAABteXRlc3RwYWNrYWdlL215dGVzdHBhY2thZ2UvCgAgAAAAAAABABgAgJBTl8Km1QEAizQiw6bVAYCQU5fCptUBUEsBAj8DCgMAAAAAxWp9T0PfiDQCAAAAAgAAACcAJAAAAAAAAAAggKSBZgAAAG15dGVzdHBhY2thZ2UvbXl0ZXN0cGFja2FnZS9fX2luaXRfXy5weQoAIAAAAAAAAQAYAIBuvZ6vptUBAIs0IsOm1QGAbr2er6bVAVBLAQI/AxQDAAAIALl8fU8SkW2PUgAAAFMAAAAnACQAAAAAAAAAIICkga0AAABteXRlc3RwYWNrYWdlL215dGVzdHBhY2thZ2UvbXltb2R1bGUucHkKACAAAAAAAAEAGACA3I6SwqbVAQCLNCLDptUBgNyOksKm1QFQSwECPwMUAwAACACwfH1PQNlev2MAAACNAAAAFgAkAAAAAAAAACCApIFEAQAAbXl0ZXN0cGFja2FnZS9zZXR1cC5weQoAIAAAAAAAAQAYAADebIjCptUBgJVG28Km1QEA3myIwqbVAVBLBQYAAAAABQAFACgCAADbAQAAAAA=" | base64 -d > mytestpackage.zip
unzip mytestpackage.zip
cd mytestpackage
python setup.py install --user
cd /tmp
python -c "import mytestpackage.mymodule"

Or alternatively, if you do not trust this base64-encoded zip-file, you can create a package manually. The directory structure is:

mytestpackage
├── mytestpackage
│   ├── __init__.py
│   └── mymodule.py
└── setup.py

The content of the file mymodule.py is:

import numba

@numba.njit("f8(f8)", cache=True)
def square(x):
    return x * x

The content of the file setup.py is:

from setuptools import setup, find_packages

setup(
    name="mytestpackage",
    version="1.2.3",
    packages=find_packages(),
)

The file __init__.py is empty.

It is often preferable to compile the functions ahead of time during setup with numba.pycc, provided the features you need are available there (for example, parallel=True is not supported yet).

Another workaround is to install the package with pip install ., which will not create an egg, instead of python setup.py install.

99991 avatar Nov 29 '19 15:11 99991

Another workaround is to add zip_safe=False to the setup.py file of the packages that use numba caching.
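For illustration, this is the setup.py from the original report with that one flag added. It is only a sketch of the workaround described above: zip_safe=False asks setuptools to install the package as an unpacked directory rather than a zipped egg, so the source files exist as regular files on disk.

```python
from setuptools import setup, find_packages

setup(
    name="mytestpackage",
    version="1.2.3",
    packages=find_packages(),
    # Install as an unpacked directory instead of a zipped egg, so that
    # mymodule.py is a regular file that the cache locators can find.
    zip_safe=False,
)
```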

candalfigomoro avatar Jan 07 '20 10:01 candalfigomoro

Indeed, the issue seems to be this line: https://github.com/numba/numba/blob/0db8a2bcd0f53c0d0ad8a798432fb3f37f14af27/numba/core/caching.py#L188

os.path.exists returns False for a path inside an egg/zip.
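A minimal, standard-library-only sketch of that behaviour follows. The archive and package names are made up for the demonstration, and no numba code is involved.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive standing in for an installed egg (hypothetical names).
    egg_path = os.path.join(tmp, "demo-0.0.0.egg")
    with zipfile.ZipFile(egg_path, "w") as archive:
        archive.writestr("demo/mymodule.py", "x = 1\n")

    # Same shape as the path in the traceback: <egg file>/<package>/<module>.py
    module_path = os.path.join(egg_path, "demo", "mymodule.py")

    archive_exists = os.path.exists(egg_path)
    member_exists = os.path.exists(module_path)

print(archive_exists)  # True: the egg itself is a regular file
print(member_exists)   # False: the filesystem cannot look inside the archive
```

Python's zipimport machinery can still import the module, which is why the import only fails once the caching code checks the path on the filesystem.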

My use case is including an egg with a PySpark job, which will be copied to executors running my numba code. I don't have the option to pip install my package on these executor nodes.

I was considering monkey patching numba and inserting my own custom locator here: https://github.com/numba/numba/blob/0db8a2bcd0f53c0d0ad8a798432fb3f37f14af27/numba/core/caching.py#L328

Ultimately I decided on some magic inside my own package, where I infer if my package was imported as a zip/egg, and if so I unzip the package, modify sys.path, and reload my package.
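A rough sketch of what such detection and extraction could look like is shown here. This is not the code used in that package; the helper names find_enclosing_archive and extract_and_prepend are hypothetical, and reloading the package after the path change is left out.

```python
import os
import sys
import tempfile
import zipfile


def find_enclosing_archive(module_file):
    """Return the zip/egg file that contains module_file, or None."""
    path = os.path.abspath(module_file)
    while True:
        # A module inside an egg has the archive file as one of its "parents".
        if os.path.isfile(path) and zipfile.is_zipfile(path):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent


def extract_and_prepend(archive_path, target_dir=None):
    """Extract the archive and put the extracted tree first on sys.path."""
    target_dir = target_dir or tempfile.mkdtemp(prefix="unzipped_pkg_")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir
```

A package could call find_enclosing_archive(__file__) early in its __init__.py and, if an archive is returned, extract it and re-import itself from the extracted location. Only extract archives you control, since extractall writes whatever paths the archive contains.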

trianta2 avatar Oct 29 '20 22:10 trianta2

For most users working with Docker images, setting ENV NUMBA_CACHE_DIR=/tmp should help (even though it didn't work for the OP).

abin-tiger avatar May 25 '22 09:05 abin-tiger

Just to confirm, because this never got a response from a maintainer before, this is still an issue with numba main. Following the OP's steps above (which were clear and concise, many thanks @99991), I get:

$ python -c "import mytestpackage.mymodule"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/gmarkall/mambaforge/envs/numbadev/lib/python3.10/site-packages/mytestpackage-1.2.3-py3.10.egg/mytestpackage/mymodule.py", line 4, in <module>
  File "/home/gmarkall/numbadev/numba/numba/core/decorators.py", line 212, in wrapper
    disp.enable_caching()
  File "/home/gmarkall/numbadev/numba/numba/core/dispatcher.py", line 863, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/home/gmarkall/numbadev/numba/numba/core/caching.py", line 601, in __init__
    self._impl = self._impl_class(py_func)
  File "/home/gmarkall/numbadev/numba/numba/core/caching.py", line 337, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function 'square': no locator available for file '/home/gmarkall/mambaforge/envs/numbadev/lib/python3.10/site-packages/mytestpackage-1.2.3-py3.10.egg/mytestpackage/mymodule.py'

I'm going to add this to the 0.57RC milestone as I think it's very tight to get anything into 0.56, but I do think it's important to look at resolving this.

gmarkall avatar May 25 '22 12:05 gmarkall

Hi @gmarkall, when is the numba 0.57 release expected to come out? I'm facing the same issue in one of my projects and want to plan my work around the release timeline.

Also, please let me know if there is a workaround/alternate solution to this issue. Many thanks!

hancelpv avatar Jun 21 '22 05:06 hancelpv

I'd guess we're at least 6 months out from 0.57.

Workarounds from this issue seem to be:

  • Add zip_safe=False to the setup.py of the packages using Numba caching: https://github.com/numba/numba/issues/4908#issuecomment-571537473
  • Monkey patching Numba to insert a custom locator: https://github.com/numba/numba/issues/4908#issuecomment-719063147
  • Detect in the package if it's imported as a zip, extract it, then modify sys.path: https://github.com/numba/numba/issues/4908#issuecomment-719063147

If you go with the monkey patch route, I would imagine the locator class that gets monkey patched into the list of cache locators would probably be suitable to add to Numba permanently as a PR - if you go with this, please do let me know and we can work towards a PR.

gmarkall avatar Jun 21 '22 09:06 gmarkall

The same numba caching issue seems to occur when using the timezonefinder package under PySpark. Versions: python 3.6, spark 2.4.8, timezonefinder 5.2.0.

Issue reference: https://github.com/jannikmi/timezonefinder/issues/206#issue-1849221394

sureskn3 avatar Aug 17 '23 11:08 sureskn3

I am having this problem when this Docker container (https://github.com/BioContainers/containers/edit/master/cellpose/2.2.2/Dockerfile), which uses ENV NUMBA_CACHE_DIR=/tmp, is converted to a Singularity container. It gives the following error message in a GitHub CI run when trying to run code within that Singularity container. Strangely, this does not happen when I run the code in Singularity within a Gitpod code environment...

Does anyone have a suggestion for how to fix the Docker container so that it also works in hosted Singularity tests?

FloWuenne avatar Nov 13 '23 15:11 FloWuenne

For people coming from Google search:

  • You can add SINGULARITYENV_NUMBA_CACHE_DIR="$TMPDIR" to your singularity call
  • On Nextflow you can also use singularity.runOptions = '--no-mount tmp --writable-tmpfs'
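To see whether the cache directory is the culprit inside a container, a quick check along these lines can be run in the same environment. It is a diagnostic sketch only, and it assumes the failure comes from a missing or read-only cache directory, which this thread does not confirm. The helper name cache_dir_is_writable is made up for the example.

```python
import os
import tempfile


def cache_dir_is_writable(path):
    """Return True if path can be created (if missing) and written to."""
    try:
        os.makedirs(path, exist_ok=True)
        # Creating and closing a temporary file proves write access.
        tempfile.TemporaryFile(dir=path).close()
    except OSError:
        return False
    return True


cache_dir = os.environ.get("NUMBA_CACHE_DIR")
if cache_dir is None:
    print("NUMBA_CACHE_DIR is not set")
else:
    print(cache_dir, "writable:", cache_dir_is_writable(cache_dir))
```

If this reports that the directory is not writable, pointing NUMBA_CACHE_DIR at a writable location, as in the two options above, is the thing to try first.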

tbrittoborges avatar Sep 13 '24 10:09 tbrittoborges