PyOxidizer
PyOxidizer copied to clipboard
pkgutil.get_data() with in-memory resources
Third-party libraries like jsonschema, that call pkgutil.get_data() to read their bundled resources, currently fail when the sources are in memory, as they lack a __file__ attribute. PyOxidizer's ResourceReader interface does not require __file__, and can successfully retrieve the resources. By dynamically replacing get_data() with a call to open_resource(), it appears to be possible to get jsonschema to work when in memory (mostly - it still has a dependency on an extension module):
def make_exe():
dist = default_python_distribution()
policy = dist.make_python_packaging_policy()
policy.resources_location = "in-memory"
policy.resources_location_fallback = "filesystem-relative:lib"
python_config = dist.make_python_interpreter_config()
python_config.run_command = """
import sys
import pkgutil
import importlib
def get_data_custom(package, resource):
try:
module = importlib.import_module(package)
with module.__loader__.get_resource_reader(package).open_resource(resource) as f:
return f.read()
except:
return None
if getattr(sys, "oxidized", False):
pkgutil.get_data = get_data_custom
# quick jsonschema test
schema = {"type" : "object", "properties" : {"price" : {"type" : "number"}} }
import jsonschema
jsonschema.validate(instance=dict(price="test"), schema=schema)
"""
exe = dist.to_python_executable(
name = "pyapp",
packaging_policy = policy,
config = python_config,
)
for resource in exe.pip_install(["jsonschema"]):
exe.add_python_resource(resource)
return exe
def make_embedded_resources(exe):
return exe.to_embedded_resources()
def make_install(exe):
files = FileManifest()
files.add_python_resource(".", exe)
return files
register_target("exe", make_exe)
register_target("resources", make_embedded_resources, depends = ["exe"], default_build_script = True)
register_target("install", make_install, depends = ["exe"], default = True)
resolve_targets()
The sys.oxidized gate is necessary to ensure things run correctly outside PyOxidizer, as the default Python ResourceReader can not handle subdirs in the resource path.
I have not tested this with any other libraries yet, but presume this approach is not limited to jsonschema. It's a hack, and I'm not suggesting this be included in PyOxidizer itself, but I thought it might be worth mentioning in case anyone finds it useful.
Partially related to #436
Here's the implementation of pkgutil.read_data() from the Python stdlib:
def get_data(package, resource):
spec = importlib.util.find_spec(package)
if spec is None:
return None
loader = spec.loader
if loader is None or not hasattr(loader, 'get_data'):
return None
# XXX needs test
mod = (sys.modules.get(package) or
importlib._bootstrap._load(spec))
if mod is None or not hasattr(mod, '__file__'):
return None
# Modify the resource name to be compatible with the loader.get_data
# signature - an os.path format "filename" starting with the dirname of
# the package's __file__
parts = resource.split('/')
parts.insert(0, os.path.dirname(mod.__file__))
resource_name = os.path.join(*parts)
return loader.get_data(resource_name)
That if mod is None or not hasattr(mode, '__file__') check is preventing pkgutil.get_data() from working with PyOxidizer, even though OxidizedFinder implements get_data(). Sadness.
I'm really tempted to give up on __file__ being optional and set it to a virtual path under the current executable's path. This is what zipimport does and as long as a sys.path_hooks entry is registered, things should just work.
FWIW jsonschema is now using importlib.resources (https://github.com/Julian/jsonschema/pull/873/files) - v4.2.0.
Hmm, the new code doesn't seem to work for me on a pyoxidized Python 3.9:
File "jsonschema", line 29, in <module>
File "jsonschema.validators", line 349, in <module>
File "jsonschema._utils", line 60, in load_schema
File "importlib.resources", line 147, in files
File "importlib._common", line 14, in from_package
File "importlib._common", line 18, in fallback_resources
File "pathlib", line 1082, in __new__
File "pathlib", line 707, in _from_parts
File "pathlib", line 691, in _parse_args
TypeError: expected str, bytes or os.PathLike object, not NoneType
You're right @dae, I can reproduce when the location is in-memory.
It does work with jsonschema v4.2.0 when using filesystem-relative.
The new code breaks due to https://github.com/indygreg/PyOxidizer/issues/237. It's doing exactly that:
It is possible some code somewhere (including in the Python standard library) is buggy and assumes presence of a 3.9 API (e.g. importlib.resources.files() implies support for these APIs on individual meta path importers. I could easily see how someone could test for hasattr(importlib.resources, "files") and assume all meta path importers support .files(). This behavior would be naive about the possibility of 3rd party meta path importers, such as OxidizedFinder.