pyinstaller-hooks-contrib icon indicating copy to clipboard operation
pyinstaller-hooks-contrib copied to clipboard

How to handle wheels repaired by `delvewheel`

Open adang1345 opened this issue 2 years ago • 2 comments

Hi, I'm the maintainer of https://github.com/adang1345/delvewheel, which is a tool that bundles external DLLs into wheels. I have seen reports that whenever pyinstaller is used to create an executable that depends on a wheel that was created by delvewheel, the executable does not run properly because DLLs are no longer present in the expected location.

For context, see https://github.com/adang1345/delvewheel/issues/18, which contains some details about what the problem is. https://github.com/edenhill/librdkafka/issues/3484 may also provide some information.

I also noticed https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/388, which is an issue that was raised because the Shapely library began to use delvewheel. That problem was fixed by updating the Shapely hook for pyinstaller.

I'm wondering whether there's a better way to have pyinstaller work with wheels that were created by delvewheel other than having a pyinstaller hook for every library that decides to use delvewheel. Any ideas are welcome.

adang1345 avatar Apr 28 '22 20:04 adang1345

With the current PyInstaller design, I think having hooks for packages is somewhat unavoidable, even if we ignore delvewheel for a bit. That's because PyInstaller does not automatically collect arbitrary data files (e.g., DLL load order list) from a package, nor does it collect bundled DLLs unless they are linked against by an extension and we can resolve the DLL path at the analysis time.

That's why we had a hook for Shapely that was collecting the DLLs even before they moved to delvewheel (I think the geos_c DLL is loaded at runtime, so our dependency analysis did not pick it up at all). On the other hand, the av hook did not need to explicitly collect DLLs, because prior to moving to delvewheel, they had DLLs placed inside the package and next to the extensions that were linked against them. If extensions were not linked against the DLLs (but rather loaded them at runtime) or if the DLLs were placed in for example a sub-directory in the package, then the hook would also need to collect them (using our collect_dynamic_libs utility function).

Now, our main utility functions for collecting data and shared libs, collect_data_files and collect_dynamic_libs, assume that the files are contained within the package's directory.

So from my perspective, delvewheel complicates the situation on two counts:

  • whereas previously the DLLs may have been automatically picked up due to being next to extensions that need them, they are now placed somewhere else, so we need a hook to handle that
  • even for cases where hook was already collected DLLs that were previously inside the package directory, that does not work anymore due to external library location. And our existing utility functions cannot be used here due to their assumption that files to be collected are inside the package's directory.

Based on #394 and #408, I think we're going to need a new hook helper for handling delvewheel; namely, a function that accepts a package name and a name of DLL directory, and then checks for that directory in package's parent directory, and collects all DLLs and the load order file. This would still require hooks for all affected packages (sort of continuing the existing situation), but I don't see a way out of that, unless we automatically started to collect package distribution's contents based on the metadata.

rokm avatar Apr 29 '22 10:04 rokm

I see two ways to address this automatically and both are horrible:

  1. delvewheel additionally modifies wheels to generate and inject a hook, add a helper to locate it then adds a pyinstaller40 entry-point to locate that helper (see our inadequate docs on the process).
  2. PyInstaller does does bytecode scanning to detect functions matching _delvewheel_init_patch_*_*_*(), calls then in isolated subprocesses, then I think delvewheel will need to give PyInstaller a hand here and make those functions return something that gives PyInstaller a hint where the libraries it needs are.

bwoodsend avatar Apr 29 '22 13:04 bwoodsend

Hi, I noticed the introduction of the collect_delvewheel_libs_directory() function in https://github.com/pyinstaller/pyinstaller/commit/9373f07c7724d3bc7a55da9b36a251f5965e3b51. Would it be helpful if I were to have delvewheel inject a hook that calls this function whenever it repairs a wheel? Does that sound like a reasonable long-term solution?

adang1345 avatar Oct 26 '22 18:10 adang1345

Hi, I noticed the introduction of the collect_delvewheel_libs_directory() function in pyinstaller/pyinstaller@9373f07. Would it be helpful if I were to have delvewheel inject a hook that calls this function whenever it repairs a wheel? Does that sound like a reasonable long-term solution?

For this to work, delvewheel would have to additionally modify the wheel and generate a PyInstaller hook, and then register it using the pyinstaller40 entry-point. (This is basically what Brénainn described under point 1 in https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/421#issuecomment-1113289308).

Right now, I don't think this is necessary. In fact, with DLL path preservation on Windows, introduced in pyinstaller/pyinstaller#7028, and DLL search path extension from pyinstaller/pyinstaller#6925, I think we should be able to automatically pick up the shared libs in external .libs directory, as long as the package's extensions are dynamically linked against them (i.e., they are not loaded at runtime via ctypes or equivalent). The exception is python 3.7, where load order file needs to be collected, but I expect the support for 3.7 to be phased out soon, so long-term, that's not an issue.

So I personally prefer if we continue to handle issues with delvewheel in our hooks (using the newly introduced helper), mainly because it gives us more control over how we handle things and also gives us flexibility to change the approach if the need arises. This goes against the early-4.0-series paradigm of trying to push the hooks into their corresponding packages, but I think keeping the hooks on our side (either the main or this repository) is better, simply because we have more experience with PyInstaller packaging and the knowledge about its internals.


One thing I would ask you, though, is to guard the os.add_dll_directory() call in delvewheel-injected init function with a check that the directory exists, so it does not raise an error when it does not. Under earlier PyInstaller versions, we might have collected those DLLs in the top-level application directory, which would have made them available at run-time, but that unguarded call raised an error due to the external libs directory missing.

rokm avatar Oct 28 '22 12:10 rokm

One thing I would ask you, though, is to guard the os.add_dll_directory() call in delvewheel-injected init function with a check that the directory exists, so it does not raise an error when it does not. Under earlier PyInstaller versions, we might have collected those DLLs in the top-level application directory, which would have made them available at run-time, but that unguarded call raised an error due to the external libs directory missing.

I have submitted https://github.com/adang1345/delvewheel/commit/419a2fe77422ca81f350bf1a01e11e66aad59493 to address this.

adang1345 avatar Nov 02 '22 00:11 adang1345