pyinstaller-hooks-contrib
pyinstaller-hooks-contrib copied to clipboard
How to handle wheels repaired by `delvewheel`
Hi, I'm the maintainer of https://github.com/adang1345/delvewheel, which is a tool that bundles external DLLs into wheels. I have seen reports that whenever pyinstaller
is used to create an executable that depends on a wheel that was created by delvewheel
, the executable does not run properly because DLLs are no longer present in the expected location.
For context, see https://github.com/adang1345/delvewheel/issues/18, which contains some details about what the problem is. https://github.com/edenhill/librdkafka/issues/3484 may also provide some information.
I also noticed https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/388, which is an issue that was raised because the Shapely
library began to use delvewheel
. That problem was fixed by updating the Shapely
hook for pyinstaller
.
I'm wondering whether there's a better way to have pyinstaller
work with wheels that were created by delvewheel
other than having a pyinstaller
hook for every library that decides to use delvewheel
. Any ideas are welcome.
With the current PyInstaller design, I think having hooks for packages is somewhat unavoidable, even if we ignore delvewheel
for a bit. That's because PyInstaller does not automatically collect arbitrary data files (e.g., DLL load order list) from a package, nor does it collect bundled DLLs unless they are linked against by an extension and we can resolve the DLL path at the analysis time.
That's why we had a hook for Shapely
that was collecting the DLLs even before they moved to delvewheel
(I think the geos_c
DLL is loaded at runtime, so our dependency analysis did not pick it up at all). On the other hand, the av
hook did not need to explicitly collect DLLs, because prior to moving to delvewheel
, they had DLLs placed inside the package and next to the extensions that were linked against them. If extensions were not linked against the DLLs (but rather loaded them at runtime) or if the DLLs were placed in for example a sub-directory in the package, then the hook would also need to collect them (using our collect_dynamic_libs
utility function).
Now, our main utility functions for collecting data and shared libs, collect_data_files
and collect_dynamic_libs
, assume that the files are contained within the package's directory.
So from my perspective, delvewheel
complicates the situation on two counts:
- whereas previously the DLLs may have been automatically picked up due to being next to extensions that need them, they are now placed somewhere else, so we need a hook to handle that
- even for cases where hook was already collected DLLs that were previously inside the package directory, that does not work anymore due to external library location. And our existing utility functions cannot be used here due to their assumption that files to be collected are inside the package's directory.
Based on #394 and #408, I think we're going to need a new hook helper for handling delvewheel
; namely, a function that accepts a package name and a name of DLL directory, and then checks for that directory in package's parent directory, and collects all DLLs and the load order file. This would still require hooks for all affected packages (sort of continuing the existing situation), but I don't see a way out of that, unless we automatically started to collect package distribution's contents based on the metadata.
I see two ways to address this automatically and both are horrible:
-
delvewheel
additionally modifies wheels to generate and inject a hook, add a helper to locate it then adds apyinstaller40
entry-point to locate that helper (see our inadequate docs on the process). - PyInstaller does does bytecode scanning to detect functions matching
_delvewheel_init_patch_*_*_*()
, calls then in isolated subprocesses, then I thinkdelvewheel
will need to give PyInstaller a hand here and make those functions return something that gives PyInstaller a hint where the libraries it needs are.
Hi, I noticed the introduction of the collect_delvewheel_libs_directory()
function in https://github.com/pyinstaller/pyinstaller/commit/9373f07c7724d3bc7a55da9b36a251f5965e3b51. Would it be helpful if I were to have delvewheel
inject a hook that calls this function whenever it repairs a wheel? Does that sound like a reasonable long-term solution?
Hi, I noticed the introduction of the
collect_delvewheel_libs_directory()
function in pyinstaller/pyinstaller@9373f07. Would it be helpful if I were to havedelvewheel
inject a hook that calls this function whenever it repairs a wheel? Does that sound like a reasonable long-term solution?
For this to work, delvewheel
would have to additionally modify the wheel and generate a PyInstaller hook, and then register it using the pyinstaller40
entry-point. (This is basically what Brénainn described under point 1 in https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/421#issuecomment-1113289308).
Right now, I don't think this is necessary. In fact, with DLL path preservation on Windows, introduced in pyinstaller/pyinstaller#7028, and DLL search path extension from pyinstaller/pyinstaller#6925, I think we should be able to automatically pick up the shared libs in external .libs directory, as long as the package's extensions are dynamically linked against them (i.e., they are not loaded at runtime via ctypes
or equivalent). The exception is python 3.7, where load order file needs to be collected, but I expect the support for 3.7 to be phased out soon, so long-term, that's not an issue.
So I personally prefer if we continue to handle issues with delvewheel in our hooks (using the newly introduced helper), mainly because it gives us more control over how we handle things and also gives us flexibility to change the approach if the need arises. This goes against the early-4.0-series paradigm of trying to push the hooks into their corresponding packages, but I think keeping the hooks on our side (either the main or this repository) is better, simply because we have more experience with PyInstaller packaging and the knowledge about its internals.
One thing I would ask you, though, is to guard the os.add_dll_directory()
call in delvewheel
-injected init function with a check that the directory exists, so it does not raise an error when it does not. Under earlier PyInstaller versions, we might have collected those DLLs in the top-level application directory, which would have made them available at run-time, but that unguarded call raised an error due to the external libs directory missing.
One thing I would ask you, though, is to guard the
os.add_dll_directory()
call indelvewheel
-injected init function with a check that the directory exists, so it does not raise an error when it does not. Under earlier PyInstaller versions, we might have collected those DLLs in the top-level application directory, which would have made them available at run-time, but that unguarded call raised an error due to the external libs directory missing.
I have submitted https://github.com/adang1345/delvewheel/commit/419a2fe77422ca81f350bf1a01e11e66aad59493 to address this.