
[BUG-REPORT] Exception in pyinstaller bundled app for vaex >=4.6.0

Open schwingkopf opened this issue 2 years ago • 8 comments

Description

I'm facing two exceptions when using the latest vaex versions (4.6.0 and 4.7.0) in an app bundled with PyInstaller 4.7.

First exception

Traceback (most recent call last):
  File "main.py", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
  File "vaex\__init__.py", line 43, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
  File "vaex\dataset.py", line 13, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
  File "frozendict\__init__.py", line 22, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\git_repos\\pyinstaller_problem2_minmal\\dist\\main\\frozendict\\VERSION'

It's caused by frozendict's VERSION file (new in frozendict versions >2.0) not being bundled. That's really a PyInstaller/frozendict issue, but I wanted to post the solution here because others will likely run into it too. It can be solved with the following hook file:

hook-frozendict.py:

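# Bundle frozendict's VERSION file, which frozendict reads at import time
from pathlib import Path

import frozendict

# Each entry is (source file on the build machine, destination folder in the bundle)
datas = [(str(Path(frozendict.__path__[0]) / 'VERSION'), 'frozendict')]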
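You may want the hook itself to fail at build time when the data file is missing, rather than the frozen app failing at import. A small helper like the one below builds the datas entries and raises if a listed file is absent. package_data_entries is a hypothetical name and is not part of PyInstaller. PyInstaller's own PyInstaller.utils.hooks.collect_data_files("frozendict") is the built-in alternative for collecting all of a package's data files.

```python
from pathlib import Path


def package_data_entries(package_dir, filenames, dest):
    """Return (source, destination) tuples in the shape PyInstaller expects
    for `datas`, raising immediately if a listed file does not exist."""
    entries = []
    for filename in filenames:
        src = Path(package_dir) / filename
        if not src.is_file():
            # Fail while building instead of inside the frozen app
            raise FileNotFoundError(f"data file not found: {src}")
        entries.append((str(src), dest))
    return entries


# In hook-frozendict.py this would be used as (not executed here):
#   import frozendict
#   datas = package_data_entries(frozendict.__path__[0], ["VERSION"], "frozendict")
```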

Second exception

Hello world
Traceback (most recent call last):
  File "main.py", line 6, in <module>
  File "vaex\dataframe.py", line 928, in count
  File "vaex\dataframe.py", line 902, in _compute_agg
  File "vaex\dataframe.py", line 1672, in _delay
  File "vaex\dataframe.py", line 412, in execute
  File "vaex\execution.py", line 181, in execute
  File "vaex\execution.py", line 186, in run
  File "vaex\asyncio.py", line 51, in just_run
  File "nest_asyncio.py", line 81, in run_until_complete
  File "asyncio\futures.py", line 181, in result
  File "asyncio\tasks.py", line 249, in __step
  File "vaex\execution.py", line 334, in execute_async
  File "vaex\memory.py", line 37, in create_tracker
ValueError: No memory tracker found with name default
[1272] Failed to execute script 'main' due to unhandled exception!

I haven't found a solution for this one yet and would appreciate some help. I have trouble understanding how the dynamic importing in vaex/memory.py works (and I guess so does PyInstaller). Any hints on how to solve this? This is the relevant code section from vaex/memory.py:

def create_tracker():
    memory_tracker_type = vaex.settings.main.memory_tracker.type
    if not _memory_tracker_types:
        with lock:
            if not _memory_tracker_types:
                for entry in pkg_resources.iter_entry_points(group="vaex.memory.tracker"):
                    _memory_tracker_types[entry.name] = entry.load()
    cls = _memory_tracker_types.get(memory_tracker_type)
    if cls is not None:
        return cls()
    raise ValueError(f"No memory tracker found with name {memory_tracker_type}")
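The code above is a lazily filled plugin registry. The first call scans the vaex.memory.tracker entry-point group, and every later lookup uses the cached dict. A frozen app may lack the package's .dist-info metadata. In that case the scan finds nothing, the cache stays empty, and the lookup of "default" raises exactly this ValueError.

The sketch below reproduces that behaviour with a few differences from vaex 4.7:
- It uses importlib.metadata instead of pkg_resources, which is a swap and not what vaex 4.7 does.
- It adds a hypothetical fallback mapping, so missing metadata can be compensated for explicitly.
- The function signature is illustrative and is not vaex's API.

```python
import importlib.metadata
import threading

_lock = threading.Lock()
_tracker_types = {}


def _discover(group):
    """Return the installed entry points of `group` (empty if no metadata)."""
    eps = importlib.metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 returned a plain dict


def create_tracker(name, group="vaex.memory.tracker", fallback=None):
    """Instantiate the tracker class registered under `name`.

    `fallback` is a hypothetical explicit {name: class} mapping used when
    entry-point discovery finds nothing, e.g. inside a frozen app.
    """
    if not _tracker_types:
        with _lock:
            if not _tracker_types:  # double-checked so only one thread scans
                for ep in _discover(group):
                    _tracker_types[ep.name] = ep.load()
    cls = _tracker_types.get(name) or (fallback or {}).get(name)
    if cls is None:
        raise ValueError(f"No memory tracker found with name {name}")
    return cls()
```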

Steps to reproduce:

main.py:

import vaex

print("Hello world")

df = vaex.from_dict({'A':[1,2,3]})
print(df.count())

When run with python main.py, the script works fine.

Bundle with PyInstaller 4.7 (using the hook-frozendict.py mentioned above): pyinstaller --onedir --additional-hooks-dir=. main.py

The output of main.exe is "Hello world" followed by the same traceback as the second exception above (ValueError: No memory tracker found with name default).

Software information

  • Vaex version (import vaex; vaex.__version__): {'vaex-core': '4.7.0'}
  • Vaex was installed via: pip
  • Python: 3.7.9
  • Pyinstaller: 4.7
  • OS: Win10

schwingkopf, Jan 12 '22 22:01

Hi,

thanks for sharing this. I think PyInstaller is not picking up the entry points for some reason. They are listed in https://github.com/vaexio/vaex/blob/1b04e089a60d838362aad71ee4fdef9dc6e174be/packages/vaex-core/setup.py#L185
Does this help you?

Regards,

Maarten Breddels

maartenbreddels, Jan 13 '22 07:01

I'm getting the same error. The entry points you mention are listed in entry_points.txt, but I don't know how to get PyInstaller to use them.
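entry_points.txt typically sits in the package's .dist-info folder. It is an INI-style file with one [group] section per entry-point group and name = module:attr lines. When PyInstaller does not bundle that folder, this information is simply missing at runtime.

As a sketch, the file can be read with configparser on the build machine to produce the name = module:attr strings that a workaround would need to carry into the app. The group and entries in the example are made up and are not vaex's real ones.

```python
import configparser


def parse_entry_points_txt(text):
    """Map each entry-point group to a list of "name = module:attr" strings."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry-point names exactly as written
    parser.read_string(text)
    return {
        group: [f"{name} = {value}" for name, value in parser.items(group)]
        for group in parser.sections()
    }


# Made-up example contents; a real file would come from the .dist-info folder
SAMPLE = """\
[hypothetical.group]
first = pkg.mod:ClassA
Second = pkg.other:func
"""
print(parse_entry_points_txt(SAMPLE))
```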

styliann-eth, Feb 18 '22 16:02

I succeeded by following the hints from https://github.com/pyinstaller/pyinstaller/issues/3050 and added the following to my .spec file:

# Helper function to make iter_entry_points work e.g. for vaex
# copied and modified from https://github.com/pyinstaller/pyinstaller/issues/3050
import os

import pkg_resources


def prepare_entrypoints(ep_packages):
    """Collect the entry points of the given groups on the build machine and
    generate a runtime hook that serves them inside the frozen app."""
    hook_ep_packages = dict()
    hiddenimports = set()
    runtime_hooks = list()

    if not ep_packages:
        return list(hiddenimports), runtime_hooks

    for ep_package in ep_packages:
        for ep in pkg_resources.iter_entry_points(ep_package):
            # Store each entry point as "name = module:attr" for the runtime hook
            package_entry_point = hook_ep_packages.setdefault(ep_package, [])
            package_entry_point.append("{} = {}:{}".format(ep.name, ep.module_name, ".".join(ep.attrs)))
            # Plugin modules are only imported dynamically, so PyInstaller must be told about them
            hiddenimports.add(ep.module_name)

    os.makedirs('./generated', exist_ok=True)

    with open("./generated/pkg_resources_hook.py", "w") as f:
        f.write("""# Runtime hook generated from spec file to support pkg_resources entrypoints.
ep_packages = {}

if ep_packages:
    import pkg_resources
    default_iter_entry_points = pkg_resources.iter_entry_points

    def hook_iter_entry_points(group, name=None):
        if group in ep_packages and ep_packages[group]:
            for ep in ep_packages[group]:
                parsedEp = pkg_resources.EntryPoint.parse(ep)
                parsedEp.dist = pkg_resources.Distribution()
                if name is None or parsedEp.name == name:
                    yield parsedEp
        else:
            # A plain return inside a generator would yield nothing; delegate instead
            yield from default_iter_entry_points(group, name)

    pkg_resources.iter_entry_points = hook_iter_entry_points
""".format(hook_ep_packages))

    runtime_hooks.append("./generated/pkg_resources_hook.py")

    return list(hiddenimports), runtime_hooks


# List of entry-point groups that should be included.
ep_packages = ["vaex.memory.tracker"]

hiddenimports, runtime_hooks = prepare_entrypoints(ep_packages)

Then pass hiddenimports and runtime_hooks to Analysis like so:

a = Analysis(
    ...
    hiddenimports=hiddenimports,
    runtime_hooks=runtime_hooks,
)

Hope that helps

schwingkopf, Feb 18 '22 17:02

I am using the auto-py-to-exe GUI and am facing the same issue ("Exception in Tkinter callback"). Can someone explain how to resolve it using the GUI?

Traceback (most recent call last):
  File "tkinter\__init__.py", line 1702, in __call__
  File "KPI_Automation_GUI.py", line 302, in startConversion
    startConversion_mf4()
  File "KPI_Automation_GUI.py", line 215, in startConversion_mf4
    match_extract_txt.match_and_Extract(textfilelist, str(DriveEnv.get()))
  File "match_extract_txt.py", line 2142, in match_and_Extract
    df.export_hdf5(Databasehdf5FilePath_temp, progress=True, chunk_size=1000000, parallel=True, mode='w')
  File "vaex\dataframe.py", line 6907, in export_hdf5
    with vaex.utils.progressbars(progress, title="export(hdf5)") as progressbar:
  File "vaex\utils.py", line 988, in progressbars
    return tree(*args, **kwargs)
  File "vaex\progress.py", line 206, in tree
    return ProgressTree(bar=bar(title=title), next=next, name=name)
  File "vaex\progress.py", line 181, in bar
    return _progressbar_registry[type_name](title=title)
  File "vaex\utils.py", line 75, in __getitem__
    raise NameError(f'No {self.typename} registered with name {name!r} under entry_point {self.entry_points!r}')
NameError: No progressbar registered with name 'simple' under entry_point 'vaex.progressbar'
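This has the same root cause as above, but with a different entry-point group: vaex.progressbar instead of vaex.memory.tracker. A pkg_resources-style workaround would therefore need that group in its list as well.

To see which vaex groups exist on the build machine, the sketch below lists every installed entry-point group with a given prefix. entry_point_groups is a hypothetical helper name.

```python
import importlib.metadata


def entry_point_groups(prefix):
    """Sorted names of all installed entry-point groups starting with `prefix`."""
    groups = set()
    for dist in importlib.metadata.distributions():
        for ep in dist.entry_points:
            if ep.group.startswith(prefix):
                groups.add(ep.group)
    return sorted(groups)


if __name__ == "__main__":
    # Run in the normal (non-frozen) environment where vaex is installed
    print(entry_point_groups("vaex."))
```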

rajeebdash, May 31 '22 09:05

It seems Python 3.10 breaks the PyInstaller fix above, because importlib.metadata is now used instead of pkg_resources. For now, I've fixed this in my own project by editing dataset.py and memory.py: I added the code from the generated runtime hook and set entry_points to the hook function. utils.py may also need to be overridden in some use cases, but that turned out not to be needed for bare HDF5/CSV access.
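A runtime hook along the same lines can target importlib.metadata instead of pkg_resources. The sketch below makes several assumptions:
- It assumes Python 3.10+.
- It assumes the code being patched calls importlib.metadata.entry_points(group=...).
- It returns a plain list, so callers that rely on EntryPoints methods such as .names would need more.
- The table of "name = module:attr" strings is hypothetical. A real one would be generated at build time, like ep_packages in the spec file above.

PyInstaller runs runtime hooks before the main script. A later from importlib.metadata import entry_points should therefore pick up the patched function, as long as nothing imported it earlier.

```python
import importlib.metadata

# Keep a reference to the real implementation for groups we do not override
_original_entry_points = importlib.metadata.entry_points


def install_entry_point_hook(table):
    """Serve the groups in `table` ({group: ["name = module:attr", ...]})
    from baked-in strings instead of installed package metadata."""

    def patched_entry_points(**params):
        group = params.get("group")
        if group not in table:
            return _original_entry_points(**params)
        eps = []
        for spec in table[group]:
            name, _, value = (part.strip() for part in spec.partition("="))
            eps.append(importlib.metadata.EntryPoint(name=name, value=value, group=group))
        if "name" in params:
            eps = [ep for ep in eps if ep.name == params["name"]]
        return eps

    importlib.metadata.entry_points = patched_entry_points


# Hypothetical table; a real one would be generated on the build machine
install_entry_point_hook({"hypothetical.group": ["joiner = os.path:join"]})
```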

If any hidden imports are still missing after applying the entry-point fix above, they can be identified with something like this:

import json
import sys

# Dump the names of all currently loaded modules as a sorted JSON list
with open('modules.txt', 'w') as modlist:
    print(json.dumps(sorted(sys.modules.keys()), indent=4), file=modlist)

Then diff the output of the normal Python run against the compiled exe and filter the results. It's possible that simply changing the import in the hook file would fix the problem, but I haven't tested that yet. It was unclear whether a direct import of entry_points would bypass the hook when the module is imported a second time. Either way, combining the fix above with one of these two approaches yields a working app. To avoid breaking the development process, I installed a separate Python 3.10 alongside the development environment for building, so it doesn't matter if the installed files are edited.
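The diff step can be scripted. Assume both runs wrote a modules.txt file with the dump snippet above, each containing a single JSON list of module names. A sketch like this then reports modules loaded in the normal Python run but not in the frozen one, which are candidates for hidden imports. The helper names, file names and prefix filter are illustrative.

```python
import json


def load_module_list(path):
    """Read a JSON list of module names written by the dump snippet."""
    with open(path) as f:
        return json.load(f)


def missing_modules(dev_modules, frozen_modules, prefix=""):
    """Modules loaded in the development run but not in the frozen run,
    restricted to names starting with `prefix` (e.g. "vaex")."""
    missing = set(dev_modules) - set(frozen_modules)
    return sorted(name for name in missing if name.startswith(prefix))


# Usage with hypothetical file names from the two runs:
#   dev = load_module_list("modules_dev.txt")
#   frozen = load_module_list("modules_frozen.txt")
#   print(missing_modules(dev, frozen, prefix="vaex"))
```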

EDIT: For those building apps that rely on vaex-viz: the lazy accessors are set in __init__.py. Applying the same monkey patch for vaex.dataframe.accessor and vaex.expression.accessor in __init__.py resolves the problem.

leprechaunt33, Jan 15 '23 02:01

Based on the discussion above and the related issues, I have not been able to find a solution to this problem. I'm running Vaex 4.16 on Python 3.10 in conda with the following basic example:

import vaex as vx
from vaex.hdf5.dataset import Hdf5MemoryMapped, AmuseHdf5MemoryMapped, Hdf5MemoryMappedGadget

vx.settings.main.memory_tracker.type = 'default'

vx.dataset.opener_classes = [Hdf5MemoryMapped, AmuseHdf5MemoryMapped, Hdf5MemoryMappedGadget]

df = vx.open(r"c:\20220613.hdf5")
print(df.head)
df.select(df['date'] >= "2022-06-01", mode='and')
print("count: ", df.count(selection=True))
df.select(df['starttime'] >= "2022-06-13 14:00:00", mode='and')
print("count: ", df.count(selection=True))
df.close()

This gives the correct result when run as a script. When run as an executable, it prints the correct output for df.head, but df.count fails with the memory tracker error mentioned in this thread. I would really appreciate some help solving this, as I am currently unable to package my solution as an executable.

intelligibledata, Jun 12 '23 12:06

It's been more than a year. Does anyone know the solution? Thank you in advance.

fqking, Oct 08 '23 07:10

Running PyInstaller with --hidden-import vaex.viz --hidden-import vaex.astro.legacy --recursive-copy-metadata vaex fixed the issues with Vaex 4.17 on Python 3.10.
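--recursive-copy-metadata vaex bundles the metadata of vaex and its dependencies, which is where their entry points are declared. That is why entry-point discovery can work again inside the frozen app.

To confirm that the metadata actually made it into a build, a small check like the one below could be run at startup of the bundled app. The helper name and the distribution names passed to it are illustrative.

```python
import importlib.metadata


def bundled_metadata_report(dist_names):
    """Map each distribution name to whether its metadata can be found."""
    report = {}
    for name in dist_names:
        try:
            importlib.metadata.distribution(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = False
        else:
            report[name] = True
    return report


if __name__ == "__main__":
    # Distribution (pip) names, not import names
    print(bundled_metadata_report(["vaex-core"]))
```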

gostdi, Oct 10 '23 16:10