PyInstaller Compatibility Issue: Missing dateparser_tz_cache.pkl in Frozen Applications
Summary
#1250 introduced a new data file dateparser/data/dateparser_tz_cache.pkl that is not automatically included when freezing applications with PyInstaller, causing runtime failures in previously working deployments.
Problem Description
The introduction of the timezone cache file dateparser/data/dateparser_tz_cache.pkl in #1250 has created a breaking change for users who package their applications using PyInstaller. This file is not automatically detected and included by PyInstaller's dependency analysis, resulting in FileNotFoundError exceptions when the frozen application attempts to access the cache file.
- https://github.com/aws/aws-sam-cli/issues/8139
- https://github.com/aws/aws-sam-cli/issues/8140
- https://github.com/aws/aws-sam-cli/issues/8141
- https://github.com/aws/aws-sam-cli/issues/8143
Steps to Reproduce
- Create a Python application that uses dateparser
- Package the application using PyInstaller: pyinstaller --onefile your_app.py
- Run the frozen executable
- Observe the runtime error when dateparser attempts to access the cache file
Expected Behavior
The frozen application should work without additional configuration, maintaining backward compatibility with existing PyInstaller workflows.
Actual Behavior
The application fails at runtime with a FileNotFoundError when trying to access dateparser/data/dateparser_tz_cache.pkl.
Current Workaround
Users must manually configure PyInstaller to include the data file by:
- Creating a PyInstaller hook file, or
- Using the --add-data flag: pyinstaller --add-data "path/to/dateparser/data/dateparser_tz_cache.pkl:dateparser/data/" your_app.py
- Or adding the file to a custom hook in their build process
- Or reverting to 1.2.1
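To avoid hard-coding the site-packages path in the --add-data workaround, the argument can be built from the installed package location. This is only a sketch: the helper name add_data_argument is hypothetical, and the default separator assumes that PyInstaller accepts os.pathsep between source and destination (older releases expected it; newer ones also accept ":" on every platform).

```python
import os
from pathlib import Path

# Location of the cache file inside the package, as named in this issue.
CACHE_RELATIVE_PATH = Path("data") / "dateparser_tz_cache.pkl"


def add_data_argument(package_dir, separator=os.pathsep):
    """Build the value for PyInstaller's --add-data option.

    package_dir is the directory of the installed dateparser package,
    for example Path(dateparser.__file__).parent.
    The separator default is an assumption; pass ":" explicitly if your
    PyInstaller version expects it.
    """
    source = Path(package_dir) / CACHE_RELATIVE_PATH
    # The destination is the folder inside the bundle, not a file name.
    return f"{source}{separator}dateparser/data"
```

In a build script this could be used as `["pyinstaller", "--add-data", add_data_argument(Path(dateparser.__file__).parent), "your_app.py"]`.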
Impact Assessment
This change breaks, without warning, existing production deployments and CI/CD pipelines that use PyInstaller. Users upgrading dateparser may experience unexpected deployment failures.
Suggested Solutions
Short-term
- Documentation Update: Add clear instructions in the README and documentation about PyInstaller compatibility requirements
- PyInstaller Hook: Include a proper PyInstaller hook file (hook-dateparser.py) in the package to automatically handle data file inclusion
- Runtime Fallback: Implement graceful fallback behavior when the cache file is missing
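The runtime fallback idea could look roughly like the sketch below. It is not dateparser's actual loading code: load_cache and the build callable are hypothetical names, and the sketch assumes the cache content can be recomputed in memory when the file is absent.

```python
import pickle


def load_cache(cache_path, build):
    """Return the pickled cache, or rebuild it if the file is missing.

    build is a hypothetical zero-argument callable that recomputes the
    cache content (slower, but it keeps a frozen app working).
    """
    try:
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    except FileNotFoundError:
        # Typical in a frozen app that was built without the data file.
        return build()
```

With this shape, a missing file costs the original (slower) startup time instead of raising FileNotFoundError.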
Long-term
- Version Bump Consideration: This type of breaking change typically warrants a major version increment to signal potential compatibility issues to users
Proposed PyInstaller Hook
# hook-dateparser.py
from PyInstaller.utils.hooks import collect_data_files
datas = collect_data_files('dateparser')
Environment
- dateparser version: 1.2.2
- PyInstaller version: [various versions affected]
- Python version: [various versions]
- Operating System: Linux
Isn’t there a way to fix this that is not PyInstaller-specific? I wonder why https://github.com/scrapinghub/dateparser/blob/f69e9b2e11c81ded87ec80956bcf42c297e9366c/MANIFEST.in#L6 is not enough for PyInstaller to include the file.
I assume a PyInstaller hook like the one below can collect the data file through MANIFEST.in:
from PyInstaller.utils import hooks
# PyInstaller reads the hook variable named `datas`, not `data`
datas = (
    hooks.collect_data_files("dateparser")
)
However, this hook is not required in 1.2.1.
I see PyInstaller maintainers decided against doing this automatically, and they seem to want packages to provide hooks.
I am not against it, but I don’t plan to work on this myself. If someone provides a PR, I’ll review it. The PR should also include a test that would have caught this issue. We can add a CI job to test PyInstaller integration.
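A CI smoke test along these lines might have caught the issue. It is a sketch under assumptions: the function names are hypothetical, PyInstaller and dateparser must be installed in the CI environment, and the tiny app simply imports dateparser and parses one date.

```python
import subprocess
import sys
from pathlib import Path

APP_SOURCE = "import dateparser\nprint(dateparser.parse('2020-01-01'))\n"


def freeze_command(script, work_dir):
    """Return the PyInstaller command that freezes script inside work_dir."""
    work_dir = Path(work_dir)
    return [
        sys.executable, "-m", "PyInstaller", "--onefile",
        "--distpath", str(work_dir / "dist"),
        "--workpath", str(work_dir / "build"),
        "--specpath", str(work_dir),
        str(script),
    ]


def check_frozen_app_runs(work_dir):
    """Freeze a tiny dateparser app and fail if the executable errors out."""
    work_dir = Path(work_dir)
    script = work_dir / "app.py"
    script.write_text(APP_SOURCE)
    subprocess.run(freeze_command(script, work_dir), check=True)
    exe_name = "app.exe" if sys.platform == "win32" else "app"
    result = subprocess.run(
        [str(work_dir / "dist" / exe_name)], capture_output=True, text=True
    )
    # A missing data file surfaces as a non-zero exit code and a traceback.
    assert result.returncode == 0, result.stderr
```

A CI job could call check_frozen_app_runs with a temporary directory after installing the package under test.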
The pickle file should be generated on first import, not committed and distributed. Currently when there is a new regex package version it breaks dateparser in a very subtle way (no exception, just strings don't match where they should or vice versa).
Generating on the first import would not solve that: you could install a new regex version after that.
Either we find a way to avoid the regex issue, or we revert the pickle-based import-time speed boost.
We could also have both, i.e. vendor a pickle file to speed up compatible scenarios right away; if an incompatible regex is found at run time, generate a new pickle file in some writable folder, that takes higher priority than the vendored file, and use that.
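The "have both" idea can be sketched as a two-level lookup. Everything here is an assumption for illustration: the function names, the user cache file naming, and in particular the payload format (a dict that records the regex version it was built with), which is not how the current pickle file is known to be structured.

```python
import pickle
from pathlib import Path


def _read(path):
    """Return the unpickled payload, or None if it cannot be read."""
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    except (OSError, pickle.UnpicklingError, EOFError):
        return None


def load_tz_cache(vendored_path, user_cache_dir, regex_version, build):
    """Prefer a regenerated user cache, then the vendored file, then rebuild.

    Hypothetical payload format: {"regex_version": str, "data": object}.
    build is a hypothetical callable that recomputes the data.
    """
    user_path = Path(user_cache_dir) / f"tz_cache_{regex_version}.pkl"
    # The user cache takes priority over the vendored file.
    for candidate in (user_path, Path(vendored_path)):
        payload = _read(candidate)
        if isinstance(payload, dict) and payload.get("regex_version") == regex_version:
            return payload["data"]
    data = build()
    try:
        user_path.parent.mkdir(parents=True, exist_ok=True)
        with open(user_path, "wb") as handle:
            pickle.dump({"regex_version": regex_version, "data": data}, handle)
    except OSError:
        # Read-only environments still work, just without the speed-up.
        pass
    return data
```

Because the version check happens on every load, installing a different regex version later triggers one rebuild instead of a silent mismatch.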
> you could install a new regex version after that.
Right, but that's a much less common scenario than installing dateparser on a fresh virtualenv and having an incompatible vendored regex. I'm not sure what the motivation was for https://github.com/scrapinghub/dateparser/pull/1250 (i.e. why it's important to shave off a few seconds the very first time dateparser is imported after installation, as opposed to every import after the first time), and I'm not objecting to the proposed solutions. I'm just suggesting that not vendoring the pickle file, generating it on first import and not doing any runtime checks after that would cover 99% of use cases.