
PyInstaller Compatibility Issue: Missing dateparser_tz_cache.pkl in Frozen Applications

Open • roger-zhangg opened this issue 6 months ago • 6 comments

Summary

#1250 introduced a new data file dateparser/data/dateparser_tz_cache.pkl that is not automatically included when freezing applications with PyInstaller, causing runtime failures in previously working deployments.

Problem Description

The introduction of the timezone cache file dateparser/data/dateparser_tz_cache.pkl in #1250 is a breaking change for users who package their applications with PyInstaller. PyInstaller's dependency analysis does not detect the file, so it is not bundled, and the frozen application raises FileNotFoundError when it tries to read the cache. Related downstream reports from AWS SAM CLI:

  • https://github.com/aws/aws-sam-cli/issues/8139
  • https://github.com/aws/aws-sam-cli/issues/8140
  • https://github.com/aws/aws-sam-cli/issues/8141
  • https://github.com/aws/aws-sam-cli/issues/8143

Steps to Reproduce

  1. Create a Python application that uses dateparser
  2. Package the application using PyInstaller:
    pyinstaller --onefile your_app.py
    
  3. Run the frozen executable
  4. Observe the runtime error when dateparser attempts to access the cache file

Expected Behavior

The frozen application should work without additional configuration, maintaining backward compatibility with existing PyInstaller workflows.

Actual Behavior

The application fails at runtime with a FileNotFoundError when trying to access dateparser/data/dateparser_tz_cache.pkl.

Current Workaround

Until this is fixed, users must either configure PyInstaller manually to include the data file or downgrade, by doing one of the following:

  1. Create a PyInstaller hook file (hook-dateparser.py) that collects the data file
  2. Pass the file explicitly with the --add-data flag:
    pyinstaller --add-data "path/to/dateparser/data/dateparser_tz_cache.pkl:dateparser/data/" your_app.py
    
  3. Add the file to a custom hook in their build process
  4. Pin dateparser to 1.2.1
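For the --add-data route, the source path depends on where dateparser is installed. Below is a minimal sketch of a build-time helper that resolves that path without importing dateparser. The function name add_data_arg is hypothetical and is not part of dateparser or PyInstaller. It assumes a top-level, regular package.

```python
import importlib.util
import os
import posixpath


def add_data_arg(package: str, relpath: str) -> str:
    """Build a PyInstaller --add-data value (SOURCE<sep>DEST) for a file that
    ships inside an installed top-level package, without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package!r} is not an installed package")
    package_dir = list(spec.submodule_search_locations)[0]
    source = os.path.join(package_dir, *relpath.split("/"))
    if not os.path.isfile(source):
        raise FileNotFoundError(source)
    # The destination mirrors the package layout inside the bundle,
    # e.g. "dateparser/data" for "data/dateparser_tz_cache.pkl".
    dest = "/".join(part for part in (package, posixpath.dirname(relpath)) if part)
    # os.pathsep is ";" on Windows and ":" elsewhere; PyInstaller releases
    # before 6.0 require ";" on Windows.
    return f"{source}{os.pathsep}{dest}"
```

In a build script, the result of add_data_arg("dateparser", "data/dateparser_tz_cache.pkl") can then be passed as the --add-data value, so the command no longer hard-codes a site-packages path.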

Impact Assessment

This change breaks, without warning, existing production deployments and CI/CD pipelines that use PyInstaller. Users upgrading dateparser may hit unexpected deployment failures.

Suggested Solutions

Short-term

  1. Documentation Update: Add clear instructions in the README and documentation about PyInstaller compatibility requirements
  2. PyInstaller Hook: Include a proper PyInstaller hook file (hook-dateparser.py) in the package to automatically handle data file inclusion
  3. Runtime Fallback: Implement graceful fallback behavior when the cache file is missing
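A minimal sketch of the runtime fallback in item 3, assuming the cache can be rebuilt from dateparser's source data at some cost. The names load_cache_or_rebuild and rebuild are hypothetical, not dateparser's actual internals.

```python
import logging
import pickle
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_cache_or_rebuild(cache_path: Path, rebuild: Callable[[], Any]) -> Any:
    """Return the pickled cache at cache_path, or rebuild it in memory if the
    file is missing or unreadable (e.g. it was not bundled into a frozen app)."""
    try:
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    except (OSError, pickle.UnpicklingError, EOFError) as exc:
        # FileNotFoundError is a subclass of OSError. Rebuilding is slower,
        # but keeps the library usable instead of failing at runtime.
        logger.warning("Cache %s unavailable (%s); rebuilding in memory", cache_path, exc)
        return rebuild()
```

This trades the import-time speedup for robustness: a frozen app that lacks the file starts more slowly but keeps working, and the warning tells the packager what is missing.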

Long-term

  1. Version Bump Consideration: This type of breaking change typically warrants a major version increment to signal potential compatibility issues to users

Proposed PyInstaller Hook

# hook-dateparser.py
from PyInstaller.utils.hooks import collect_data_files

datas = collect_data_files('dateparser')
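For a hook shipped inside dateparser to be picked up without any user configuration, PyInstaller (4.0 and later) discovers third-party hook directories through the pyinstaller40 entry-point group under the hook-dirs key. A minimal sketch, assuming a hypothetical dateparser/_pyinstaller subpackage that would hold the hook-dateparser.py file above:

```python
# dateparser/_pyinstaller/__init__.py  (hypothetical location)
#
# Registered as an entry point so PyInstaller finds it automatically, e.g.
#   setup.py:       entry_points={"pyinstaller40": [
#                       "hook-dirs = dateparser._pyinstaller:get_hook_dirs"]}
#   pyproject.toml: [project.entry-points.pyinstaller40]
#                   hook-dirs = "dateparser._pyinstaller:get_hook_dirs"
#
# hook-dateparser.py (the collect_data_files hook) sits in this directory.
import os
from typing import List


def get_hook_dirs() -> List[str]:
    """Return the directories PyInstaller should search for hook-*.py files."""
    return [os.path.dirname(__file__)]
```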

Environment

  • dateparser version: 1.2.2
  • PyInstaller version: [various versions affected]
  • Python version: [various versions]
  • Operating System: Linux

roger-zhangg · Jul 08 '25 00:07

Isn’t there a way to fix this that is not PyInstaller-specific? I wonder why https://github.com/scrapinghub/dateparser/blob/f69e9b2e11c81ded87ec80956bcf42c297e9366c/MANIFEST.in#L6 is not enough for PyInstaller to include the file.

Gallaecio · Jul 08 '25 05:07

I assume a PyInstaller hook like the one below can collect the data file through MANIFEST.in:

# hook-dateparser.py
from PyInstaller.utils import hooks

# PyInstaller only reads the module-level name `datas` (not `data`) from a hook
datas = hooks.collect_data_files("dateparser")

However, this hook was not required with 1.2.1.

roger-zhangg · Jul 08 '25 16:07

I see PyInstaller maintainers decided against doing this automatically, and they seem to want packages to provide hooks.

I am not against it, but I don’t plan to work on this myself. If someone provides a PR, I’ll review it. The PR should also include a test that would have caught this issue. We can add a CI job to test PyInstaller integration.

Gallaecio · Jul 08 '25 16:07
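The definitive CI test would freeze a tiny app and run it. A cheaper first check, sketched below, is to assert that the hook's datas list contains the cache file. It relies on collect_data_files returning (source_path, destination_directory) tuples. The helper name bundles_file is hypothetical.

```python
import posixpath
from typing import Iterable, Tuple


def bundles_file(datas: Iterable[Tuple[str, str]], filename: str, dest_dir: str) -> bool:
    """Return True if a PyInstaller `datas` list copies `filename` into `dest_dir`.

    Each entry is a (source_path, destination_directory) tuple, the shape
    returned by PyInstaller.utils.hooks.collect_data_files.
    """
    for source, dest in datas:
        # Normalise Windows separators so the comparison is platform-neutral.
        source_name = posixpath.basename(source.replace("\\", "/"))
        dest_norm = dest.replace("\\", "/").rstrip("/")
        if source_name == filename and dest_norm == dest_dir:
            return True
    return False
```

In a CI job with PyInstaller and dateparser installed, the check would be bundles_file(collect_data_files("dateparser"), "dateparser_tz_cache.pkl", "dateparser/data").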

The pickle file should be generated on first import, not committed and distributed. Currently, when a new version of the regex package is installed, the vendored pickle breaks dateparser in a very subtle way: there is no exception, but strings fail to match where they should, or match where they shouldn't.

gsakkis · Jul 30 '25 11:07
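A minimal sketch of the generate-on-first-import approach. It assumes a writable cache_path (for example a per-user cache directory; the location is an assumption) and a hypothetical build callable that produces the timezone data. The name load_or_generate is also hypothetical.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def _atomic_write(path: Path, data: Any) -> None:
    """Pickle data to a temp file in the target directory, then rename it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(data, fh)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_or_generate(cache_path: Path, build: Callable[[], Any]) -> Any:
    """Load the cache if present; otherwise build it once and persist it."""
    try:
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    except FileNotFoundError:
        pass
    data = build()
    try:
        _atomic_write(cache_path, data)
    except OSError:
        pass  # read-only location: return the data without persisting it
    return data
```

The temp-file-plus-os.replace step means two processes importing at the same time never read a half-written pickle.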

Generating it on first import would not solve that: you could install a new regex version afterwards.

Either we find a way to avoid the regex issue, or we revert the pickle-based import-time speed boost.

We could also have both, i.e. vendor a pickle file to speed up compatible scenarios right away, and, if an incompatible regex is found at run time, generate a new pickle file in some writable folder that takes priority over the vendored file, and use that.

Gallaecio · Jul 30 '25 12:07
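A minimal sketch of that hybrid. It assumes each pickle stores the regex version it was built with, and that the caller supplies the installed version (for example via importlib.metadata.version("regex")). The function and parameter names, the payload layout, and the override location are all hypothetical.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def _read_payload(path: Path) -> Optional[dict]:
    """Return the unpickled payload, or None if missing, unreadable or malformed."""
    try:
        with path.open("rb") as fh:
            payload = pickle.load(fh)
    except Exception:
        # Missing, truncated or otherwise unusable files count as a cache miss.
        return None
    if isinstance(payload, dict) and "data" in payload:
        return payload
    return None


def _write_payload(path: Path, payload: dict) -> None:
    """Atomically write payload to path (temp file + rename)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(payload, fh)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_tz_cache(vendored: Path, override: Path, regex_version: str,
                  build: Callable[[], Any]) -> Any:
    # The writable override takes priority over the vendored file.
    for path in (override, vendored):
        payload = _read_payload(path)
        if payload is not None and payload.get("regex_version") == regex_version:
            return payload["data"]
    # No compatible cache: rebuild with the installed regex and save it
    # to the writable location for next time (best effort).
    data = build()
    try:
        _write_payload(override, {"regex_version": regex_version, "data": data})
    except OSError:
        pass
    return data
```

A fresh install with a matching regex uses the vendored file immediately; after a regex upgrade, the first import pays the rebuild cost once and later imports read the override.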

“you could install a new regex version after that.”

Right, but that's a much less common scenario than installing dateparser in a fresh virtualenv with a regex version that is incompatible with the vendored pickle. I'm not sure what the motivation for https://github.com/scrapinghub/dateparser/pull/1250 was (i.e. why it's important to shave a few seconds off the very first import after installation, as opposed to every import after the first). I'm not objecting to the proposed solutions, just suggesting that not vendoring the pickle file, generating it on first import, and doing no runtime checks after that would cover 99% of use cases.

gsakkis · Jul 30 '25 17:07