Junctions/symbolic links can cause directories to be missing from source distributions

Open thegamecracks opened this issue 1 year ago • 0 comments

Summary

When a directory that hatch should include is also the target of a directory junction or symbolic link, and the link is visited before the real directory, the directory is unexpectedly missing from the built source distribution.

Steps to Reproduce

  1. Set up a project with the following structure:

    • foo/__init__.py (can be empty)

    • pyproject.toml

      [build-system]
      requires = ["hatchling"]
      build-backend = "hatchling.build"
      
      [project]
      name = "foo"
      version = "1.0.0"
      
      [tool.hatch.build.targets.sdist]
      include = ["foo"]
      
  2. Create a directory junction / symbolic link to the package with any name sorted before the package name itself:

    # Windows:
    mklink /J bar foo
    # Linux:
    ln -sT foo bar
    
  3. Attempt to build a source distribution:

    hatch build --target sdist
    # or:
    pip install build
    python -m build --sdist
    

The resulting dist/foo-1.0.0.tar.gz archive will be missing the foo/ package, which should have been included. When bar is not present, or it has a name that comes after foo, like foo2, the foo package will be included in the archive.
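To check the archive without unpacking it by hand, something like the following can be used. This is only a small helper sketch: `sdist_has_package` is a name I made up for illustration, and it assumes the usual sdist layout where everything sits under a single top-level `foo-1.0.0/` directory.

```python
import tarfile


def sdist_has_package(archive_path, package):
    """Return True if any archive member lives under <top-level>/<package>.

    Hypothetical helper: assumes members are named like
    "foo-1.0.0/foo/__init__.py", i.e. one top-level directory.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        for member_name in archive.getnames():
            parts = member_name.split("/")
            # parts[0] is the top-level directory, parts[1] the first entry below it.
            if len(parts) >= 2 and parts[1] == package:
                return True
    return False


if __name__ == "__main__":
    # Path taken from the reproduction steps above.
    print(sdist_has_package("dist/foo-1.0.0.tar.gz", "foo"))
```

With the bar link present I would expect this to report that foo is missing, and to report it as present once bar is removed or renamed to something like foo2.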

Additional context

After some debugging, the following stack appears to be the cause of this issue:

https://github.com/pypa/hatch/blob/6b12353590154d08cf48b55983286113b8448409/backend/src/hatchling/builders/plugin/interface.py#L185-L186

https://github.com/pypa/hatch/blob/6b12353590154d08cf48b55983286113b8448409/backend/src/hatchling/builders/utils.py#L21-L31

From my understanding, os.stat() follows the bar/ junction / symlink to foo/ by default, so bar/ and foo/ look like the same directory to safe_walk(). Because bar/ is returned first by os.walk(), safe_walk() treats the actual foo/ as already seen and skips it, which prevents the builder interface from knowing of its existence.
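To illustrate what I think is happening, here is a standalone sketch that does not use hatchling at all. `walk_dedup` is my own stand-in for the behaviour described above (walk with links followed, skip any directory whose stat identity was already seen), and `visited_with_link` is a throwaway helper. I sort the directory names explicitly so the link is visited in name order, which is an assumption about how the builder orders them. Note that `os.symlink` may need elevated privileges on Windows.

```python
import os
import tempfile


def walk_dedup(top):
    """Walk `top`, following links, skipping directories already seen.

    Hypothetical stand-in for the behaviour described above, not
    hatchling's actual code.
    """
    seen = set()
    for root, dirs, files in os.walk(top, followlinks=True):
        stat = os.stat(root)  # follows links, so a link reports its target's identity
        identity = (stat.st_dev, stat.st_ino)
        if identity in seen:
            del dirs[:]  # do not descend any further
            continue
        seen.add(identity)
        dirs.sort()  # assumption: children are visited in sorted order
        yield root, dirs, files


def visited_with_link(link_name):
    """Create foo/ plus a link to it, then list the directories that get visited."""
    with tempfile.TemporaryDirectory() as top:
        package = os.path.join(top, "foo")
        os.mkdir(package)
        open(os.path.join(package, "__init__.py"), "w").close()
        os.symlink("foo", os.path.join(top, link_name), target_is_directory=True)
        return [os.path.relpath(root, top) for root, _dirs, _files in walk_dedup(top)]


if __name__ == "__main__":
    print(visited_with_link("bar"))   # ['.', 'bar']: the real foo/ is never yielded
    print(visited_with_link("foo2"))  # ['.', 'foo']: the link is the one skipped
```

In the first case only bar/ is reported, so an include pattern of "foo" has nothing to match against.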

As for how I discovered this, I was experimenting with different project structures for reactpy, and one of those layouts was:

src/
  py/
    reactpy/
      js/
        node_modules/
          @reactpy/client/
          event-to-object/
        packages/
          @reactpy/client/
          event-to-object/
        package.json
      reactpy/    # python package
      .gitignore  # ignores node_modules
      pyproject.toml

The source code in js/packages/ needed to be included in the source distribution so it could be built by hatch-build-scripts, and of course node_modules/ was to be excluded from the sdist. However, it turns out that node_modules/ contained directory junctions to the package directories, which caused hatch to leave them out of the sdist. I was not aware that these junctions were interfering with the building of the source distribution, nor did I expect any interference, because I was using hatch's .gitignore support to filter out node_modules/. Eventually I realized that removing node_modules/ magically solved the issue, so I dug deeper into hatchling's source code to figure out what was going on.
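For anyone hitting something similar, a rough way to spot such links is to list every directory entry whose resolved path differs from its own path. This is only a diagnostic sketch: `find_directory_aliases` is a hypothetical helper, and I am assuming `os.path.realpath` resolves both symbolic links and Windows junctions on the Python version in use.

```python
import os


def find_directory_aliases(top):
    """Return (link, target) pairs, relative to `top`, for linked directories.

    Hypothetical diagnostic helper; relies on os.path.realpath() to resolve
    symbolic links (and, by assumption, junctions on Windows).
    """
    top = os.path.realpath(top)  # resolve the root so only links below it differ
    aliases = []
    for root, dirs, _files in os.walk(top):
        kept = []
        for name in sorted(dirs):
            path = os.path.join(root, name)
            target = os.path.realpath(path)
            if os.path.normcase(target) != os.path.normcase(path):
                aliases.append((os.path.relpath(path, top), os.path.relpath(target, top)))
            else:
                kept.append(name)
        dirs[:] = kept  # never descend into a link
    return aliases


if __name__ == "__main__":
    for link, target in find_directory_aliases("."):
        print(f"{link} -> {target}")
```

Run from the project root, this should list the node_modules/ entries that point back into packages/.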

thegamecracks · Jan 04 '24 07:01