Installing files should NOT preserve timestamps
This issue is meant as a pointer to https://github.com/python/cpython/issues/76954 so please discuss it there.
When a Python project is installed, distutils (which is also used by other tools like pip) copies the files from the build directory to the install directory using the copy_file() function. This copy operation preserves timestamps. In other words, the timestamp of the installed file equals the timestamp of the source file.
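To make the difference concrete, here is a minimal sketch that uses the standard library's shutil as a stand-in for distutils' copy_file() (which is not called here). The helper name copy_and_compare and the file names are hypothetical; shutil.copy2 copies metadata such as mtime, while shutil.copyfile copies only the data.

```python
import os
import shutil
import tempfile
import time


def copy_and_compare(preserve: bool) -> tuple[float, float]:
    """Copy a file and return (source_mtime, destination_mtime)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "module.py")
        dst = os.path.join(tmp, "installed_module.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        # Pretend the source file was last modified an hour ago.
        old = time.time() - 3600
        os.utime(src, (old, old))
        if preserve:
            # Copies data and metadata, so the old mtime is carried over.
            shutil.copy2(src, dst)
        else:
            # Copies data only, so the mtime is the time of the copy.
            shutil.copyfile(src, dst)
        return os.stat(src).st_mtime, os.stat(dst).st_mtime
```

With preserve=True the two values should match (up to filesystem resolution); with preserve=False the destination should be roughly an hour newer than the source.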
By contrast, autotools does not preserve timestamps: the timestamp of the installed files equals the time of installation. This makes more sense because of dependency checking: if you reinstall a package, you typically want to rebuild everything depending on that package.
This issue is particularly relevant for installing .h files: most build systems (including distutils itself) provide a way to recompile C/C++ source files if they depend on a changed header file. But that only works if the timestamp of the header is updated when it is installed.
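As an illustration of why this matters, the following is a rough sketch of the kind of mtime comparison a build system might use to decide whether to recompile. The function name needs_rebuild is hypothetical and this is not the actual distutils code, just the general idea.

```python
import os


def needs_rebuild(target: str, dependencies: list[str]) -> bool:
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime
    # A dependency counts as "changed" only if its mtime is newer than
    # the target's mtime.
    return any(os.stat(dep).st_mtime > target_mtime for dep in dependencies)
```

If a reinstalled header keeps an mtime older than the existing object file, a check like this returns False and the object file is not rebuilt, even though the header's contents may differ from the previously installed version.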
Note that distutils/command/build_py.py contains a comment
# XXX copy_file by default preserves atime and mtime. IMHO this is
# the right thing to do, but perhaps it should be an option -- in
# particular, a site administrator might want installed files to
# reflect the time of installation rather than the last
# modification time before the installed release.
but without any justification.
In my opinion, there should not be an option. The current behaviour is simply wrong and should be fixed.
Is this still an issue with standard wheel installation?
With wheels there shouldn't be any header or source files, so I kind of suspect atime/mtime on wheel contents doesn't matter for build system invalidation. But maybe this is still an issue with sdists? I don't know what setuptools does here 😅
I don’t know if setuptools still installs things! pip (and other build frontends?) builds a wheel from an sdist and installs it. I forget if the wheel spec or the build backend-frontend specs define behaviour around timestamps here.
@pfmoore can I bug you here and ask if you think this issue still has merit?
Wheels can still include header files.
I'm personally inclined to think that the current behaviour is fine, though - the timestamp in the wheel is presumably the last time the header was changed, so it is a correct reflection of whether the header has changed or not.
If anything were to be done these days, it would require a change to the wheel spec defining what installers must do regarding timestamps. At the moment the spec doesn't say, so you have to assume that the installer can choose what behaviour they want to implement.
On the other hand, if the question is now what timestamps should be recorded in the wheel, then there are no standards around that - it would be something for individual build backends to decide (and this issue should be raised against the user's chosen build backend).
I think builders generally normalize timestamps. Scikit-build-core will do so for the SDist unless you set sdist.reproducible to False. This makes the SDists reproducible. Wheels are not fully reproducible yet (neither are they in setuptools), so a setting hasn't been added there. Generally SOURCE_DATE_EPOCH is used, and support for that is somewhat recent (CMake 3.8, GCC 7, clang 16), I think. https://reproducible-builds.org/docs/source-date-epoch/
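For reference, a minimal sketch of what normalizing a timestamp with SOURCE_DATE_EPOCH could look like. The function names build_timestamp and normalize_mtime are hypothetical, and this is not how scikit-build-core or setuptools implement it. It also simplifies by always overwriting the mtime rather than clamping only newer files.

```python
import os
import time


def build_timestamp(environ=None) -> int:
    """Return SOURCE_DATE_EPOCH as an int if set, else the current time."""
    env = os.environ if environ is None else environ
    value = env.get("SOURCE_DATE_EPOCH")
    return int(value) if value else int(time.time())


def normalize_mtime(path: str, environ=None) -> None:
    """Set atime and mtime of path to the build timestamp."""
    ts = build_timestamp(environ)
    os.utime(path, (ts, ts))
```

Running the same build twice with the same SOURCE_DATE_EPOCH then gives files with identical mtimes, which is the property reproducible builds rely on.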
Note that in https://github.com/pypa/pip/pull/13681, pip is considering changing its behaviour to preserve the timestamps of files installed from wheels.
> The current behaviour is simply wrong and should be fixed.
@jdemeyer are you aware of specific problems that this behavior is causing downstream within the Python ecosystem? The PR mentioned above came about because the original issue references a problem in fedora-reproducible-builds that is caused by pip not preserving the mtime when installing files. The resulting mtimes are not stable, which causes knock-on instability. If there are known issues downstream on the other side of this discussion, it will help to have them surfaced here.
I can believe that either side of this choice is going to cause problems, but I'd like to be fully informed about the specifics of those problems. Without the specifics, I'm inclined to think that a stable behavior is going to cause fewer problems than one where the result varies every time.
I think it's also worth broadening this discussion beyond "timestamps", because there are multiple such values. I think for instance that it makes sense to preserve mtime if a file's contents have not changed, but I can see the argument for reflecting time-of-install in atime and ctime (I believe we're getting this behavior for free with that pip PR, but it's something of a side effect).
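To illustrate the mtime/atime split, here is a hedged sketch of unpacking a zip archive (a wheel is a zip file) so that mtime comes from the archive entry while atime reflects the time of install. The function name extract_preserving_mtime is hypothetical and this is not the code from the pip PR. It assumes the archive timestamps are local time, since zip entries carry no timezone.

```python
import os
import time
import zipfile


def extract_preserving_mtime(archive: str, dest: str) -> None:
    """Extract archive into dest, restoring each file's mtime from the zip."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            extracted = zf.extract(info, dest)
            if info.is_dir():
                continue
            # ZipInfo.date_time is (year, month, day, hour, minute, second).
            # Pad it to a 9-tuple for mktime; -1 lets mktime determine DST.
            mtime = time.mktime(info.date_time + (0, 0, -1))
            # atime is set to "now" (time of install), mtime to the
            # timestamp recorded in the archive.
            os.utime(extracted, (time.time(), mtime))
```

ctime cannot be set from user code on most platforms, so it reflects the time of install regardless.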
I'm of the opinion that when unpacking a binary distribution, it makes sense to preserve mtime, but when performing a build from a source distribution, all timestamps should reflect the time of the actual build.