cibuildwheel
Issue with file permissions in tests on Linux GHA build
Description
In h5py tests, we do:
os.chmod(fname, stat.S_IREAD) # Make file read-only
where fname comes from tempfile.mktemp(..., dir=None), and then check that trying to append to this file raises a PermissionError. However, on the Linux build on GHA (only) it does not. This was found as part of a migration from Azure over to GHA -- Azure did not seem to have this problem. None of our other non-cibuildwheel CIs have shown this problem.
Is it possible that this is related to the containerized structure where permissions aren't set on temp files properly for some reason? With a debug statement from https://github.com/h5py/h5py/pull/2444/commits/4e6738802a0554912a40dc5fa7bb1842cef1394a, though, I can see that at least Python thinks the permissions have changed properly from 33188 to 33024.
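For reference, the two numbers in that debug output are raw st_mode values. A minimal sketch for decoding them with the standard stat module (the helper name describe_mode is made up for illustration and is not part of h5py):

```python
import stat


def describe_mode(mode):
    """Return (octal string, ls-style string) for a raw st_mode integer."""
    return oct(mode), stat.filemode(mode)


# st_mode values reported by the debug statement before and after os.chmod
for mode in (33188, 33024):
    print(mode, *describe_mode(mode))

# After the chmod, the permission bits are exactly stat.S_IREAD (owner read-only)
print(stat.S_IMODE(33024) == stat.S_IREAD)
```

So 33188 is a regular file with mode 0o644 and 33024 is a regular file with mode 0o400, which is consistent with the chmod having taken effect at the metadata level.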
We do also use
manylinux-x86_64-image = "ghcr.io/h5py/manylinux2014_x86_64-hdf5"
manylinux-aarch64-image = "ghcr.io/h5py/manylinux2014_aarch64-hdf5"
so maybe we have some issue with our custom images somehow.
So not 100% sure this is a cibuildwheel problem but wanted to open the issue in case it is.
Build log
https://github.com/h5py/h5py/actions/runs/9517645998/job/26236715157?pr=2444#step:11:2400
CI config
https://github.com/larsoner/h5py/blob/gha/.github/workflows/build_wheels.yml
I can't think what cibuildwheel would be doing that would affect this. We don't vary how docker is invoked between GHA and Azure - cibuildwheel treats them the same. So I'd guess the issue is on the GHA side.
Perhaps GitHub is using some kind of special Docker runner/kernel where root ignores read-only attributes? Does the test failure only occur inside the cibuildwheel Docker container, or does it also appear when running tests in GHA natively?
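One way to check the "root ignores read-only" idea is a small standalone probe that could be run both inside the container and natively. This is only a sketch, not the h5py test: the function name probe_readonly_append is hypothetical, and it uses tempfile.mkstemp instead of tempfile.mktemp. On Linux, a process running as root with the CAP_DAC_OVERRIDE capability bypasses file permission checks, so if the probe reports euid=0 and no PermissionError, that would point at the user the tests run as rather than at the temp file handling.

```python
import os
import stat
import tempfile


def probe_readonly_append():
    """Return (euid, raised): effective UID and whether appending raised PermissionError."""
    fd, fname = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(fname, stat.S_IREAD)  # Make file read-only, as in the test
        try:
            with open(fname, "a"):
                raised = False
        except PermissionError:
            raised = True
    finally:
        # Restore write permission so the file can be removed on every platform
        os.chmod(fname, stat.S_IREAD | stat.S_IWRITE)
        os.remove(fname)
    # os.geteuid is not available on Windows
    euid = os.geteuid() if hasattr(os, "geteuid") else None
    return euid, raised


if __name__ == "__main__":
    euid, raised = probe_readonly_append()
    print(f"euid={euid} PermissionError raised={raised}")
```

If the container run prints euid=0 with raised=False while the native run prints a non-zero euid with raised=True, the difference is explained by running as root and not by GHA or cibuildwheel specifically.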
> Does the test failure only occur inside the cibuildwheel Docker container, or does it also appear when running tests in GHA natively?
Over in my fork of h5py, in larsoner:permcheck, I now build the wheel and pass through the env var CIBUILDWHEEL, which causes the test to be marked as xfail during the cibuildwheel test stage:
https://github.com/larsoner/h5py/blob/4d32d38978b174b76a553aab6c5c78b25094f082/h5py/tests/test_file.py#L104-L116
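For readers who don't want to follow the link, the general pattern looks roughly like the sketch below. It is not the code from the fork: it uses unittest.expectedFailure as a stdlib stand-in for pytest's xfail marker, the helper names running_under_cibuildwheel and xfail_if are hypothetical, and treating any non-empty CIBUILDWHEEL value as "inside cibuildwheel" is an assumption.

```python
import os
import stat
import tempfile
import unittest


def running_under_cibuildwheel(environ=None):
    """Assumption: any non-empty CIBUILDWHEEL value means we are inside cibuildwheel."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("CIBUILDWHEEL"))


def xfail_if(condition):
    """Stdlib stand-in for a conditional xfail: expect failure only when condition holds."""
    return unittest.expectedFailure if condition else (lambda test: test)


class TestAppendPermissions(unittest.TestCase):
    @xfail_if(running_under_cibuildwheel())
    def test_append_permissions(self):
        fd, fname = tempfile.mkstemp()
        os.close(fd)
        # Restore write permission before removal so cleanup works everywhere
        self.addCleanup(os.remove, fname)
        self.addCleanup(os.chmod, fname, stat.S_IREAD | stat.S_IWRITE)
        os.chmod(fname, stat.S_IREAD)  # Make file read-only
        with self.assertRaises(PermissionError):
            open(fname, "a").close()
```

One difference from a non-strict pytest xfail: with unittest.expectedFailure, a test that unexpectedly passes is reported as an unexpected success and makes the run unsuccessful.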
You can see in the GitHub actions run the cibuildwheel test does hit the xfail in TestFileOpen.test_append_permissions:
https://github.com/larsoner/h5py/actions/runs/9785287457/job/27018013077#step:11:9744
https://github.com/larsoner/h5py/actions/runs/9785287457/job/27018013077#step:11:9802
But then in a second job I download the wheel and run the same tests in the native GHA context, and the test passes just fine (it is not marked as xfail and does not fail; the job itself only fails at the very end, when the codecov upload fails, which is unrelated):
https://github.com/larsoner/h5py/actions/runs/9785287457/job/27018103570#step:7:81
> [me:] Azure did not seem to have this problem.
This part I was wrong about -- the testing mechanism on Azure wasn't through standard cibuildwheel TEST_ mechanisms but rather through a separate Azure job that ran using native Azure UsePythonVersion@ (i.e., not via a docker container). So it might be a general docker container problem.
> So I'd guess the issue is on the GHA side.
After a tiny bit of reading I came across https://github.com/actions/runner/issues/1282#issuecomment-1063974931 but I'm not convinced it's related... my docker knowledge is limited. Feel free to close if you think it's an upstream issue!