matplotlib icon indicating copy to clipboard operation
matplotlib copied to clipboard

CI: Add a Cygwin run to GHA CI.

Open DWesl opened this issue 3 years ago • 13 comments

PR Summary

Add a Cygwin run to GHA, as suggested by https://github.com/matplotlib/matplotlib/issues/22997#issuecomment-1120064419

PR Checklist

Tests and Styling

  • [ ] Has pytest style unit tests (and pytest passes).
  • [ ] Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • [ ] New features are documented, with examples if plot related.
  • [ ] New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • [ ] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • [ ] Documentation is sphinx and numpydoc compliant (the docs should build without error).

DWesl avatar May 07 '22 00:05 DWesl

I think that libjpeg-devel is also required (as you probably have figured out as well).

oscargus avatar May 07 '22 08:05 oscargus

I can try that and see if it works (I shouldn't run into python-pillow/pillow#6216)

DWesl avatar May 07 '22 10:05 DWesl

The build is now failing in the expected place, so let's see if the suggestion in https://github.com/matplotlib/matplotlib/issues/22997#issuecomment-1120195190 allows the build to finish.

DWesl avatar May 07 '22 12:05 DWesl

The build is succeeding now, but the tests fail with ImageComparisonErrors due to using the wrong version of FreeType. Let's see if that compiles now.

DWesl avatar May 07 '22 18:05 DWesl

The freetype compile still fails during configure; the closest Cygwin build script I can find is here. The first patch doesn't look like the kind of thing to fix a configure crash (probably the same as this one), and the name of the second patch doesn't look relevant.

The only other thing I can think of is running autoconf/autoreconf.

DWesl avatar May 08 '22 11:05 DWesl

The Cygwin CI is still failing in FreeType configure: https://github.com/matplotlib/matplotlib/runs/6436739391?check_suite_focus=true#step:11:756

I can get the compilation done using system FreeType and QHull, but building the FreeType version needed for the tests to pass fails, either during the libtoolize step of autoreconf or during configure without that step.

Is there a way to skip all the tests that require an exact image match? It wouldn't provide a thorough test of Matplotlib on Cygwin, but most of the differences from Windows and Linux are in compiling and importing the extensions, so that might be enough.

DWesl avatar May 14 '22 22:05 DWesl

The bundled FreeType build fails even if I push through after the libtoolize failure. I can try again without the bundled FreeType, and see if I can eliminate the fork() failures, even if the ImageComparisonFailures stick around.

DWesl avatar Jun 08 '22 14:06 DWesl

I don't know why upload-artifact fails; the previous step should give everyone read and write permissions to everything in that directory and also execute permissions to every sub-directory.

DWesl avatar Jun 08 '22 17:06 DWesl

I haven't seen that error message on a fork() failure before:

      1 [main] python3.8 (5008) C:\cygwin\bin\python3.8.exe: *** fatal error in forked process - MEM_COMMIT failed, Win32 error 1455
  13465 [main] python3.8 (5008) cygwin_exception::open_stackdumpfile: Dumping stack trace to python3.8.exe.stackdump
2003466 [main] python 2011 dofork: child -1 - forked process 5008 died unexpectedly, retry 0, exit code 0x100, errno 11

The upload-artifact step is also still failing, and I have no idea why.

DWesl avatar Jun 27 '22 00:06 DWesl

Do Windows ReadAndExecute permissions not imply Read or ReadPermissions? I wouldn't think upload-artifact would need Write, ExecuteFile, or delete permissions.

DWesl avatar Jun 27 '22 18:06 DWesl

And now the FreeType build is failing with no apparent reason given.

DWesl avatar Jul 04 '22 17:07 DWesl

I do not understand all the extra work that needs to be done here. On Cygwin x86_64, I installed Python 3.9, gcc-c++, make, and I think python39-matplotlib, and was able to build FreeType just fine (with #23066). I did have trouble with downloading qhull, which I have not yet investigated.

QuLogic avatar Jul 26 '22 04:07 QuLogic

I was also able to build FreeType locally, but it still fails in CI, for reasons the error message does not seem to explain. Trying to narrow down the cause of that failure is the bulk of the commits.

DWesl avatar Jul 26 '22 22:07 DWesl

fatal error in forked process - MEM_COMMIT failed, Win32 error 1455

Well that's a new kind of fork() failure.

DWesl avatar Dec 01 '22 19:12 DWesl

The most recent commit should have stopped trying to install wxPython, so I'm not sure what to do about the six failing MPLBACKEND: wxagg tests

DWesl avatar Dec 03 '22 02:12 DWesl

From this cryptic (truncated!) message:

stderr = 'Traceback (most recent call last):\n  File "<string>", line 1, in <module>\n  File "/cygdrive/d/a/matplotlib/matplotl....8/site-packages/wx/core.py", line 12, in <module>\n    from ._core import *\nImportError: No such file or directory\n'

I think that wx is still installed enough to pass our "can we test this" filter (it looks like there is a wx/core.py is site-packages?!) but not enough to actually run (the _core (I assume) c-extensions are mssing).

https://github.com/matplotlib/matplotlib/blob/0aac9f17b9373f66c709f436149568bda69595b3/lib/matplotlib/tests/test_backends_interactive.py#L24-L68

This logic can not actually import the GUI toolkits because importing more than one can lead to all manner of trouble so we check with find_spec and then run the actual tests is a sub-process (sorry if I am over explaining obvious things).

tacaswell avatar Dec 05 '22 02:12 tacaswell

It looks like it is coming from the system packages?

  15 install python3-wx-common                      4.0.7.post2-4               
...
 401 install python38-wx                            4.0.7.post2-4           

https://github.com/matplotlib/matplotlib/actions/runs/3606263992/jobs/6077436300#step:4:104 https://github.com/matplotlib/matplotlib/actions/runs/3606263992/jobs/6077436300#step:4:490

tacaswell avatar Dec 05 '22 02:12 tacaswell

FAILED lib/matplotlib/tests/test_matplotlib.py::test_tmpconfigdir_warning - AssertionError: assert 'set the MPLCONFIGDIR' in ''
634
 +  where '' = CompletedProcess(args=['/usr/bin/python', '-c', 'import matplotlib'], returncode=0, stderr='').stderr
635
FAILED lib/matplotlib/tests/test_pyplot.py::test_pylab_integration - BlockingIOError: [Errno 11] Resource temporarily unavailable
636
FAILED lib/matplotlib/tests/test_simplification.py::test_throw_rendering_complexity_exceeded

The tempdir one is already skiped on nt, I assume cygwin has the same limitations and should be skipped?

The rendering complexity one may be an actual bug that is testing that c code bails on too-big things. https://github.com/matplotlib/matplotlib/issues/21789 seems related and the most recent PR to touch this code was https://github.com/matplotlib/matplotlib/pull/19343.

I have no good theory on the pylab one.

tacaswell avatar Dec 05 '22 04:12 tacaswell

FAILED lib/matplotlib/tests/test_matplotlib.py::test_tmpconfigdir_warning - AssertionError: assert 'set the MPLCONFIGDIR' in ''
634
 +  where '' = CompletedProcess(args=['/usr/bin/python', '-c', 'import matplotlib'], returncode=0, stderr='').stderr
635
FAILED lib/matplotlib/tests/test_pyplot.py::test_pylab_integration - BlockingIOError: [Errno 11] Resource temporarily unavailable
636
FAILED lib/matplotlib/tests/test_simplification.py::test_throw_rendering_complexity_exceeded

The tempdir one is already skipped on nt, I assume cygwin has the same limitations and should be skipped?

Cygwin implements its own permissions: to check I just ran the code from the test locally and got the expected warning. Extrapolating from this FAQ entry, I think whether the test could work would depend on whether the underlying filesystem is FAT/FAT32 (fails) or NTFS/NFS (can pass). There's a possibility there's default ACLs on one of the parent directories (implied by a different FAQ entry).

The other possibility is that the test for root isn't working since the user ID for "Administrator" even running as administrator is not 0 by default (see this user guide description). I could try changing that using mkpasswd, or try to look up how to tell if the current session has the relevant "Access files regardless of permissions" token.

The rendering complexity one may be an actual bug that is testing that c code bails on too-big things. #21789 seems related and the most recent PR to touch this code was #19343.

I have no good theory on the pylab one.

That's a fork() failure, reasonably well-known on Cygwin. The three main recommendations are:

  • stop antivirus software (probably not relevant on CI, also not something we can control on CI)
  • rebase all DLLs so they have distinct memory regions to load to so they don't collide while Cygwin tries to recreate the parent memory layout in the child (done here in this PR; I can try adding /bin to be absolutely sure all files have an address and that address is found
  • restart the computer and hope it goes away (WIndows makes fork even harder sometimes; probably not relevant on CI)

DWesl avatar Dec 05 '22 13:12 DWesl

It looks like all the rest are fork failures. There was only one before I tried to fix it, so let's see if undoing that fix gets things passing.

In case that doesn't work, is there some way to tell pytest that all instances of BlockingIOError: [Errno 11] Resource temporarily unavailable should be treated as a non-strict xfail of the relevant test on Cygwin?

DWesl avatar Dec 05 '22 17:12 DWesl

That logic should probably go around the subprocess calls in the helper used in the sphinx tests + subprocess_run_helper?

Might also be worth putting wrapped versions of the subproccess functions we use an matplotlib.testing and then making sure we never directly use the stdlib functions?

tacaswell avatar Dec 05 '22 17:12 tacaswell

The new CI run passes! Now let's see if it can do that consistently.

It looks like the remaining failures are mostly docs-related, then a compilation failure on Windows.

DWesl avatar Dec 06 '22 17:12 DWesl

Should I have moved the stdint include after the Python.h include rather than moving Python.h before stdint.h?

DWesl avatar Dec 07 '22 00:12 DWesl

Should I squash the first ninety or so commits? Most of them are changes to the debugging level, and one or two got merged in other PRs; it's mostly the last six or so that do other things.

DWesl avatar Dec 07 '22 22:12 DWesl

On one hand, it is currently at a very nice number of commits (100 🤯), but on the other I do not think we need the full history and pushing this to one commit for the c change, one commit for the change to the tests and one commit for the GHA config is probably the right move...

We can also squash-merge.

tacaswell avatar Dec 07 '22 22:12 tacaswell

I am inclined to squash-merge this as-is.

tacaswell avatar Dec 09 '22 20:12 tacaswell

Seems you aren't attached to the 100-commit bit, so here's a three-commit version.

DWesl avatar Dec 10 '22 14:12 DWesl

Looking at the skipped tests, several of the dependencies are available on Cygwin; I could try adding them and seeing what happens. The TeX tests are likely to run into fork failures, as are the ImageMagick animation tests and those requiring GhostScript, but the Tk tests should work fine.

DWesl avatar Dec 10 '22 15:12 DWesl

Note, we do prefer rebases over merges, if you can.

QuLogic avatar Dec 28 '22 23:12 QuLogic

It's not clear to me that this is worth the added CI load and complexity.

efiring avatar Dec 29 '22 21:12 efiring