PyPy is segfaulting in CI - how can I help?

Open mattip opened this issue 2 years ago • 9 comments

Hi. PyPy dev here, new to the project but curious about the segfault in CI. What would be the best way to get to the root cause? Pair programming? Read some documentation and get a dev environment set up? What would be the easiest way to get a minimal cython reproducer without Java?

mattip avatar Jul 04 '22 18:07 mattip

Hi @mattip, nice to see you here !

I guess you're referring to https://github.com/conda-forge/pyjnius-feedstock/pull/35

As you noticed, some fixes were introduced in https://github.com/kivy/pyjnius/pull/627, and everything seemed great. (All the tests passed on the PR, and the same happened for the following CI runs)

Unfortunately, as shown in https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=530828&view=logs&j=bb1c2637-64c6-57bd-9ea6-93823b2df951&t=350df31b-3291-5209-0bb7-031395f0baa1, it seems that we now have a new segfault (it appears to happen in a different test phase from the previous one).

To speed up the process, I'm available to chat on the #dev channel of our Discord chat, and I'm sure that other core devs (and contributors) are also happy to help.

If I'm right, there's a chance that I was able to reproduce a segfault on the same test with a specific setup on macOS 12 + Apple Silicon in a Rosetta Terminal (that then disappeared) 🧐 .

misl6 avatar Jul 04 '22 19:07 misl6

I was referring to the segfault in this repo's CI, where both PyPy CI runs segfault. The message does not help me understand what is going on:

home/runner/work/_temp/20709d7f-80bf-4abe-8495-2fd0235fb524.sh: line 2:  \
    1840 Aborted          \
    (core dumped) CLASSPATH=../build/test-classes:../build/classes python -m pytest -v

mattip avatar Jul 04 '22 20:07 mattip

NB: As jnius loads the JVM, it's usually the case that Java's signal fault handler takes priority and generates an hs_err_pid log file, which can contain a more meaningful stack trace, even for native/Cython code.
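For anyone looking for that file: HotSpot normally writes it as hs_err_pid<pid>.log in the JVM's working directory (unless -XX:ErrorFile points elsewhere). A minimal sketch for collecting such logs after a crashed run, assuming the default name and location; `find_hs_err_logs` and `head` are hypothetical helper names, not part of pyjnius:

```python
from itertools import islice
from pathlib import Path


def find_hs_err_logs(directory="."):
    """Return HotSpot crash logs found in `directory`, newest first."""
    logs = Path(directory).glob("hs_err_pid*.log")
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


def head(path, max_lines=40):
    """Return the first `max_lines` lines of a log, without trailing newlines."""
    with open(path, errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, max_lines)]


# Print the start of every crash log in the current directory (if any).
for log in find_hs_err_logs():
    print(log)
    print("\n".join(head(log)))
```

In CI, the same idea could be used to upload these files as build artifacts after a failed test step.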

cmacdonald avatar Jul 04 '22 21:07 cmacdonald

Meanwhile, I was trying to reproduce the segfault on the above-mentioned config.

  • pytest-rerunfailures seems to be hiding the issue (at least partially)

Local test configuration:

  • macOS 12 (on Apple Silicon, PyPy runs under Rosetta, but I guess it is the same on an Intel Mac)
  • PyPy v7.3.9-osx64
  • JDK 17.0.3 (x86_64) "Eclipse Adoptium"

Manually running the failing test (tests/test_lambdas.py) reports:

Fatal RPython error: a thread is trying to wait for the GIL, but the GIL was not initialized
(For PyPy, see https://foss.heptapod.net/pypy/pypy/-/issues/2274)
zsh: abort      ..../pypy3.9-v7.3.9-osx64/bin/pypy tests/test_lambdas.py
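One way to narrow an abort like this down is to run each candidate in its own interpreter process and check whether the child was killed by a signal. A minimal sketch using only the standard library; `run_isolated` is a hypothetical helper (not part of pyjnius or pytest), and the crashing child here is a stand-in that simply calls os.abort():

```python
import signal
import subprocess
import sys


def run_isolated(args):
    """Run a command in a child process and describe how it ended."""
    proc = subprocess.run(args, capture_output=True, text=True)
    if proc.returncode < 0:
        # On POSIX, a negative return code means "killed by that signal".
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exit code %d" % proc.returncode


# Stand-in for a crashing test: the child aborts itself.
crash = run_isolated([sys.executable, "-c", "import os; os.abort()"])
# Stand-in for a passing test.
ok = run_isolated([sys.executable, "-c", "print('done')"])
print(crash)
print(ok)
```

In practice the argument list would be the interpreter under test plus `-m pytest` and a single test node ID, so that one crashing test cannot take down the whole session.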

misl6 avatar Jul 04 '22 21:07 misl6

tests/test_lambdas.py came from me. Can you narrow down to a particular test method?

cmacdonald avatar Jul 04 '22 21:07 cmacdonald

> tests/test_lambdas.py came from me. Can you narrow down to a particular test method?

It looks like it is failing here:

https://github.com/kivy/pyjnius/blob/0421f01452d2d6b8ce9a3a632d7f7a39d27f00fb/tests/test_lambdas.py#L10

misl6 avatar Jul 04 '22 21:07 misl6

So the Java thread pool is calling back into a Python class that implements Callable, which in turn calls the Python lambda.

One big(!) hack in there is to ensure that the Callable object (rather than the lambda itself, IIRC) is not GCd: https://github.com/kivy/pyjnius/blob/bcf4e28e170c02abab38dc1194b2a33d3adb13d1/jnius/jnius_conversion.pxi#L122

If it has been GCd by Python, segfaults can occur. Could it have been GCd by PyPy?

cmacdonald avatar Jul 04 '22 21:07 cmacdonald

Is there use of forking plus threads? We have seen some hairy bugs with this, the state is shared in strange and wondrous ways.

mattip avatar Jul 04 '22 21:07 mattip

> If it has been GCd by Python, segfaults can occur. Could it have been GCd by Pypy?

Typically, the PyPy GC is less aggressive than the CPython one: objects tend to stay around a little longer. I wonder if changing the order to set up the thread pool before creating the function will change anything:

-     callFn = lambda: "done"
      executor = autoclass("java.util.concurrent.Executors").newFixedThreadPool(1)
+     callFn = lambda: "done"
      future = executor.submit(callFn)

mattip avatar Jul 04 '22 21:07 mattip