lingua-franca icon indicating copy to clipboard operation
lingua-franca copied to clipboard

Python 3.11 support

Open edwardalee opened this issue 1 year ago • 19 comments

This is an attempt to get the python target to work with Python 11. It gets past the segfault, but when it imports the generated C module, Python for some reason fails to recognize the result as an extension module. For example, the ActionDelay.lf test gives this error message:

SystemError: initialization of LinguaFrancaActionDelay did not return an extension module

edwardalee avatar Jul 13 '23 11:07 edwardalee

Shouldn't the CI configuration in this branch also be changed to work with Python 11?

lhstrh avatar Jul 13 '23 19:07 lhstrh

Actually, I would like to make sure the branch works with Python 3.10. It does not yet work with Python 3.11.

edwardalee avatar Jul 13 '23 20:07 edwardalee

Actually, I would like to make sure the branch works with Python 3.10. It does not yet work with Python 3.11.

I updated the CI config to that effect in 3bc9562...

lhstrh avatar Jul 16 '23 22:07 lhstrh

From what I understood from @cmnrd, this PR provides fixes for a problem that shouldn't exist in the first place (or, rather, should be solved in a different way). He implied that we are not incompatible with Python 11 as is, and that the segmentation fault occurs due to building using with one version of Python and then running with another.

lhstrh avatar Jul 19 '23 23:07 lhstrh

I'm referring to https://github.com/lf-lang/lingua-franca/issues/1458#issuecomment-1640218120.

lhstrh avatar Jul 19 '23 23:07 lhstrh

I may be wrong about this, but let me defer to @cmnrd for the review of this PR.

lhstrh avatar Jul 19 '23 23:07 lhstrh

From what I understood from @cmnrd, this PR provides fixes for a problem that shouldn't exist in the first place (or, rather, should be solved in a different way). He implied that we are not incompatible with Python 11 as is, and that the segmentation fault occurs due to building using with one version of Python and then running with another.

I think there is some confusion about the python errors, and I am not sure what this PR fixes. On my Linux machine with Python 3.11 things seem to run fine without any errors even on master. However, given that earlier CI runs in this PR actually failed, it looks like there are other error than the segmentation fault caused by incompatible Python runtimes.

cmnrd avatar Jul 20 '23 08:07 cmnrd

I updated this branch today (merged in master, also updated the submodule and merged in reactor-c main) and extended CI so that we test both Python 3.10 and 3.11. However, now the tests fail again and I don't know what is going on. Must be some change that happened in the meantime in LF or reactor-c.

cmnrd avatar Jul 26 '23 12:07 cmnrd

@jackykwok2024, could you please take a look at this?

lhstrh avatar Aug 30 '23 00:08 lhstrh

I have consistently reproduced the error using the BanksCount3ModesSimple.lf file on my Linux machine. The code generated runs as expected with Python 3.10. However, it encounters a segmentation fault when executed using Python 3.11, as outlined below:

DEBUG: Invoking reaction function _countercycle.reaction_function_2
ERROR: FATAL: Failed to invoke reaction _countercycle.reaction_function_2
Traceback (most recent call last):
  File "/mnt/c/Users/jacky/Desktop/LF/python311/src-gen/BanksCount3ModesSimple/BanksCount3ModesSimple.py", line 74, in reaction_function_2
    count.set(3)
TypeError: Could not set objects.
Segmentation fault

Upon comparing the output logs between Python 3.10 and Python 3.11, I noticed a significant discrepancy in the reference count for the val within the py_port_set function.

image

Output from Python 3.11: DEBUG: Setting value 0xaa5f28 with reference count 1000000161 Output from Python 3.10: DEBUG: Setting value 0x7f53793c8130 with reference count 98

As a result, my conjecture is that this issue arises from a deep recursion in Count3Modes.lf.

reactor CounterCycle {
  input next
  output count

  initial mode One {
    reaction(next) -> count, reset(Two) {=
      count.set(1)
      Two.set()
    =}
  }

  mode Two {
    reaction(next) -> count, reset(Three) {=
      count.set(2)
      Three.set()
    =}
  }

  mode Three {
    reaction(next) -> count, reset(One) {=
      count.set(3)
      One.set()
    =}
  }
} 

According to the official documentation of Python 3.11, there is an optimization update for recursive functions and the C stack in Python to Python calls is removed (https://github.com/python/cpython/issues/89419) However, I'm uncertain whether this is the underlying issue.

image

Additionally, here is the GDB backtrace: image

jackyk02 avatar Sep 01 '23 04:09 jackyk02

Thanks, @jackykwok2024, that's a really good find! I agree that it seems plausible that this optimization is to blame... Seems hard to fix if true. Do you have any ideas?

lhstrh avatar Sep 01 '23 05:09 lhstrh

I don't see any recursion in Count3Modes.lf, so it's not clear how this optimization could be related. ???

edwardalee avatar Sep 01 '23 08:09 edwardalee

I figured there may be recursion involved with the mode transitions or set implementation, but I could well be wrong...

lhstrh avatar Sep 01 '23 15:09 lhstrh

Sorry for the confusion in my previous comments. My initial conjecture was that the inflated reference count might have been due to recursion involved with the set implementation. However, after a more detailed review of the changes for Python 3.11, as indicated in the PRs, the issue seems to be different.

In Python 3.10 and earlier, the function sys.getrefcount() returns the reference count of an object, indicating the number of variables or data structures that reference it. However, Python 3.11 introduces a modification for statically allocated interpreter states, and certain objects like small integers are now marked as "immortal."

These immortal objects have an artificially inflated reference count, for example, 1000000161 instead of a typical smaller number. This is what we are observing as the reference count in the py_port_set function.

This issue seems to be unrelated to the segfault but could lead to confusion for developers during the debugging process…

jackyk02 avatar Sep 04 '23 23:09 jackyk02

A minor issue is that with python 3.11, we get a deprecation warning that should be easy to fix:

pythontarget.c:600:9: warning: 'Py_SetPath' is deprecated [-Wdeprecated-declarations]
        Py_SetPath(wcwd);

This will require a fix in reactor-c.

edwardalee avatar Oct 05 '23 11:10 edwardalee

Have we made progress in figuring out why the tests do not pass?

I noticed that we're increasing the testing burden by testing both Python 3.10 and 3.11. What should be the policy going forward? What range of Python version do we plan to support?

It is up to us to decide which Python versions to support. With Mocasin we aim to support the 3 latest versions. I think we should aim to support at least the latest version, as well as the version used by the latest LTS Ubuntu (and Debian) releases (which is 3.10 at the moment).

cmnrd avatar Oct 25 '23 08:10 cmnrd

Since Python 3.12 was released recently, we should actually also add it to the matrix.

cmnrd avatar Oct 25 '23 09:10 cmnrd

I noticed that this PR has stalled. Is there any ambition to resurrect this effort?

lhstrh avatar Jun 05 '24 07:06 lhstrh

I have no bandwidth to resurrect this, but we have a bigger problem, which is that CMake should not be choosing the Python version. We need a target property or some such. See #2298.

edwardalee avatar Jun 05 '24 08:06 edwardalee