pybind11 icon indicating copy to clipboard operation
pybind11 copied to clipboard

WIP Python 3.11 related

Open rwgk opened this issue 3 years ago • 9 comments

Description

WIP / experiments

Suggested changelog entry:


rwgk avatar Aug 05 '22 15:08 rwgk

@vstinner I saw you worked on safe_path. I'm wondering if you have clues that could help me understand a difference in behavior between Python 3.10.5 and 3.11.0b5:

  • 3.10.5: pybind11 test_embed updates sys.path from PYTHONPATH
  • 3.11.0b5: PYTHONPATH is ignored in the exact same situation***

Does that ring any bells? Some details are below, JIC.

@henryiii I wonder if this could have to do with PR #3923, which introduced a new initialize_interpreter() implementation specifically for 3.11:

https://github.com/pybind/pybind11/pull/3923/files#diff-80008385afb03df2a6e6d29a0b500fbaf9593d1baba5a5bda112998ecdfdb131


*** The "exact same situation" is possibly unusual (interactively building Python from sources with configure; make install; using my own scons-based tools to build the pybind11 tests), but the only variable is the Python version.

I added fprintf to pybind11 test_embed sources: 198c9c21372047cf43856d4d01f0cc56d2f9a11f

And patched the Python sources like this:

--- Objects/unicodeobject.c.orig        2022-08-05 08:18:54.186431792 -0700
+++ Objects/unicodeobject.c     2022-08-04 17:36:36.375312426 -0700
@@ -16005,13 +16005,14 @@
 init_fs_encoding(PyThreadState *tstate)
 {
     PyInterpreterState *interp = tstate->interp;
+    _Py_DumpPathConfig(tstate);

     /* Update the filesystem encoding to the normalized Python codec name.
        For example, replace "ANSI_X3.4-1968" (locale encoding) with "ascii"
        (Python codec name). */
     PyConfig *config = (PyConfig*)_PyInterpreterState_GetConfig(interp);
     if (config_get_codec_name(&config->filesystem_encoding) < 0) {
-        _Py_DumpPathConfig(tstate);
+        //_Py_DumpPathConfig(tstate);
         return _PyStatus_ERR("failed to get the Python codec "
                              "of the filesystem encoding");
     }

I'm seeing this with 3.10.5:

Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = '/usr/local/google/home/rwgk/forked/build_py3110b5/lib'
  program name = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/bin/python3'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  sys._base_executable = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/bin/python3'
  sys.base_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5'
  sys.base_exec_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5'
  sys.platlibdir = 'lib'
  sys.executable = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/bin/python3'
  sys.prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5'
  sys.exec_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5'
  sys.path = [
    '/usr/local/google/home/rwgk/forked/build_py3110b5/lib',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/lib/python310.zip',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/lib/python3.10',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/lib/python3.10/lib-dynload',
  ]
Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed":

LOOOK PYTHONPATH /usr/local/google/home/rwgk/forked/build_py3105/lib:/foo/bar

LOOOK sys.path ['', '/usr/local/google/home/rwgk/forked/build_py3105/lib', '/foo/bar', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.10/dist-packages']
===============================================================================
All tests passed (1553 assertions in 11 test cases)

And this with 3.11.0b5:

Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = '/usr/local/google/home/rwgk/forked/build_py3110b5/lib'
  program name = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/bin/python3'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11'
  sys._base_executable = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/bin/python3'
  sys.base_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5'
  sys.base_exec_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5'
  sys.platlibdir = 'lib'
  sys.executable = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/bin/python3'
  sys.prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5'
  sys.exec_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5'
  sys.path = [
    '/usr/local/google/home/rwgk/forked/build_py3110b5/lib',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python311.zip',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11/lib-dynload',
  ]
Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed":

LOOOK PYTHONPATH /usr/local/google/home/rwgk/forked/build_py3110b5/lib:/foo/bar

LOOOK sys.path ['', '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python311.zip', '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11', '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11/lib-dynload', '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11/site-packages']

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_embed is a Catch v2.12.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
Restart the interpreter
-------------------------------------------------------------------------------
/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed/test_interpreter.cpp:174
...............................................................................

/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed/test_interpreter.cpp:192: FAILED:
  REQUIRE( py::module_::import("external_module").attr("A")(123).attr("value").cast<int>() == 123 )
due to unexpected exception with message:
  ModuleNotFoundError: No module named 'external_module'

===============================================================================
test cases:   11 |   10 passed | 1 failed
assertions: 1537 | 1536 passed | 1 failed

rwgk avatar Aug 05 '22 17:08 rwgk

pybind11 seems to use a different API on Python 3.10 and Python 3.11.

PySys_SetArgvEx() called after Python init on Python 3.10 versus Py_InitializeFromConfig() on Python 3.11.

Your dump says:

  PYTHONPATH = '/usr/local/google/home/rwgk/forked/build_py3110b5/lib'
...
  environment = 1

So PYTHONPATH was used. It's part of the computed sys.path.

pybind11 doesn't call PySys_SetArgvEx() on Python 3.11. That's the function which changes sys.path.

  • https://docs.python.org/dev/c-api/init.html#c.PySys_SetArgvEx
  • https://docs.python.org/dev/c-api/init_config.html#init-config
  • https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.safe_path

vstinner avatar Aug 05 '22 20:08 vstinner

@vstinner Thanks a lot for the explanation and the pointers!

I see PySys_SetArgvEx() is deprecated in 3.11.

So PYTHONPATH was used. It's part of the computed sys.path.

But evidently the computed sys.path is not in effect in test_embed. — What's the recommended way to restore the established behavior?

I decided to try something straightforward, using PyRun_SimpleString(), similar to what we do already for add_program_dir_to_path (I believe @henryiii did this based on your recommendation here):

--- a/include/pybind11/embed.h
+++ b/include/pybind11/embed.h
@@ -166,6 +166,9 @@ inline void initialize_interpreter(bool init_signal_handlers = true,
         throw std::runtime_error(PyStatus_IsError(status) ? status.err_msg
                                                           : "Failed to init CPython");
     }
+    // Emulates established behavior (prior to Python 3.11 implemented by PySys_SetArgvEx()).
+    PyRun_SimpleString("import sys, os; "
+                       "sys.path[:0] = os.environ.get('PYTHONPATH', []).split(os.pathsep)");
     if (add_program_dir_to_path) {
         PyRun_SimpleString("import sys, os.path; "
                            "sys.path.insert(0, "

With that I get back the established behavior and test_embed passes in my situation. Is that approach OK?

rwgk avatar Aug 05 '22 22:08 rwgk

But evidently the computed sys.path is not in effect in test_embed.

Sorry, I don't get how you manage to get PYTHONPATH environment variable being ignored. If you are looking for my help, it would help if you could write a short C or C++ program reproducing your issue. pybind11 is a very large code base that I don't know.

vstinner avatar Aug 09 '22 10:08 vstinner

There seems to be some kind of conflict in expectations & observations:

Sorry, I don't get how you manage to get PYTHONPATH environment variable being ignored.

But earlier:

Your dump says:

  PYTHONPATH = '/usr/local/google/home/rwgk/forked/build_py3110b5/lib'
...
  environment = 1

So PYTHONPATH was used. It's part of the computed sys.path.

=========================================================================

it would help if you could write a short C or C++ program reproducing your issue.

At the moment I'm leaning more towards root-causing why environment = 1 but sys.path is not updated.

@vstinner Would you have a pointer to the cpython sources that update sys.path? (I looked around previously but didn't get lucky. I'll try again.)

rwgk avatar Aug 09 '22 14:08 rwgk

At the moment I'm leaning more towards root-causing why environment = 1 but sys.path is not updated.

Interestingly — and luckily — I'm seeing a behavior change between 3.11.0b5 and 3.11.0rc1, both installed from tar files.

Everything except 3.11.0b5 vs 3.11.0rc1 being equal, when running test_embed with

  • 3.11.0b5: The Python path configuration: output appears 1 time.
  • 3.11.0rc1: The Python path configuration: output appears 13 times.

I believe it is enlightening, because only the 1st Python path configuration: has environment = 1, but the other 12 have environment = 0. As it happens, the failing unit test is after the 5th and before the 6th Python path configuration: output.

This observation got me to look again at the new 3.11-specific initialize_interpreter() implementation. We are using PyConfig_InitIsolatedConfig() which is actually documented to ignore environment variables:

  • https://docs.python.org/3/c-api/init_config.html#isolated-configuration

I tried adding config.use_environment = 1; instead of my previous PyRun_SimpleString() diff, but that didn't help.

What do we want to do?

A. Document the pybind11 behavior change? (What do we want to suggest as a solution for people who relied on the old behavior?) B. Switch to PyConfig_InitPythonConfig() instead? (What are the pros and cons?) C. Add a bool isolated = false; to select between PyConfig_InitPythonConfig() or PyConfig_InitIsolatedConfig()? D. Keep my PyRun_SimpleString() diff? Maybe behind a bool add_environ_pythonpath_to_path = true option?

@henryiii @Skylion007 @vstinner what is your vote or recommendation?

I don't like A but don't have enough background to decide between B, C, D. D doesn't look very general. Maybe B or C?

rwgk avatar Aug 10 '22 00:08 rwgk

PyConfig_InitIsolatedConfig() sets isolated to 1 and isolated=1 implies use_environment=0.

  • https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.isolated
  • https://docs.python.org/dev/c-api/init_config.html#init-isolated-conf

vstinner avatar Aug 11 '22 00:08 vstinner

B. Switch to PyConfig_InitPythonConfig() instead? (What are the pros and cons?)

That's closer to the regular "python" program. You can then tweak configuration from these defaults. For example, it uses isolated=0 and use_environment=1. Example:

  • https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.use_environment : "Default: 1 in Python config and 0 in isolated config."

vstinner avatar Aug 11 '22 00:08 vstinner

@vstinner Would you have a pointer to the cpython sources that update sys.path? (I looked around previously but didn't get lucky. I'll try again.)

If you are talking about PySys_SetArgvEx() that you only call on Python 3.10 and older:

  • https://github.com/python/cpython/blob/73d8ffefe95791fa1f036b029cf51f907a89ee42/Python/sysmodule.c#L3322-L3363
  • https://github.com/python/cpython/blob/73d8ffefe95791fa1f036b029cf51f907a89ee42/Python/pathconfig.c#L376-L513

vstinner avatar Aug 11 '22 00:08 vstinner

I only now got a chance to poke some more at this, trying to use PyConfig_InitPythonConfig() instead:

-    PyConfig_InitIsolatedConfig(&config);
+    PyConfig_InitPythonConfig(&config);

With that as the only diff against current master, the PYTHONPATH-related failure is resolved, but it is failing here instead:

TEST_CASE("sys.argv gets initialized properly") {
    py::finalize_interpreter();
    {
        py::scoped_interpreter default_scope;
        auto module = py::module::import("test_interpreter");
        auto py_widget = module.attr("DerivedWidget")("The question");
        const auto &cpp_widget = py_widget.cast<const Widget &>();
        REQUIRE(cpp_widget.argv0().empty());
    }

    std::string cpp_widget_argv0;
    {
        char *argv[] = {strdup("a.out")};
        py::scoped_interpreter argv_scope(true, 1, argv);
        std::free(argv[0]);
        auto module = py::module::import("test_interpreter");
        auto py_widget = module.attr("DerivedWidget")("The question");
        const auto &cpp_widget = py_widget.cast<const Widget &>();
        cpp_widget_argv0 = cpp_widget.argv0();
    }
    REQUIRE(cpp_widget_argv0 == "a.out"); // FAILURE
    py::initialize_interpreter();
}

Note that I moved the REQUIRE out, otherwise it segfaults.

With all other tested removed, this simply fails. With others tests present, it turns into a segfault (possibly triggered by the following test) even with the REQUIRE moved out.

Commenting out the std::free(argv[0]); call didn't help.

I have to give up for now. We don't use embedding, I cannot spend more time on this. For my local testing I can just use my PyRun_SimpleString() workaround.

If nobody else gets to this before the next pybind11 release, we should at least document that PYTHONPATH is not used anymore with Python 3.11 and put a "help wanted" tag on it.

rwgk avatar Aug 15 '22 22:08 rwgk

Python still uses PYTHONPATH env var. I cannot explain why pybind11 no longer uses it since you didn't provide a short example reproducing your issue. Python initialization is very complicated, they are like 50+ options, configuration files, etc.

vstinner avatar Aug 16 '22 11:08 vstinner

Python still uses PYTHONPATH env var. I cannot explain why pybind11 no longer uses it since you didn't provide a short example reproducing your issue.

It's purely a pybind11 issue, because we're using PyConfig_InitIsolatedConfig() for Python >= 3.11. I think it just escaped our attention and standard testing that this introduced a pybind11 behavior change.

Python initialization is very complicated, they are like 50+ options, configuration files, etc.

That's probably why I got stuck when trying PyConfig_InitPythonConfig().

To summarize:

  • With PyConfig_InitIsolatedConfig() PYTHONPATH is not used as documented (i.e. the config is working as intended).
  • With PyConfig_InitPythonConfig() the unit test that depends on PYTHONPATH — in my environment (!) — is working, but a different unit test is failing. I don't know why.

rwgk avatar Aug 18 '22 00:08 rwgk

With PyConfig_InitIsolatedConfig() PYTHONPATH is not used as documented (i.e. the config is working as intended).

You should be able to use PYTHONPATH if you set isolated=0 and use_environment=1.

vstinner avatar Aug 18 '22 12:08 vstinner

You should be able to use PYTHONPATH if you set isolated=0 and use_environment=1.

Confirmed, thanks @vstinner! (Previously I only tried use_environment=1 without also setting isolated=0.)

With that figured out, this PR boils down to just adding two lines in embed.h (I just need to remove the debug printf code).

But I also want to add a unit test that fails if PYTHONPATH is not used. I haven't worked on the embedding tests before, therefore that's a little adventure for me.

rwgk avatar Aug 18 '22 18:08 rwgk

Confirmed, thanks @vstinner!

Nice! Maybe the effect of isolated=1 is not well explained in the doc.

  • https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.isolated
  • https://docs.python.org/dev/c-api/init_config.html#init-isolated-conf

vstinner avatar Aug 19 '22 11:08 vstinner

@Skylion007 @henryiii

This (small) PR is ready for review now.

rwgk avatar Aug 20 '22 04:08 rwgk

I like the final change, short and simple :-D

vstinner avatar Aug 20 '22 13:08 vstinner

Congrats for fixing your issue!

vstinner avatar Aug 24 '22 22:08 vstinner