cmake_example icon indicating copy to clipboard operation
cmake_example copied to clipboard

add Matlab GHA workflow (with a struct-definition addition to the basic example)

Open slayoo opened this issue 2 years ago • 31 comments

This PR adds a GHA workflow exemplifying usage of pybind11-built Python package from the Matlab Python interface. The GHA uses the https://github.com/matlab-actions/setup-matlab action to enable access to licensed Matlab.

It shows that:

  • on the one hand, the module is accessible and things like functions do work
  • on the other hand, instantiating a class fails with:
  Cannot find class 'py.pybind11_builtins.pybind11_type'.

This issue has been reported at:

  • https://github.com/pybind/pybind11/issues/3945
  • https://www.mathworks.com/matlabcentral/answers/2031219-matlab-fails-to-run-python-built-with-pybind

slayoo avatar Nov 19 '23 04:11 slayoo

I don't have the permissions to approve this workflow. @henryiii could you please help out? (Could you change the global setting to be like what we have for pybind11?)

rwgk avatar Nov 19 '23 05:11 rwgk

@rwgk, FWIW, here are GHA logs from a PR from the same branch to my fork's master: https://github.com/slayoo/cmake_example/actions/runs/6918406701/job/18820634080

slayoo avatar Nov 21 '23 10:11 slayoo

I don't have access to the settings for this repo, only @wjakob does.

henryiii avatar Nov 21 '23 12:11 henryiii

thank you for approving the CI run, the Matlab failure is now visible in this PR

slayoo avatar Nov 21 '23 17:11 slayoo

We need to find out where this error is produced:

Cannot find class 'py.pybind11_builtins.pybind11_type'.

I looked in the pybind11 sources for simply Cannot (22 matches), but none of those possibly has "find" as the next word.

Almost certainly, that error is produced outside pybind11. Where and why?

rwgk avatar Nov 21 '23 17:11 rwgk

More simple observations:

pybind11_builtins (case sensitive search) only appears in include/pybind11/detail/class.h, this line in 3 locations:

    setattr((PyObject *) type, "__module__", str("pybind11_builtins"));

But that module does not actually exist (not a surprise to me, but I never thought about the background). Simple experiment to show that:

test_buffers.py:10: in <module>
    import pybind11_builtins
E   ModuleNotFoundError: No module named 'pybind11_builtins'

Which brings me back to my previous comment: Where and why is the error message produced?

rwgk avatar Nov 21 '23 17:11 rwgk

@rwgk, I added you as a maintainer.

wjakob avatar Nov 22 '23 20:11 wjakob

@rwgk, I added you as a maintainer.

Thanks!

rwgk avatar Nov 22 '23 20:11 rwgk

We need to find out where this error is produced:

Cannot find class 'py.pybind11_builtins.pybind11_type'.

IIUC, this would be within Matlab code base? @mcafaro, @acampbel, @mw-hrastega, could you confirm and give us any hints here? (context: pybind11-generated Python bindings err when used using the Matlab Python interface by producing such error while instantiating classes, what otherwise works OK outside of Matlab; in this PR, we use the matlab-actions to depict the problem on CI). Thanks!

slayoo avatar Nov 22 '23 20:11 slayoo

FWIW, I just also looked in the CPython 3.8 sources for "Cannot find", which has 15 matches, but none of those could possibly have "class" as the next word.

IIUC, this would be within Matlab code base?

Yeah, that's what I'm thinking, too.

rwgk avatar Nov 22 '23 20:11 rwgk

Crazy idea: what happens if you add pybind11_builtins.py on your PYTHONPATH with pybind11_type = type?

I don't really expect that to work, but maybe it produces an error that gives us more clues?

rwgk avatar Nov 22 '23 21:11 rwgk

Crazy idea: what happens if you add pybind11_builtins.py on your PYTHONPATH with pybind11_type = type?

I don't really expect that to work, but maybe it produces an error that gives us more clues?

Here's an attempt to inject it through the GHA script: https://github.com/pybind/cmake_example/pull/164/commits/a4d99d88818842c284f7962eee3115f95269150c + 5bebb1f (waits for CI approval to run). Thanks!

slayoo avatar Nov 22 '23 21:11 slayoo

@rwgk WOW, it helped!!!

   Python module with properties:
  
       AStruct: [1x1 py.pybind11_builtins.pybind11_type]
      subtract: [1x1 py.builtin_function_or_method]
           add: [1x1 py.builtin_function_or_method]
  
      <module 'cmake_example' from '/home/runner/work/cmake_example/cmake_example/cmake_example.cpython-38-x86_64-linux-gnu.so'>
  
  
  AStruct = 
  
    Python pybind11_type with no properties.
  
      <class 'cmake_example.AStruct'>
  
  
  ans = 
  
    Python AStruct with no properties.
  
      <cmake_example.AStruct object at 0x7f94e5d46b70>
  

slayoo avatar Nov 22 '23 21:11 slayoo

WOW, it helped!!!

Amazing!

I'm wondering: Could/should we create a pybind11_builtins module and populate it with the actual pybind11_type?

But there is a big hitch: Potentially there is a mix of extensions built with different pybind11 versions and/or different compilers. That could become very confusing.

I think it'll be best if we get feedback from Matlab developers first.

rwgk avatar Nov 22 '23 22:11 rwgk

WOW, it helped!!!

Amazing!

I'm wondering: Could/should we create a pybind11_builtins module and populate it with the actual pybind11_type?

But there is a big hitch: Potentially there is a mix of extensions built with different pybind11 versions and/or different compilers. That could become very confusing.

I think it'll be best if we get feedback from Matlab developers first.

@rwgk, how should we best act here? The ingenious workaround is likely sought by many users, but how to best "package" it? Also, perhaps including the Matlab "action" in pybind11 CI will also help prevent future incompatibilities.

slayoo avatar Dec 08 '23 18:12 slayoo

IMO my workaround is a hack, just enough to prove that Matlab implicitly introduces a requirement that nobody else has (that the metaclass is importable). Why that requirement exists I don't know. If they give us a convincing reason, maybe we can adjust. But based on what I know at the moment, I'd say it'll be far better if Matlab is changed to not introduce that requirement. Maintaining a hack here will be a drag.

rwgk avatar Dec 08 '23 18:12 rwgk

I'm trying to instantiate a a pybind11 class (from a perfectly working python module) into Matlab and getting the same error. The However, I'm on Windows... What changes would I need to make, if any, to make this hack work? Python in Windows doesn't use PYTHONPATH.

StauntonXLIV avatar Dec 08 '23 19:12 StauntonXLIV

from a perfectly working python module

Assuming that perfectly working python module lives in a directory, simply put pybind11_builtins.py in there as well. If python finds your existing module there already, it should find pybind11_builtins.py there as well.

Another idea is this (tested only locally; I added that code more-or-less randomly to an existing test):

diff --git a/tests/test_async.cpp b/tests/test_async.cpp
index a5d72246..6e0c1616 100644
--- a/tests/test_async.cpp
+++ b/tests/test_async.cpp
@@ -22,4 +22,6 @@ TEST_SUBMODULE(async_module, m) {
             f.attr("set_result")(5);
             return f.attr("__await__")();
         });
+
+    m.def("module_new", [](const char *name) { return py::reinterpret_steal<py::object>(PyModule_New(name)); });
 }
diff --git a/tests/test_async.py b/tests/test_async.py
index 83a4c503..f3db4385 100644
--- a/tests/test_async.py
+++ b/tests/test_async.py
@@ -22,3 +22,11 @@ def test_await(event_loop):
 def test_await_missing(event_loop):
     with pytest.raises(TypeError):
         event_loop.run_until_complete(get_await_result(m.DoesNotSupportAsync()))
+
+
+def test_module_new():
+    virtual_pybind11_builtins = m.module_new("pybind11_builtins")
+    virtual_pybind11_builtins.pybind11_type = type
+    import sys
+    sys.modules["pybind11_builtins"] = virtual_pybind11_builtins
+    import pybind11_builtins

rwgk avatar Dec 09 '23 06:12 rwgk

from a perfectly working python module

Assuming that perfectly working python module lives in a directory, simply put pybind11_builtins.py in there as well. If python finds your existing module there already, it should find pybind11_builtins.py there as well.

Another idea is this (tested only locally; I added that code more-or-less randomly to an existing test):

Thank you so much! I couldn't get the first suggestion to work, maybe because that module uses a .pyd referencing a precompiled .dll...

What worked for me was to move the pybind11_builtins.py file into my venv Lib directory, and manually importing it into my Matlab pyenv. After that, importing my originally intended module worked perfectly.

StauntonXLIV avatar Dec 11 '23 22:12 StauntonXLIV

It crossed my mind, another option is to use py::metaclass((PyObject *) &PyType_Type), like here:

  • https://github.com/pybind/pybind11/blob/768cebe17e65c2a0a64ed067510729efc3c7ff6c/tests/test_methods_and_attributes.cpp#L370
    py::class_<YourType>(m, "YourType", py::metaclass((PyObject *) &PyType_Type))

The only potential downside is that static properties behave differently.

rwgk avatar Jan 31 '24 20:01 rwgk

@rwgk thanks! The above commit e1a73a4 is intended to try if it works. CI run requires maintainer approval.

slayoo avatar Feb 01 '24 05:02 slayoo

It seems to work. I created pybind/pybind11#5015 to make using this trick a little more obvious and legit-looking.

rwgk avatar Feb 01 '24 07:02 rwgk

Thanks @rwgk !

slayoo avatar Feb 01 '24 13:02 slayoo

@rwgk We're trying to include this "trick" in the PyPartMC project where the original issue was discovered. Compilation is fine, instantiation of the objects work, and Python tests pass. In the Matlab tests, instantiation goes fine (without the pybind11_builtins.py file), but trying to pass an object as an argument to a function fails with:

  Error using command_03eca20b_7d9b_4dfc_8ae3_5abbfa4e5128
  Python Error: TypeError: dist_sample(): incompatible function arguments. The
  following argument types are supported:
      1. (self: _PyPartMC.AeroState, AeroDist: AeroDist, sample_prop: float =
      1.0, create_time: float = 0.0, allow_doubling: bool = True, allow_halving:
      bool = True) -> int
  
  Invoked with: <_PyPartMC.AeroDist object at 0x7f6887adaab0>

So, the dist_sample() function expects an AeroDist object, but when it gets it is reported to be of an incompatible type. The same logic works when executed from Python without the Matlab bridge layer.

Any hints? Thanks

slayoo avatar Feb 01 '24 16:02 slayoo

The same logic works when executed from Python without the Matlab bridge layer.

Any hints? Thanks

Did it work before, with the default (custom) pybind11 metaclass and the alternative pybind11_type trick?

I'm interested in finding out why you see the failure. Could it be reproduced here, in your matlab.yml job?

rwgk avatar Feb 01 '24 16:02 rwgk

Yes, it did work with the pybind11_type trick. I'll try to reproduce it here...

slayoo avatar Feb 01 '24 16:02 slayoo

The matlab job is still passing here.

Totally guessing: do you have multiple extension modules in your original failure case (Python Error: TypeError: dist_sample(): incompatible function arguments.)?

rwgk avatar Feb 01 '24 16:02 rwgk

Generally, if guessing a reproducer doesn't work straightaway, I give up very quickly and turn to reducing (or debugging) the original problem instead.

rwgk avatar Feb 01 '24 16:02 rwgk

debugging is tricky as I do not have access to Matlab beyond the Github Actions CI :) (the above commit checks if the use of py::arg could be the reason)

slayoo avatar Feb 01 '24 16:02 slayoo

I was thinking of adding print statements in the pybind11 sources to see where/why it's rejecting the argument. We wouldn't have to get into matlab. — I can help inserting prints; but we need a failure to start with.

rwgk avatar Feb 01 '24 16:02 rwgk