
Submodules not showing up for (native) extension modules

Open robamler opened this issue 3 years ago • 6 comments

When running pdoc on an extension module (i.e., a native "C" extension), the extension module's submodules don't show up in the documentation, even though <TAB>-autocomplete in a Python REPL can find them. This seems to be because pdoc searches for submodules by inspecting the source directory, which isn't available for extension modules.

I've proposed PR #318 to fix this issue. The proposed solution works but I'm not sure if it is safe enough to remove the old "source directory traversal" method. I'd appreciate guidance on completing the PR.
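As a rough illustration of the alternative to source-directory traversal, the sketch below discovers submodules by inspecting the attributes of an already imported module. It is not pdoc's code and not the code from PR #318; the helper name `find_attribute_submodules` and the stand-in module `fakeext` are hypothetical, and honoring `__all__` is an assumption based on the `__all__` check mentioned later in this thread.

```python
import inspect
import types


def find_attribute_submodules(module):
    """Return {name: module} for attributes of `module` that are modules.

    Hypothetical helper: it looks at the live module object instead of
    the file system, so it also works when there is no source directory.
    """
    public = getattr(module, "__all__", None)
    found = {}
    for name, obj in inspect.getmembers(module, inspect.ismodule):
        if public is not None:
            # Only report names the module explicitly exports.
            if name not in public:
                continue
        elif name.startswith("_"):
            # Without __all__, fall back to skipping private names.
            # Limitation: plain `import os` style attributes would be reported too.
            continue
        found[name] = obj
    return found


# Stand-in for a compiled extension module with one submodule.
ext = types.ModuleType("fakeext")
ext.submodule = types.ModuleType("submodule")
ext.submodule.variable = 42
ext.__all__ = ["submodule"]

print(sorted(find_attribute_submodules(ext)))
```

The main design question with this approach is how to tell genuine submodules apart from modules that were merely imported into the namespace, which is why the sketch leans on `__all__`.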

Expected Behavior

Running pdoc on a native extension module should generate documentation for the entire extension module, including its submodules.

Actual Behavior

  • With current master: only items in the top-level module appear in the documentation. Submodules don't show up in the documentation.
  • With the proposed fix in #318: works as expected.

Steps to Reproduce

The following steps generate a minimalistic native extension module in Rust that exhibits the problem. The language shouldn't matter though.

  1. Install a Rust toolchain, see https://rustup.rs
  2. Create the following directory structure:
pyext/
├── Cargo.toml
└── src/
    └── lib.rs

with the following file contents:

  • Cargo.toml:
[package]
authors = ["Name <[email protected]>"]
edition = "2018"
name = "pyext"
version = "0.1.0"

[lib]
crate-type = ["cdylib"]

[dependencies]
pyo3 = {version = "0.13.2", features = ["extension-module"]}
  • src/lib.rs:
use pyo3::{prelude::*, wrap_pymodule};

/// Docstring of main module.
#[pymodule(pyext)]
fn init_main_module(_py: Python<'_>, module: &PyModule) -> PyResult<()> {
    module.add_wrapped(wrap_pymodule!(submodule))?;
    Ok(())
}

/// Docstring of submodule
#[pymodule(submodule)]
fn init_submodule(_py: Python<'_>, submodule: &PyModule) -> PyResult<()> {
    submodule.add("variable", 42)?;
    Ok(())
}
  3. Compile the extension module: cargo build
  4. Create a properly named symlink to the object file:
    • on Linux: ln -s target/debug/libpyext.so pyext.so
    • on macOS: ln -s target/debug/libpyext.dylib pyext.so
    • on Windows: rename target\debug\pyext.dll to pyext.pyd (Windows builds carry no "lib" prefix)
  5. Start a Python REPL from the directory containing the pyext.so file and verify that the submodule exists and can be found by tab completion:
$ python
Python 3.6.10 (default, May 22 2020, 17:59:48) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyext
>>> pyext.<TAB>  --> autocompletes to "pyext.submodule", proving that the submodule can be found
>>> pyext.submodule.variable
42
  6. Run pdoc --html pyext from the same directory.
    • With the version at current master, the generated documentation leaves out the submodule.
    • With the version proposed in #318, the generated documentation includes the submodule.

Additional info

  • pdoc version: master and 0.9.2 are both affected; the version from #318 works.
  • Tested on Linux.

robamler avatar Mar 02 '21 00:03 robamler

Thanks for an exemplary bug report!

Just to clarify: Step 5, when we import pyext, could we just as well have done:

>>> import pyext.submodule

# or

>>> from pyext.submodule import variable

Does this run?

kernc avatar Mar 05 '21 13:03 kernc

Just tested it:

  • import pyext.submodule doesn't work;
  • from pyext.submodule import variable doesn't work;
  • however, from pyext import submodule works.
$ python
Python 3.8.2 (default, Mar  2 2021, 23:57:34) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyext.submodule
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package
>>> from pyext.submodule import variable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package
>>> from pyext import submodule
>>> submodule.variable
42

I think this is because the extension module pyext is compiled into a single binary file that cannot be loaded "in parts" (unlike regular packages, whose implementation is typically spread across several source files). The Python interpreter doesn't know about the submodules until it actually loads the pyext module, which (I think) it only does when you explicitly say either import pyext or from pyext import xxx.

In other words, I think from A import B actually loads A (but only brings A.B into scope, as B), so from pyext.submodule import variable would try to load pyext.submodule, which doesn't exist in the file system because it only gets generated "in memory" when you load pyext.
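The behavior described above can be reproduced without compiling anything. The sketch below uses a plain module object as a stand-in for the extension module (the name `fakeext` is hypothetical): it has a module-typed attribute but no `__path__`, so attribute-style access works while the dotted import fails.

```python
import importlib
import sys
import types

# Stand-in for the compiled extension: a module object without __path__.
top = types.ModuleType("fakeext")
top.submodule = types.ModuleType("fakeext.submodule")
top.submodule.variable = 42
sys.modules["fakeext"] = top

# `from A import B` imports A, then looks up the attribute B on it.
from fakeext import submodule
print(submodule.variable)

# A dotted import asks the import system to *find* "fakeext.submodule",
# which requires the parent to be a package (i.e. to have __path__).
error_message = None
try:
    importlib.import_module("fakeext.submodule")
except ModuleNotFoundError as exc:
    error_message = str(exc)
print(error_message)
```

The error text printed at the end should have the same "is not a package" form as the traceback shown above.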

robamler avatar Mar 07 '21 19:03 robamler

That's exactly why I asked: I remembered recently resolving a similar issue as wontfix. See my thoughts in https://github.com/pdoc3/pdoc/pull/252#issuecomment-698361252. The simple fact is:

ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package

pyext.submodule is not a module to have stuff imported from, so I'm hesitant to make pdoc list it as such.

Can you investigate whether you can set the .__package__ and .__path__ attributes (or whatever is necessary for Python to treat a module as a package) on the relevant package/module objects, and whether that maybe automatically does something?

kernc avatar Mar 07 '21 20:03 kernc

Thank you for the explanation! Unfortunately, setting .__package__ and .__path__ in the extension module doesn't help.
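A possible reason why these attributes alone are not enough, sketched with a stand-in module rather than the real extension (the name `fakeext_pkg` and the empty `__path__` are assumptions for illustration): marking the parent as a package only changes which error is raised, because no finder on `sys.meta_path` knows how to locate the in-memory submodule.

```python
import importlib
import sys
import types

# Stand-in for the extension module, now dressed up as a package.
top = types.ModuleType("fakeext_pkg")
top.submodule = types.ModuleType("fakeext_pkg.submodule")
top.__package__ = "fakeext_pkg"
top.__path__ = []  # makes Python treat the module as a package
sys.modules["fakeext_pkg"] = top

error_message = None
try:
    importlib.import_module("fakeext_pkg.submodule")
except ModuleNotFoundError as exc:
    # The "is not a package" complaint is gone, but the lookup still
    # fails: the default finders search __path__ and find nothing.
    error_message = str(exc)
print(error_message)
```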

I respect your decision if you don't want to address this. I'd just like to raise two counterarguments for your consideration. First, I think this issue will probably affect a lot of people (probably all authors of native extension modules who don't find some sort of workaround). Second, I interpret the ModuleNotFoundError differently. In fact, I get the same kind of error when I try to import, e.g., pdoc:

>>> import pdoc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pdoc'

The reason is that I'm in a Python environment where pdoc isn't installed, so the Python interpreter can't find it and raises a ModuleNotFoundError. So, even though pdoc definitely is a module (it's even a package), it just can't be found at the moment. But as soon as you put it on your sys.path, it can be found. I'd argue that the situation for pyext.submodule is quite similar: it is a module, it just can't be found at the moment. But as soon as you import pyext (which is the package on which I want to run pdoc anyway), pyext.submodule can be found (and is recognized as a module):

>>> import pyext
>>> type(pyext.submodule)
<class 'module'>

I agree that pyext.submodule is not a package (e.g., it doesn't have a .__path__ set), but I think that shouldn't make a difference.

robamler avatar Mar 07 '21 21:03 robamler

"it just can't be found at the moment"

That's correct. That's why Python has sys.path_hooks (to maybe provide a suitable finder/loader for a given package) and sys.meta_path, which is a list of already registered default finders.

Following related upstream issues:

  • https://github.com/PyO3/pyo3/issues/759,
  • https://github.com/PyO3/maturin/issues/266,

I think PyO3 might wish to provide a finder akin to the one removed in https://github.com/PyO3/pyo3/pull/1269/commits/8d14568f7d3077924a23e3f15392d13180cbc828 (briefly discussed in https://github.com/PyO3/pyo3/pull/1269#discussion_r520807688), and add it to sys.meta_path upon loading the top-level extension module. This way, both pdoc pyext and >>> import pyext.submodule would work flawlessly, and it would be justified to call pyext a package and pyext.submodule its submodule (instead of merely a variable pointing to a module object, such as with >>> import re as my_re).
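To make the idea of such a finder concrete, here is a minimal sketch in pure Python. It is not the finder that was removed from PyO3, and the class name `AttributeSubmoduleFinder` and the stand-in module `fakeext_finder` are hypothetical. It assumes the top-level module has been given a `__path__` (empty here), since otherwise the import system rejects dotted imports before consulting sys.meta_path.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class AttributeSubmoduleFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder: resolve `top.child` to an attribute of the loaded parent."""

    def __init__(self, top_level_name):
        self.prefix = top_level_name + "."

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None
        parent_name, _, child_name = fullname.rpartition(".")
        parent = sys.modules.get(parent_name)
        child = getattr(parent, child_name, None)
        if isinstance(child, types.ModuleType):
            # is_package=True gives the child an empty __path__, so that
            # nested submodules can be resolved the same way.
            return importlib.util.spec_from_loader(fullname, self, is_package=True)
        return None

    def create_module(self, spec):
        # Reuse the module object that already exists in memory.
        parent_name, _, child_name = spec.name.rpartition(".")
        return getattr(sys.modules[parent_name], child_name)

    def exec_module(self, module):
        # Nothing to execute: the module was initialized with its parent.
        pass


# Stand-in for the extension module; __path__ is set so dotted imports are attempted.
top = types.ModuleType("fakeext_finder")
top.__path__ = []
top.submodule = types.ModuleType("fakeext_finder.submodule")
top.submodule.variable = 42
sys.modules["fakeext_finder"] = top
sys.meta_path.append(AttributeSubmoduleFinder("fakeext_finder"))

imported = importlib.import_module("fakeext_finder.submodule")
print(imported is top.submodule, imported.variable)
```

In a real extension, the equivalent registration would have to happen while the top-level module is being initialized, which is the step the paragraph above suggests PyO3 could take care of.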

I'd just hate to have pdoc deviate from Python's own interpretation of things.

Then again, we do check in https://github.com/pdoc3/pdoc/pull/318 that the object is present in __all__, so the intent is visible, and there's little utility in documenting modules containing further objects as mere variables. The end-user will be confused that they can't:

from your_pyext.submodule.nested import Something

But that's not really our problem ... :thinking:

kernc avatar Mar 08 '21 02:03 kernc

There's apparently a workaround described in https://github.com/PyO3/pyo3/issues/1517#issuecomment-808664021.
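The linked comment is not reproduced here. One commonly used approach for this kind of problem is to register the submodule in sys.modules under its dotted name; whether that matches the linked workaround exactly is an assumption. The sketch below shows the effect with a stand-in module (the name `fakeext_reg` is hypothetical).

```python
import sys
import types

# Stand-in for the extension module and its in-memory submodule.
top = types.ModuleType("fakeext_reg")
sub = types.ModuleType("fakeext_reg.submodule")
sub.variable = 42
top.submodule = sub
sys.modules["fakeext_reg"] = top

# Registering the dotted name lets the import system short-circuit:
# it returns the cached module instead of searching for it.
sys.modules["fakeext_reg.submodule"] = sub

import fakeext_reg.submodule
from fakeext_reg.submodule import variable

print(variable)
```

An extension module would need to perform the equivalent registration itself during initialization, for example from its init function.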

kernc avatar Mar 28 '21 14:03 kernc