Add scanfilter to Module constructor

Open poke1024 opened this issue 4 years ago • 5 comments

One way to address https://github.com/pdoc3/pdoc/issues/99: this adds a new scanfilter parameter that allows excluding specific submodules or whole subtrees.

This also moves _iter_modules out of the method scope, which makes it monkey-patchable for extreme cases, where one would want to override the whole package scan logic (which I need to do right now).
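To make the idea concrete, here is a minimal sketch of a filtered package scan. It is not the code from this PR and not pdoc's API: the function name `iter_submodule_names` and the shape of the `scanfilter` callable (dotted name in, bool out) are assumptions made for illustration, and it assumes ordinary packages that live in directories on disk. It relies only on the standard library's `pkgutil.iter_modules`, which lists modules without importing them, so a rejected subtree is never touched.

```python
import os
import pkgutil
from typing import Callable, Iterator, List


def iter_submodule_names(
    paths: List[str],
    prefix: str,
    scanfilter: Callable[[str], bool],
) -> Iterator[str]:
    """Yield dotted module names found under `paths`, importing nothing.

    `scanfilter` is a hypothetical predicate: it receives the dotted name
    and returns False to drop that module and, for packages, its subtree.
    """
    for path in paths:
        for info in pkgutil.iter_modules([path], prefix):
            if not scanfilter(info.name):
                continue  # rejected: do not yield it and do not descend
            yield info.name
            if info.ispkg:
                # Descend via the directory name instead of importing the
                # package to read its __path__ (assumes on-disk packages).
                short_name = info.name.rpartition(".")[2]
                sub_path = os.path.join(path, short_name)
                yield from iter_submodule_names(
                    [sub_path], info.name + ".", scanfilter
                )
```

A filter such as `lambda name: ".tests" not in name` would then keep test packages out of the scan before any of their code runs.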

poke1024 avatar Dec 23 '19 16:12 poke1024

extreme cases where one would want to override the whole package scan logic (which I need to do right now)

Can you elaborate? The industry rightfully doesn't favor monkey-patching private members ...

kernc avatar Dec 23 '19 22:12 kernc

Since you're familiar with the issue, what is your opinion of the proposed override using __pdoc__ dict:

__pdoc__ = {"some.package.module": False}  # Skips some.package.module and descendants

kernc avatar Dec 23 '19 22:12 kernc

extreme cases where one would want to override the whole package scan logic (which I need to do right now)

Can you elaborate? The industry rightfully doesn't favor monkey-patching private members ...

Yes, indeed. I have a very singular use case - not at all common - where I basically build a virtual tree of packages that does not exist on the disk, i.e. I want to override the whole package tree scanning. Now, the clean way to achieve this would be to add some kind of ModuleIterator system that is passed into Module, and I started adding that. But it gets complex, and I'm probably the only person who'd ever use that. So, for the sake of simplicity, being able to monkey patch for this one very special singular use case is an advantage over forking the whole module.

poke1024 avatar Dec 24 '19 07:12 poke1024

Since you're familiar with the issue, what is your opinion of the proposed override using __pdoc__ dict:

__pdoc__ = {"some.package.module": False}  # Skips some.package.module and descendants

As far as I understand it, __pdoc__ seems to be a way to exclude subpackages from within a package I control myself.

My use case is building docs for third-party packages (e.g. bokeh, scikit-learn) that I don't control - I need to externally exclude subpackages.

Some of these subpackages include test packages and experimental stuff with esoteric dependencies, or won't run at all due to legacy problems or hard exceptions (stuff like "this package is obsolete, don't use it", which happens with plotly). Currently, pdoc stops as soon as importing one of these subpackages fails. So I need to intercept them before they even get parsed or imported.

Now, my whole use case of using pdoc on libs that already have docs might of course seem nutty. But pdoc is the best tool out there to try to build unified documentation for a set of libraries.

poke1024 avatar Dec 24 '19 07:12 poke1024

__pdoc__ seems to be a way to exclude subpackages from within a package I control myself.

Not necessarily. As the docs state:

The keys should be string identifiers within the scope of the module or, alternatively, fully-qualified reference names. [...] then key (and its members) will be excluded from the documentation.

So IIUC, if you set:

__pdoc__ = {'sklearn': False}

the whole of scikit-learn should become disabled. This doesn't prevent the loud importation of odd or "broken" modules, though.
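The "key (and its members)" rule quoted above can be illustrated with a small stand-alone check. This is only a sketch of the semantics as described in the docs, not pdoc's actual implementation; the helper name `is_excluded` is hypothetical. It treats a fully-qualified name as excluded when the name itself, or any dotted ancestor, maps to `False`.

```python
from typing import Mapping


def is_excluded(refname: str, overrides: Mapping[str, object]) -> bool:
    """Return True if `refname` or any dotted ancestor is set to False.

    `overrides` plays the role of a __pdoc__-style dict; this helper is an
    illustration only and is not part of pdoc.
    """
    parts = refname.split(".")
    for end in range(1, len(parts) + 1):
        ancestor = ".".join(parts[:end])
        # Only an explicit False excludes; docstring overrides do not.
        if overrides.get(ancestor) is False:
            return True
    return False
```

With `{'sklearn': False}`, `is_excluded('sklearn.externals.six', ...)` is true while `is_excluded('bokeh', ...)` is not. A check like this only decides what gets documented; it runs on names and does nothing to stop a module from being imported.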


So, for the sake of simplicity, being able to monkey patch for this one very special singular use case is an advantage over forking the whole module.

If you have scanfilter= already, of what use is monkey patching _iter_modules()?


basically build a virtual tree of packages that does not exist on the disk

As an alternative approach, why not create and clean a tree of files on disk? Filesystem is a great abstraction.
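A rough sketch of that approach, using only the standard library: write the virtual tree into a temporary directory, let the documentation tool scan it, and let the directory be removed afterwards. The helper name `write_package_tree` and the dict-of-sources input format are assumptions for illustration, not anything pdoc provides.

```python
import pathlib
import tempfile
from typing import Mapping


def write_package_tree(root: pathlib.Path, modules: Mapping[str, str]) -> None:
    """Materialize `modules` (dotted name -> source text) under `root`.

    Hypothetical helper: every intermediate package directory gets an
    empty __init__.py so the result is an ordinary on-disk package tree.
    """
    for dotted, source in modules.items():
        *packages, module = dotted.split(".")
        current = root
        for package in packages:
            current = current / package
            current.mkdir(parents=True, exist_ok=True)
            (current / "__init__.py").touch()
        (current / (module + ".py")).write_text(source)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_package_tree(
            pathlib.Path(tmp),
            {"virt.core.api": '"""API docs."""\n', "virt.util": ""},
        )
        # Point the documentation tool at `tmp` here (for example by
        # adding it to sys.path); the tree is deleted when the block exits.
        print(sorted(p.name for p in pathlib.Path(tmp, "virt").iterdir()))
```

The cost is some disk I/O and cleanup, but the scan logic itself stays untouched and nothing private has to be patched.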

kernc avatar Dec 30 '19 00:12 kernc