improve section detection

Open keewis opened this issue 3 years ago • 4 comments

Potentially fixes #316.

This tries to allow parsing sections which are not separated by blank lines (there should probably be a warning in that case; I'll add that once the general idea has been approved). To get that to work I used a few tricks (e.g. adding an optional doc parameter to _is_at_section so it can be called on a different reader), so it might need some refactoring before being truly ready.

cc @Carreau, my main motivation was trying to get velin to auto-fix this

keewis avatar May 24 '21 17:05 keewis

yielding StopIteration seems like a bug: https://github.com/numpy/numpydoc/blob/265ab91fca60192bbc56217d4bae958e326b9b1b/numpydoc/docscrape.py#L214-L219

because it will cause this to fail with an obscure error: https://github.com/numpy/numpydoc/blob/265ab91fca60192bbc56217d4bae958e326b9b1b/numpydoc/docscrape.py#L380-L381

I think this should be fixed, either by changing the yield value to yield StopIteration, StopIteration or yield "", [], or by raising an error instead (if that's what that was supposed to indicate).
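To illustrate the failure mode, here is a minimal, self-contained sketch. It is not numpydoc's actual code: read_sections, read_sections_fixed and collect are hypothetical stand-ins, and the input is assumed to be a list of already-split line chunks. It only shows why yielding a bare StopIteration breaks a caller that unpacks (name, content) pairs, and how yielding "", [] avoids that.

```python
def read_sections(chunks):
    """Hypothetical stand-in for a section generator (not numpydoc's code)."""
    for data in chunks:
        if len(data) < 2:
            # Yields a single object (the exception class), not a 2-tuple.
            yield StopIteration
        else:
            # data = [header, underline, *content]
            yield data[0].strip(), data[2:]


def read_sections_fixed(chunks):
    """Same generator, but every yielded value is a (name, content) pair."""
    for data in chunks:
        if len(data) < 2:
            yield "", []
        else:
            yield data[0].strip(), data[2:]


def collect(section_iter):
    """Consume the generator with tuple unpacking, as a typical caller would."""
    sections = {}
    for name, content in section_iter:
        sections[name] = content
    return sections


if __name__ == "__main__":
    bad = [["Parameters", "----------", "x : int"], ["stray line"]]
    try:
        collect(read_sections(bad))
    except TypeError as exc:
        print("unpacking failed:", exc)
    print(collect(read_sections_fixed(bad)))
```

Raising a dedicated exception at the yield site would be the other option; the sketch only covers the "yield a pair" variant.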

keewis avatar May 24 '21 23:05 keewis

Ping. Does anyone have any comments on this?

keewis avatar Jan 16 '22 12:01 keewis

> Ping. Does anyone have any comments on this?

I think the discussion in gh-316 shows this is not desirable. Detecting in order to raise a better warning or even an exception makes sense though.

rgommers avatar Jan 19 '22 10:01 rgommers

apologies for pinging and then forgetting about it for a year.

I implemented the requested change, such that it will now warn on every missing empty line (between summary and the first section, or between two sections).

The current implementation comes with a (slight?) performance regression because NumpyDocString._is_at_section is now called for every line, and I had to do a slightly ugly trick to make sure _is_at_section doesn't swallow empty lines.

Instead, I could also imagine doing a two-pass implementation: in the first pass, find all separators (lines made up of multiple - or =) and check whether each one belongs to a section that is not preceded by an empty line; in the second pass, run the actual extraction. That might make the code a bit simpler and potentially faster, but every line would be visited twice (I'm not sure how much of an issue that would be).
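A rough sketch of what such a two-pass approach could look like, independent of the numpydoc code base. All names here (UNDERLINE, find_unseparated_headers, split_sections) are made up for illustration, and the header detection is deliberately simplified: any non-empty line followed by a line of only - or only = counts as a section header.

```python
import re
import warnings

# A separator is a line made only of '-' or only of '=' (two or more).
UNDERLINE = re.compile(r"^\s*(-{2,}|={2,})\s*$")


def find_unseparated_headers(lines):
    """Pass 1: indices of section headers not preceded by an empty line."""
    hits = []
    for i in range(1, len(lines)):
        is_header = UNDERLINE.match(lines[i]) and lines[i - 1].strip()
        # The header sits at i - 1, so the line at i - 2 should be empty.
        if is_header and i >= 2 and lines[i - 2].strip():
            hits.append(i - 1)
    return hits


def split_sections(docstring):
    """Pass 2: split into {name: lines}, warning about missing empty lines."""
    lines = docstring.splitlines()
    for idx in find_unseparated_headers(lines):
        warnings.warn(f"missing empty line before section {lines[idx].strip()!r}")

    current = "Summary"
    sections = {current: []}
    i = 0
    while i < len(lines):
        next_is_underline = i + 1 < len(lines) and UNDERLINE.match(lines[i + 1])
        if lines[i].strip() and next_is_underline:
            current = lines[i].strip()
            sections[current] = []
            i += 2  # skip the header and its underline
            continue
        sections[current].append(lines[i])
        i += 1
    return sections
```

Called on "Summary line.\nParameters\n----------\nx : int\nReturns\n-------\nint\n", this emits one warning per section (both headers lack a preceding empty line) and still returns all three sections. The first pass only looks at adjacent lines, so the extra cost is one additional linear scan.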

Edit: I think the CI failures are unrelated (not sure, though)

keewis avatar Jan 03 '23 16:01 keewis