Enable parallel reading if requested, even if there are few documents

Open mgeier opened this issue 1 year ago • 5 comments

I would expect that if parallel reading is requested (and possible), Sphinx actually does parallel reading.

However, it turns out that it doesn't do it if there are fewer than 6 documents.
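To make the behaviour concrete, here is a minimal sketch of the decision as described above. The function name, its parameters, and the `min_docs` knob are hypothetical and only illustrate the threshold; this is not Sphinx's actual code.

```python
def should_read_in_parallel(num_docs, requested_jobs,
                            parallel_available=True, min_docs=6):
    """Hypothetical stand-in for the decision described in this issue.

    Parallel reading is only used when it is available, more than one
    job was requested, and there are at least `min_docs` documents.
    """
    return parallel_available and requested_jobs > 1 and num_docs >= min_docs


# Five documents with four jobs requested: falls back to serial reading.
print(should_read_in_parallel(5, 4))
# Six documents with four jobs requested: parallel reading is used.
print(should_read_in_parallel(6, 4))
# The proposed change amounts to dropping the document-count check.
print(should_read_in_parallel(5, 4, min_docs=0))
```

With the threshold in place, a small reproduction project silently takes a different code path than a large one.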

The limit was added 10 years ago by @birkenfeld in 1f23a5c369ba58c6ec5ab806e25c63dd615327dd, but I couldn't find any motivation for it.

I found this quite confusing when trying to debug this issue: https://github.com/spatialaudio/nbsphinx/issues/801

In many cases this won't be noticed because reading is fast, but in my case (executing Jupyter notebooks with nbsphinx in Sphinx's "reading" phase) it often is not fast.

Feature or Bugfix

  • Bugfix

mgeier, Aug 17 '24 09:08

Please could you also add a test of some description and a CHANGES entry?

AA-Turner, Aug 17 '24 10:08

> Please could you also add a test of some description and a CHANGES entry?

I've added CHANGES in 27248586c452ecf36c3f3fef2ad4297a661b699c, but I don't know how to add tests for that.

Can you point me to existing tests or (even better!) add those tests yourself?

mgeier, Aug 24 '24 15:08

> but I don't know how to add tests for that.

We are currently testing it in:

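import pytest


# Build the 'root' test project with the HTML builder, requesting 2 parallel jobs.
@pytest.mark.sphinx(
    'html',
    testroot='root',
    parallel=2,
)
def test_html_parallel(app):
    # Only runs the build; nothing is asserted about which read path was taken.
    app.build()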

That test is not really helpful though, since it only checks that the build completes. To test this change, you can mock _read_parallel and _read_serial and check which of them was called (sorry, my previous comment was about write_parallel).
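As a rough sketch of the mocking approach, the example below patches both read methods and records which one was called. `FakeBuilder` and `check_dispatch` are hypothetical stand-ins so the snippet runs without Sphinx; a real test would presumably patch the methods on the builder object of the test app instead, and the dispatch shown here models the behaviour this change aims for, not the current code.

```python
from unittest import mock


class FakeBuilder:
    """Hypothetical stand-in for a Sphinx builder; only the dispatch is modelled."""

    def __init__(self, parallel):
        self.parallel = parallel

    def _read_serial(self, docnames):
        pass

    def _read_parallel(self, docnames, nproc):
        pass

    def read(self, docnames):
        # Assumed post-change behaviour: no minimum number of documents.
        if self.parallel > 1:
            self._read_parallel(docnames, nproc=self.parallel)
        else:
            self._read_serial(docnames)


def check_dispatch(parallel, docnames):
    """Return (serial_called, parallel_called) for one read() call."""
    builder = FakeBuilder(parallel)
    with mock.patch.object(builder, '_read_serial') as serial, \
            mock.patch.object(builder, '_read_parallel') as par:
        builder.read(docnames)
    return serial.called, par.called


# A single document with two jobs requested should still be read in parallel.
print(check_dispatch(2, ['index']))
# With one job, reading stays serial.
print(check_dispatch(1, ['index']))
```

The same pattern, pointed at a test project with fewer than 6 documents, would fail before the change and pass after it.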

picnixz, Aug 25 '24 15:08

Thanks @picnixz for the hint!

However, this exceeds my Sphinx abilities and the amount of time I'm willing to invest.

mgeier, Aug 27 '24 19:08

I can try to find some time for that!

picnixz, Aug 28 '24 08:08

I'm supportive of this suggestion, because I think it would help to discover parallelism-related nondeterminism issues.

To elaborate on that: we often request minimal repro cases when attempting to debug issues, because they help to narrow the scope of the relevant code. However, if the code behaves differently depending on input-related thresholds, it becomes more difficult to determine what a minimal repro case is (both for the bug reporter and for code reviewers / maintainers).

In the short term, a change such as this could appear to make Sphinx less reliable, but I think it would be a forcing function to help us improve build reliability (a property that should not depend on the size of the documentation set).

jayaddison, Sep 10 '24 22:09