Performance improvement: cache unpickled documents

Open ArthurAttout opened this issue 1 year ago • 1 comments

We have very large documents in which some individual files use .. include:: to pull in hundreds of external .rst files.

This sometimes leads to individual .doctree files exceeding 5MB. In that scenario, the build is particularly slow (more than 5 hours).

Profiling the code revealed repeated calls to pickle.loads() on those 5MB files. It appears that Sphinx calls pickle.loads() on the 5MB file for each cross-reference.

While sphinx/environment/__init__.py already caches the raw bytes of each pickled doctree, it would be more efficient to cache the result of pickle.loads() instead.

Caching the unpickled nodes.document instead of the raw bytes sped up the build from more than 5 hours to around 10 minutes (including conversion to PDF with MiKTeX).
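
To illustrate the difference between the two caching strategies, here is a minimal sketch. It is not the actual Sphinx code: the class names BytesCache and ObjectCache, the blobs dict standing in for files on disk, and the loads_calls counter are all hypothetical and exist only for this example.

```python
import pickle


class BytesCache:
    """Hypothetical cache of raw pickled bytes: every lookup still unpickles."""

    def __init__(self, blobs):
        self._blobs = blobs      # stand-in for pickled files on disk
        self._raw = {}           # name -> raw bytes
        self.loads_calls = 0     # counts pickle.loads() invocations

    def get(self, name):
        if name not in self._raw:
            self._raw[name] = self._blobs[name]  # stand-in for reading the file
        self.loads_calls += 1
        return pickle.loads(self._raw[name])     # paid on every call


class ObjectCache:
    """Hypothetical cache of the unpickled object: unpickles once per name."""

    def __init__(self, blobs):
        self._blobs = blobs
        self._objects = {}       # name -> unpickled object
        self.loads_calls = 0

    def get(self, name):
        if name not in self._objects:
            self.loads_calls += 1
            self._objects[name] = pickle.loads(self._blobs[name])
        return self._objects[name]               # same object on every call


# A small list stands in for a large doctree.
blobs = {"big": pickle.dumps(list(range(1000)))}
by_bytes = BytesCache(blobs)
by_object = ObjectCache(blobs)

# Simulate 50 lookups of the same document, e.g. one per cross-reference.
for _ in range(50):
    by_bytes.get("big")
    by_object.get("big")

print(by_bytes.loads_calls, by_object.loads_calls)
```

One trade-off of the second approach: the cached object is shared between callers, so any code that mutates the returned document would need to work on a copy.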

I have not compared the overhead of the two caching methods, but I suspect the speedup is worth it.

I have opened a pull request with my workaround. Feel free to let me know your thoughts!

Sep 11 '24 13:09 ArthurAttout

See https://github.com/sphinx-doc/sphinx/pull/12882

Sep 11 '24 13:09 ArthurAttout