Performance improvement: cache unpickled documents
We have enormous documents in which some individual files pull in hundreds of external .rst files through the ".. include::" directive.
This sometimes leads to individual .doctree files exceeding 5 MB. In that scenario, the build is particularly slow (more than 5 hours).
Profiling the code revealed repeated calls to pickle.loads() on those 5 MB files. It appears that Sphinx calls pickle.loads() on the 5 MB file for each cross-reference.
While sphinx/environment/__init__.py already caches the raw bytes of each pickled doctree, it would be more efficient to cache the result of pickle.loads() instead.
Caching the unpickled nodes.document instead of the raw bytes sped up the build from more than 5 hours to around 10 minutes (including conversion to PDF with MiKTeX).
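
To illustrate the difference between the two caching strategies, here is a minimal sketch. It is not the Sphinx code and not the code from the pull request: the classes BytesCache and ObjectCache, the helper fake_read_bytes, and the figure of 100 lookups are all hypothetical, and a plain dict stands in for a real doctree.

```python
import pickle


class BytesCache:
    """Hypothetical cache of raw pickled bytes: every lookup still unpickles."""

    def __init__(self):
        self._raw = {}
        self.loads_calls = 0

    def get(self, name, read_bytes):
        if name not in self._raw:
            self._raw[name] = read_bytes(name)
        # The expensive step runs on every lookup, even on a cache hit.
        self.loads_calls += 1
        return pickle.loads(self._raw[name])


class ObjectCache:
    """Hypothetical cache of the unpickled object: unpickle once, then reuse."""

    def __init__(self):
        self._objects = {}
        self.loads_calls = 0

    def get(self, name, read_bytes):
        if name not in self._objects:
            self.loads_calls += 1
            self._objects[name] = pickle.loads(read_bytes(name))
        # Note: every caller receives the same object, not a fresh copy.
        return self._objects[name]


def fake_read_bytes(name):
    # Stand-in for reading a large .doctree file from disk (assumption).
    return pickle.dumps({"docname": name, "children": list(range(1000))})


bytes_cache = BytesCache()
object_cache = ObjectCache()

# Simulate 100 cross-references resolving into the same large document.
for _ in range(100):
    a = bytes_cache.get("big_doc", fake_read_bytes)
    b = object_cache.get("big_doc", fake_read_bytes)

print(bytes_cache.loads_calls)   # 100
print(object_cache.loads_calls)  # 1
```

One trade-off to keep in mind with the second approach: because the cached object is shared, any caller that mutates it affects later lookups, so a copy may be needed where the bytes cache previously handed out a fresh object each time.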
I have not compared the overhead of the two caching methods, but I suspect the speedup is worth it.
I have opened a pull request with my workaround. Feel free to let me know your thoughts!
See https://github.com/sphinx-doc/sphinx/pull/12882