mkdocs-git-revision-date-localized-plugin
Poor performance on large monorepos
When I use this plugin on a large monorepo, performance is very poor: one docs site with ~30 pages has gone from taking a few seconds to build to a few minutes.
I'm testing a change that would parallelize the calls to git log up front, once we have the list of files (on_files), since mkdocs itself can't run on_page_markdown in a multi-threaded fashion.
Curious if anyone else has run into this and if this is something that would be a useful contribution via PR.
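For illustration, here is a minimal sketch of the idea, not the actual patch. It assumes the timestamp for a file can be obtained with one `git log -1` call per file; the function names, the `fetch` parameter and the default of 10 workers are hypothetical choices made for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def git_last_commit_timestamp(path, repo_dir="."):
    """Return the Unix timestamp of the last commit touching `path`, or None.

    Assumption: one `git log -1` subprocess call per file.
    """
    result = subprocess.run(
        ["git", "log", "-1", "--format=%at", "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=False,
    )
    out = result.stdout.strip()
    return int(out) if out else None


def precompute_timestamps(paths, fetch=git_last_commit_timestamp, max_workers=10):
    """Run `fetch` for every path in a thread pool and return {path: timestamp}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        return dict(zip(paths, pool.map(fetch, paths)))
```

In a plugin, something like `precompute_timestamps` would presumably be called once from on_files with the source paths of the documentation pages, and on_page_markdown would then only do a dictionary lookup. Threads are enough here because the time is spent waiting on the git subprocesses, not in Python code.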
@squidfunk I know you did this in several of your plugins. Any wisdom to share?
In Material for MkDocs, the privacy plugin, optimize plugin and new social plugin make heavy use of concurrent futures and caching of (partial) results. It's a new technique I learned when first writing the optimize plugin. The general idea is to split off work into threads where possible, and only reconcile jobs when necessary. Examples:
- The new social plugin entirely offloads image creation in on_page_markdown into threads: it generates all layers in parallel after deduplicating them, and reconciles them to composite the final image. It then reconciles the composited images in on_post_page, copying the generated image returned from the future, in order to ensure a consistent state for other plugins that run after on_post_page.
- The privacy plugin searches for external assets in on_page_content and enqueues them for downloading, moving that into concurrent threads as well, since some assets need more assets to be downloaded (e.g. Google Fonts CSS contains links to web font files that need to be downloaded as well). Then, in on_post_page and on_post_template, external assets are replaced, and any further external assets discovered at that point (added by other plugins) are downloaded synchronously. However, the plugin does as much work as possible asynchronously.
- The privacy plugin and optimize plugin can actually work together (!), downloading external assets and pushing them through the optimization pipeline, by reconciling downloaded assets in on_env (in the privacy plugin), which can then be picked up by the optimize plugin. This makes it possible to build documentation with external assets (e.g. screenshots), hosting them outside of a repository, but inlining heavily optimized versions of them into the build.
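The "split off work into threads, reconcile only when necessary" pattern from the examples above can be sketched roughly as follows. This is not code from Material for MkDocs; `DeferredJobs`, `render_card` and the page names are hypothetical stand-ins, and the comments only indicate which hook each step might correspond to.

```python
from concurrent.futures import ThreadPoolExecutor


class DeferredJobs:
    """Submit work early, collect results only when they are needed."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def enqueue(self, key, func, *args):
        # Deduplicate: the same key is only ever submitted once.
        if key not in self._futures:
            self._futures[key] = self._pool.submit(func, *args)

    def reconcile(self, key):
        # Blocks until the job is done; re-raises any exception from the worker.
        return self._futures[key].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


def render_card(title):
    # Stand-in for expensive work such as image generation or a download.
    return f"card:{title}"


jobs = DeferredJobs()

# Early hook (e.g. on_page_markdown): start the work, do not wait for it.
for page in ["index", "setup", "index"]:
    jobs.enqueue(page, render_card, page)

# Later hook (e.g. on_post_page): wait only for the result needed right now.
results = {page: jobs.reconcile(page) for page in ["index", "setup"]}
jobs.shutdown()
```

Calling `result()` on the future is the reconciliation point: everything before it can overlap with the rest of the build, and everything after it sees a finished, consistent result.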
I plan to write a blog post about my learnings in writing MkDocs plugins in the future.
I have a prototype patch for the timestamp plugin that does something similar. It uses on_files to compute timestamps for all files up front; I found that 10 is about the maximum parallelism that works reliably.
As a test case, a site with 78 markdown files in a large monorepo takes 5 seconds to generate with the Backstage TechDocs CLI. With the timestamp plugin enabled, this increased to 378 seconds. With the parallel precomputed timestamp patch, it goes down to 69 seconds: still quite a big hit, but a big improvement.
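To put those three numbers in relation, here is a quick calculation using only the timings quoted above (the variable names are just for illustration):

```python
baseline_s = 5    # build without the plugin
serial_s = 378    # plugin enabled, unpatched
parallel_s = 69   # plugin enabled, with the parallel precompute patch

speedup = serial_s / parallel_s              # patched vs. unpatched plugin
overhead_serial = serial_s - baseline_s      # seconds added by the plugin, unpatched
overhead_parallel = parallel_s - baseline_s  # seconds added by the plugin, patched
slowdown_vs_baseline = parallel_s / baseline_s

print(f"speedup from patch: {speedup:.1f}x")
print(f"plugin overhead: {overhead_serial}s -> {overhead_parallel}s")
print(f"patched build vs. no plugin: {slowdown_vs_baseline:.1f}x slower")
```

So the patch is roughly a 5.5x speedup and cuts the plugin's overhead from 373 to 64 seconds, but the patched build is still about 13.8x slower than building without the plugin.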