mkdocs-git-revision-date-localized-plugin
Precompute all last commit timestamps in on_files
Precompute all last commit timestamps in parallel in on_files so that the build runs faster. This has to happen in on_files rather than on_page_markdown, because on_files is the point where the full list of files is available and the work can be spread across workers.
This does not precompute the first commit timestamp. It can significantly improve wall time. Ref: #115
Looking for some feedback on this approach. If it looks reasonable, we can figure out support for the first commit timestamp as well as a way to configure parallelism. The worker count is currently the minimum of 10 and the number of CPUs reported.
On an M1 Max MacBook Pro (8 performance, 2 efficiency cores), this gave a speedup of ~5.5x on a large monorepo: the build went from 378 seconds to 69 seconds. Tested on 78 markdown files rendered in a repo of approximately 700k commits and 500k files.
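For readers who want the shape of the idea without opening the diff, here is a minimal sketch of precomputing last commit timestamps in parallel. It is not the code from this PR: the function names `last_commit_timestamp` and `precompute_timestamps` are hypothetical, and the use of a thread pool around `git log` subprocess calls is an assumption made for illustration. Only the worker cap of `min(10, cpu count)` comes from the description above.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def last_commit_timestamp(path: str) -> Optional[int]:
    """Return the Unix timestamp of the last commit touching `path`, or None.

    Hypothetical helper: shells out to `git log -1 --format=%at -- <path>`.
    """
    result = subprocess.run(
        ["git", "log", "-1", "--format=%at", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    out = result.stdout.strip()
    return int(out) if result.returncode == 0 and out else None


def precompute_timestamps(
    paths: Iterable[str],
    lookup: Callable[[str], Optional[int]] = last_commit_timestamp,
    max_workers: Optional[int] = None,
) -> Dict[str, Optional[int]]:
    """Run `lookup` for every path concurrently and return a path -> timestamp map."""
    paths = list(paths)
    if max_workers is None:
        # Same cap as described in the PR text: min of 10 and the reported CPU count.
        max_workers = min(10, os.cpu_count() or 1)
    # Threads are an assumption here; each worker mostly waits on a git subprocess.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them up correctly.
        return dict(zip(paths, pool.map(lookup, paths)))
```

In a plugin, something like `precompute_timestamps` would be called once from on_files with the source paths of all documentation pages, and the resulting dictionary would be stored on the plugin instance for later lookup.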
Sorry for the very late reply; this project has not been a priority.
Very cool PR, 5.5x improvement is considerable!
One problem I see, however, is using the files collection at on_files() instead of the page at on_page_markdown(). The reason is that some other plugins move files around. Here's an example: mkdocs-monorepo-plugin.
They basically create a new docs_dir from several source folders:
https://github.com/backstage/mkdocs-monorepo-plugin/blob/c778b3010eb986a2f3b719bc7a3d29d86236c238/mkdocs_monorepo_plugin/plugin.py#L54-L61
And then they update the page.abs_src_url :
https://github.com/backstage/mkdocs-monorepo-plugin/blob/c778b3010eb986a2f3b719bc7a3d29d86236c238/mkdocs_monorepo_plugin/plugin.py#L65-L72
So this bit from the PR will need some more edge case handling:
https://github.com/timvink/mkdocs-git-revision-date-localized-plugin/pull/116/files#diff-38d392fd1ac6a39ad46a5d047e294c69fe0f1b6aa8fc7fea3a35c1846925d21cR166-R172
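One possible way to handle this edge case, offered only as a sketch and not as what the PR does, is to treat the precomputed map as a cache: look the page's source path up at on_page_markdown time and fall back to an on-demand lookup when another plugin has moved the file. The name `timestamp_for_page` and the `lookup` callable are hypothetical.

```python
from typing import Callable, Dict, Optional


def timestamp_for_page(
    abs_src_path: str,
    cache: Dict[str, Optional[int]],
    lookup: Callable[[str], Optional[int]],
) -> Optional[int]:
    """Return the cached timestamp for a page, computing it on a cache miss.

    `cache` is assumed to be the map precomputed in on_files, and `lookup`
    the same single-file git query that was used to fill it.
    """
    if abs_src_path in cache:
        # Fast path: the path seen in on_files is still the page's path.
        return cache[abs_src_path]
    # Slow path: another plugin changed the path, so query git for this one file.
    timestamp = lookup(abs_src_path)
    cache[abs_src_path] = timestamp
    return timestamp
```

With this shape, pages whose paths are unchanged keep the parallel speedup, and moved pages cost the same as they did before the precompute was introduced.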
Another promising avenue might be to tweak git a bit. There are a couple of settings for large repos that might make git blame operations much faster:
https://www.git-tower.com/blog/git-performance/
Have you tried something like that? It might be worth documenting in this plugin.
Yeah, we're well aware of all those git features to make monorepos less of a pain, but it is still incredibly slow. To be fair, when updating docs for a single project or two the time hit is probably still acceptable as the application CI is going to take longer in most cases -- but if doing a bulk update across many docs in the repo it's going to time out CI. (Not to mention the $ cost of longer running CI in general).