Performance Improvements in Template Execution
Summary
As part of a sustainability initiative, I found that Hugo builds for complex themes (e.g. Docsy in my case) can take longer and consume more CPU cycles than expected.
Expectation
Hugo should need only single-digit milliseconds to render an individual page.
Observed Condition
Hugo needs (133685ms / 1707) = ~78ms per page:
...
INFO build: running step render duration 2m13.152050519s
INFO build: running step postProcess duration 8.827µs
                   | EN
-------------------+-------
  Pages            | 1707
  Paginator pages  | 5
  Non-page files   | 485
  Static files     | 40
  Processed images | 2
  Aliases          | 39
  Sitemaps         | 1
  Cleaned          | 0
Total in 133685 ms
Analysis
Profiling, tracing and experimentation led to the following results:
- Hugo scales very well when one throws more cores at it. Tracing indicated that around 18 cores would be the optimum for our site (the tested GitHub runner has 2).
- There does not seem to be a single slow function or low-hanging fruit. The slowest syscall used (fstatat, which could perhaps be optimized or avoided) consumes just ~1s of our total ~120s runtime. This is less than 1%. (!)
- Use of reflection in template execution, mostly for evaluating expressions and especially for calling functions, slows Hugo down when using complex themes.
- Goroutine traced execution time, top 5 (figure not included)
- pprof profiles (figures not included)
Proposed Action
- Continue or build on top of https://github.com/gohugoio/hugo/issues/6594.
- Make tradeoffs towards Hugo performance🚀 and efficiency🌱💰 over minimal code (maintainability) and alignment with Go text/template development.
While Go templates aren't the fastest on earth, I don't think your conclusion above is correct.
In all the examples of "slow Hugo sites" I have seen (and I have seen many), it always boils down to doing too much work.
Complex themes (e.g. docsy), combined with a decent amount of pages, do indeed highlight the limits of Hugo's current architecture, in particular around template execution.
Yes:
- a simpler theme, with only a small number of partials, expressions and especially function calls, will be faster.
- --renderSegments might allow us to save time that way. (Assuming we'd be willing to store/cache build output in some way between builds, which isn't great for reproducibility.)
- one always reaches certain limits at some point.
However:
- forcing Hugo users to apply workarounds should be pushed back as much as possible.
- the avoidable waste of CPU time is a sin, even with workarounds applied.
- the comparative "slowness" of the template execution could paint the larger architecture (e.g. use of goroutines) in an unnecessarily bad light, marketing-wise.
I'd not request such optimizations from projects in an incubating or prototype phase, but mature products with many users, like Hugo, could indeed profit.
This seems very desirable.
Hugo needs (133685ms / 1707) = ~78ms per page:
My claim is still that the above comes from one or more loops with quadratic complexity. The best fix for that will never be to spend tons of resources creating a new and slightly faster template engine. The fix lives in improving the template logic (adding some cache?).
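A minimal sketch of how such a quadratic pattern can arise and how a cache removes it, assuming a hypothetical buildNav helper that stands in for an expensive partial walking every page (none of these names come from Hugo or Docsy). In actual Hugo templates, partialCached is the built-in way to get this kind of reuse.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildNav is a stand-in for an expensive partial: it visits every page
// title to build a site-wide navigation string. work counts titles visited.
func buildNav(titles []string, work *int) string {
	sorted := append([]string(nil), titles...)
	sort.Strings(sorted)
	*work += len(titles)
	return strings.Join(sorted, " | ")
}

// renderAllNaive rebuilds the navigation for every page:
// n pages x n titles, i.e. quadratic in the number of pages.
func renderAllNaive(titles []string) (work int) {
	for range titles {
		_ = buildNav(titles, &work)
	}
	return work
}

// renderAllCached builds the navigation once and reuses it for every page,
// so the total work stays linear in the number of pages.
func renderAllCached(titles []string) (work int) {
	var nav string
	cached := false
	for range titles {
		if !cached {
			nav = buildNav(titles, &work)
			cached = true
		}
		_ = nav // each page would embed the cached navigation here
	}
	return work
}

func main() {
	titles := make([]string, 1707) // page count taken from the report above
	for i := range titles {
		titles[i] = fmt.Sprintf("page-%04d", i)
	}
	fmt.Println("naive work: ", renderAllNaive(titles))  // 1707 * 1707 = 2913849
	fmt.Println("cached work:", renderAllCached(titles)) // 1707
}
```

Caching only helps when the partial's output does not depend on the current page; anything page-specific has to be part of the cache key or be computed outside the cached part.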
I'm closing this. This proposal looks like a bug report, but is lacking a concrete problem case. As a proposal about "template execution" it is very vague. We have already done lots in this department.
As to the ~78ms per page example above: if you can tell with some certainty that this comes from Hugo not being performant enough, please open a more concrete bug issue.
As to profiling your site, I would recommend you look at https://gohugo.io/functions/debug/timer/ and insert that in the suspected hot paths, instead of the pprof output that seems to confuse you. You see the word partial and conclude that it "must be slow template execution", not considering that one single partial may be calculating all the primes ... or something.
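The idea behind such named timers can be sketched in plain Go. This is an analogue for illustration only, not Hugo's implementation of debug.Timer; the Timers type, its methods and the section name are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Timers accumulates wall-clock time and call counts per named section.
type Timers struct {
	mu    sync.Mutex
	total map[string]time.Duration
	calls map[string]int
}

// NewTimers returns an empty set of timers.
func NewTimers() *Timers {
	return &Timers{total: map[string]time.Duration{}, calls: map[string]int{}}
}

// Start begins timing the named section and returns a function that
// stops the timer and records the elapsed time.
func (t *Timers) Start(name string) (stop func()) {
	begin := time.Now()
	return func() {
		elapsed := time.Since(begin)
		t.mu.Lock()
		defer t.mu.Unlock()
		t.total[name] += elapsed
		t.calls[name]++
	}
}

// Calls reports how many times the named section was timed.
func (t *Timers) Calls(name string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.calls[name]
}

// Report prints the accumulated time and call count per section.
func (t *Timers) Report() {
	t.mu.Lock()
	defer t.mu.Unlock()
	for name, d := range t.total {
		fmt.Printf("%s: %v over %d calls\n", name, d, t.calls[name])
	}
}

func main() {
	timers := NewTimers()
	for page := 0; page < 3; page++ {
		stop := timers.Start("partials/sidebar") // hypothetical suspected hot path
		time.Sleep(2 * time.Millisecond)         // stand-in for the real work
		stop()
	}
	timers.Report()
}
```

Wrapping a handful of suspected sections this way attributes time to named parts of the template logic, which is easier to interpret than a flat pprof view where everything shows up under generic template execution frames.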