runtime: Optimization Request: Improve TTFB by Streaming Rendered HTML
Background: The changes introduced in Issue #56 have an unintentional side effect related to a rendered template's TTFB (Time To First Byte).
Problem: With the latest update, the generated Go code first writes the rendered HTML to a buffer and only then writes to the final Writer. This means the entire template must be rendered before any data is sent to the user. As a result, the TTFB becomes the time taken to render the complete page server-side, rather than the time to generate the first portion of HTML. This delayed TTFB could negatively impact SEO scores.
Example:
templ hello(name string) {
<div>Hello, { name }</div>
}
For the above template, the first piece of HTML (i.e., <div>) could theoretically be sent immediately, allowing browsers to start rendering sooner. But with the current approach, nothing is sent until the entire template, including </div>, has been rendered and buffered.
Request: While the latest changes have been excellent from an ergonomic standpoint, it would be beneficial to either change the current behavior or provide an option to optimize the TTFB by allowing for streaming of the rendered HTML.
An example of another templating library that supports the behavior requested above is quicktemplate.
Hello there!
Originally, templ wrote straight to the io.Writer that was passed in. However, during benchmarking, we found that this caused a number of allocations. These allocations increase the amount of garbage collection that has to be done at some point in the future, reducing the overall throughput of the server at some indeterminate point.
In addition, IIRC, writing to the output stream repeatedly (as templ sends strings and other data to the network) resulted in a relatively high number of syscalls. Switching context from user space to kernel space during a syscall slows down the process, and profiling showed that a large amount of time was spent waiting on these calls.
So, the current design is based on the idea of using a pool of buffered writers. My understanding is that this is the technique used by quicktemplate, but I could also be wrong on that!
With a buffered writer, templ can write very quickly to a buffer from the pool, then flush it to the network in fewer syscalls. It can then clear the buffer and return it to the pool, reducing GC thrashing.
Overall, we found that this approach rendered HTML much faster.
There are some basic benchmarks at https://github.com/a-h/templ/tree/main/benchmarks to test the performance, and the performance of templ is pretty good.
The benchmarks don't actually deal with network requests etc, so there's a risk that they're not realistic.
However, my thinking with templ was that the biggest impact on TTFB is likely the use of JavaScript as a Server-Side Language in the first place, as per the graph comparing render speed. More speed is good though! 😁
I'd be interested in seeing load test results of alternative designs with something like Grafana's k6, or hey. The TCP MTU setting is usually much higher on localhost, so ideally the test would avoid localhost, since that's also unrealistic.
The next actions to progress this are:
- Create a load test, using https://github.com/rakyll/hey or https://k6.io/, of:
- Not buffering the whole template before writing - i.e. flushing a buffer at a specific size, e.g. 4KB. Might be faster for big HTML templates, but also might result in more complex code or edge cases.
- Not buffering at all - probably terrible!
- Carrying on as it is today - probably fine! :)
Then we'd know for sure. If someone wants to pick that up, that would be great - shout up!
In terms of roadmap, the main issues people are complaining about are things like lack of JetBrains IDE support, and rough edges on CSS and script handling, so I want to focus on that in the next few releases (previously, formatting was the major issue, but I think that's solved in the next upcoming release).
Thanks for the reply. I understand the motivations.
With the quicktemplate README, there is mention of this:
Make sure that the io.Writer passed to Write* functions is buffered. This will minimize the number of write syscalls, which may be quite expensive.
This appears to be what bytes.Buffer is solving for you. A bufio.NewWriter may solve this while still writing to the original writer, which would solve TTFB. However, it does require one final call of writer.Flush() to flush any leftover bytes not yet written.
Using your benchmark, I created the following quicktemplate:
{% package testhtml %}
{% import "github.com/a-h/templ" %}
{% func RenderQT(p Person) %}
<div>
<h1>{%s p.Name %}</h1>
<div style="font-family: 'sans-serif'" id="test" data-contents={%s `something with "quotes" and a <tag>` %}>
<div>email:<a href={%s string(templ.URL("mailto: " + p.Email)) %}>{%s p.Email %}</a></div>
</div>
</div>
<hr {%= wa("noshade", true) %}/>
<hr optionA {%= wa("optionB", true) %} optionC="other" {%= wa("optionD", false) %}/>
<hr noshade/>
{% endfunc %}
{% stripspace %}
{% func wa(name string, show bool) %}
{% if show %}
{%s name %}
{% endif %}
{% endfunc %}
{% endstripspace %}
I added benchmark tests, one writing to a strings.Builder and another to a bufio.NewWriter wrapping a strings.Builder.
func BenchmarkQuickTemplateRender(b *testing.B) {
	b.ReportAllocs()
	person := Person{
		Name:  "Luiz Bonfa",
		Email: "[email protected]",
	}
	w := new(strings.Builder)
	for i := 0; i < b.N; i++ {
		WriteRenderQT(w, person)
		w.Reset()
	}
}

func BenchmarkQuickTemplateBufioRender(b *testing.B) {
	b.ReportAllocs()
	person := Person{
		Name:  "Luiz Bonfa",
		Email: "[email protected]",
	}
	builder := new(strings.Builder)
	w := bufio.NewWriter(builder)
	for i := 0; i < b.N; i++ {
		WriteRenderQT(w, person)
		w.Flush()
		builder.Reset()
		w.Reset(builder)
	}
}
The output:
BenchmarkTemplRender-8 3501919 336.6 ns/op 536 B/op 6 allocs/op
BenchmarkQuickTemplateRender-8 3023431 394.6 ns/op 856 B/op 6 allocs/op
BenchmarkQuickTemplateBufioRender-8 3953818 299.2 ns/op 344 B/op 2 allocs/op
The bufio wrapper has fewer allocations and is faster.
Changing the logic within the templ code to use a bufio.Writer pool might accomplish the same behavior described above and provide the TTFB behavior too.
Thanks for looking into this.
If you want to progress this line of thinking, you can directly modify the generated code to test out the concept.
For example, you can take the benchmark and adjust it to use a bufio.Writer:
func BufioWriterRender(p Person) templ.Component {
	return templ.ComponentFunc(func(templ_7745c5c3_Ctx context.Context, templ_7745c5c3_W io.Writer) (templ_7745c5c3_Err error) {
		templ_7745c5c3_Buffer, templ_7745c5c3_IsBuffer := templ_7745c5c3_W.(*bufio.Writer)
		if !templ_7745c5c3_IsBuffer {
			templ_7745c5c3_Buffer = bufio.NewWriter(templ_7745c5c3_W)
		}
		templ_7745c5c3_Ctx = templ.InitializeContext(templ_7745c5c3_Ctx)
		templ_7745c5c3_Var1 := templ.GetChildren(templ_7745c5c3_Ctx)
		if templ_7745c5c3_Var1 == nil {
			templ_7745c5c3_Var1 = templ.NopComponent
		}
		templ_7745c5c3_Ctx = templ.ClearChildren(templ_7745c5c3_Ctx)
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<div><h1>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var2 string = p.Name
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var2))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</h1><div style=\"font-family: 'sans-serif'\" id=\"test\" data-contents=\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(`something with "quotes" and a <tag>`))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("\"><div>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		templ_7745c5c3_Var3 := `email:`
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ_7745c5c3_Var3)
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<a href=\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var4 templ.SafeURL = templ.URL("mailto: " + p.Email)
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(string(templ_7745c5c3_Var4)))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("\">")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var5 string = p.Email
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var5))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</a></div></div></div><hr")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if true {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" noshade")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("><hr optionA")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if true {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionB")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionC=\"other\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if false {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionD")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("><hr noshade>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		return templ_7745c5c3_Err
	})
}
func BenchmarkTemplBufioWriterRender(b *testing.B) {
	b.ReportAllocs()
	t := BufioWriterRender(Person{
		Name:  "Luiz Bonfa",
		Email: "[email protected]",
	})
	w := new(strings.Builder)
	for i := 0; i < b.N; i++ {
		err := t.Render(context.Background(), w)
		if err != nil {
			b.Errorf("failed to render: %v", err)
		}
		w.Reset()
	}
}
This turns out to be slower than what we have at the moment, probably because it causes an additional allocation. I'm not sure if it could use a pool of buffered writers instead.
PASS
ok github.com/a-h/templ 0.236s
goos: darwin
goarch: arm64
pkg: github.com/a-h/templ/benchmarks/templ
BenchmarkTemplRender-10 3312931 348.2 ns/op 536 B/op 6 allocs/op
BenchmarkTemplBufioWriterRender-10 1819821 728.2 ns/op 4376 B/op 7 allocs/op
BenchmarkTemplParser-10 25100 48599 ns/op 27276 B/op 748 allocs/op
BenchmarkGoTemplateRender-10 485796 2332 ns/op 1400 B/op 38 allocs/op
BenchmarkIOWriteString-10 22776142 53.33 ns/op 320 B/op 1 allocs/op
To put this into perspective, we're talking about shaving up to 10,000 ns off the time to first byte. But, if my maths is right, at 3Mbps it takes around 10,000 ns to transfer a single character - i.e. it would probably be a greater performance improvement to improve whitespace stripping to remove a single additional character than to focus on this.
An additional point of note is that the HTTP response writer is already buffered (4KB by default), so if the response is less than 4KB in size, it will wait until it's all rendered anyway.
That's why I think an HTTP performance benchmark is probably more useful overall than these low-level ones.
So, I couldn't help but look into this and created a little benchmark of actual web performance at https://github.com/a-h/templ/tree/http_benchmark
I updated the test benchmark template and added 1000 extra lines of stuff.
package main

import (
	"net/http"

	"github.com/a-h/templ"
)

type Person struct {
	Name  string
	Email string
}

func main() {
	t := Render(Person{
		Name:  "Luiz Bonfa",
		Email: "[email protected]",
	})
	http.ListenAndServe("localhost:8080", templ.Handler(t))
}
+ hey -n 1000000 http://localhost:8080
Summary:
Total: 8.8955 secs
Slowest: 0.0094 secs
Fastest: 0.0001 secs
Average: 0.0004 secs
Requests/sec: 112416.6618
Response time histogram:
0.000 [1] |
0.001 [895572] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.002 [89241] |■■■■
0.003 [13366] |■
0.004 [1440] |
0.005 [240] |
0.006 [68] |
0.007 [53] |
0.008 [17] |
0.008 [1] |
0.009 [1] |
Latency distribution:
10% in 0.0001 secs
25% in 0.0002 secs
50% in 0.0003 secs
75% in 0.0005 secs
90% in 0.0010 secs
95% in 0.0014 secs
99% in 0.0021 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0001 secs, 0.0094 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0013 secs
req write: 0.0000 secs, 0.0000 secs, 0.0056 secs
resp wait: 0.0004 secs, 0.0000 secs, 0.0094 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0057 secs
Status code distribution:
[200] 1000000 responses
The resp wait (the bit we care about in this issue) was:
(average, min, max)
resp wait: 0.0004 secs, 0.0000 secs, 0.0094 secs
So... I hacked the generated code to try out the concept.
First, I added some extra functions to templ:
var bufferedWriterPool = sync.Pool{
	New: func() any {
		return new(bufio.Writer)
	},
}

func GetBufferedWriter(w io.Writer) *bufio.Writer {
	bw := bufferedWriterPool.Get().(*bufio.Writer)
	bw.Reset(w)
	return bw
}

func ReleaseBufferedWriter(b *bufio.Writer) {
	b.Reset(nil)
	bufferedWriterPool.Put(b)
}
Then I updated the generated code to use it:
// Code generated by [email protected] DO NOT EDIT.
package main
//lint:file-ignore SA4006 This context is only used if a nested component is present.
import "github.com/a-h/templ"
import "context"
import "io"
import "bufio"
func Render(p Person) templ.Component {
	return templ.ComponentFunc(func(templ_7745c5c3_Ctx context.Context, templ_7745c5c3_W io.Writer) (templ_7745c5c3_Err error) {
		templ_7745c5c3_Buffer, templ_7745c5c3_IsBuffer := templ_7745c5c3_W.(*bufio.Writer)
		if !templ_7745c5c3_IsBuffer {
			templ_7745c5c3_Buffer = templ.GetBufferedWriter(templ_7745c5c3_W)
			defer templ.ReleaseBufferedWriter(templ_7745c5c3_Buffer)
		}
		templ_7745c5c3_Ctx = templ.InitializeContext(templ_7745c5c3_Ctx)
		templ_7745c5c3_Var1 := templ.GetChildren(templ_7745c5c3_Ctx)
		if templ_7745c5c3_Var1 == nil {
			templ_7745c5c3_Var1 = templ.NopComponent
		}
		templ_7745c5c3_Ctx = templ.ClearChildren(templ_7745c5c3_Ctx)
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<html><head><title>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		templ_7745c5c3_Var2 := `Test page`
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ_7745c5c3_Var2)
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</title></head><body><div><h1>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var3 string = p.Name
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var3))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</h1><div style=\"font-family: 'sans-serif'\" id=\"test\" data-contents=\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(`something with "quotes" and a <tag>`))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("\"><div>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		templ_7745c5c3_Var4 := `email:`
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ_7745c5c3_Var4)
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<a href=\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var5 templ.SafeURL = templ.URL("mailto: " + p.Email)
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(string(templ_7745c5c3_Var5)))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("\">")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var6 string = p.Email
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var6))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</a></div></div></div><hr")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if true {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" noshade")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("><hr optionA")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if true {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionB")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionC=\"other\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if false {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionD")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("><hr noshade>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		for i := 0; i < 1000; i++ {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<p>")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
			templ_7745c5c3_Var7 := `Adding some fake content.`
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ_7745c5c3_Var7)
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</p>")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</body></html>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if !templ_7745c5c3_IsBuffer {
			templ_7745c5c3_Err = templ_7745c5c3_Buffer.Flush()
		}
		return templ_7745c5c3_Err
	})
}
And... it was slower:
+ hey -n 1000000 http://localhost:8080
Summary:
Total: 12.3900 secs
Slowest: 0.0134 secs
Fastest: 0.0001 secs
Average: 0.0006 secs
Requests/sec: 80709.9770
Response time histogram:
0.000 [1] |
0.001 [924308] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.003 [54099] |■■
0.004 [15660] |■
0.005 [3349] |
0.007 [1761] |
0.008 [662] |
0.009 [128] |
0.011 [20] |
0.012 [7] |
0.013 [5] |
Latency distribution:
10% in 0.0002 secs
25% in 0.0003 secs
50% in 0.0004 secs
75% in 0.0007 secs
90% in 0.0012 secs
95% in 0.0018 secs
99% in 0.0035 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0001 secs, 0.0134 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0021 secs
req write: 0.0000 secs, 0.0000 secs, 0.0070 secs
resp wait: 0.0005 secs, 0.0000 secs, 0.0116 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0132 secs
Status code distribution:
[200] 1000000 responses
By adding the loop to add 1000 elements, it turns it into a 32KB file, so I think it's big enough to benefit from buffering.
So, from these figures, it looks like in real-world usage, using a bufio.Writer instead of a bytes.Buffer drops performance from 112,416 requests per second to 80,709 requests per second.
For completeness, I updated the test to render 10,000 copies of the <p> tag instead of 1,000. This makes the page much bigger, so you might expect time to first byte to be much higher if you copy it all to a bytes.Buffer first.
Here are the figures for bufio.Writer:
+ hey -n 1000000 http://localhost:8080
Summary:
Total: 104.0192 secs
Slowest: 0.0970 secs
Fastest: 0.0005 secs
Average: 0.0052 secs
Requests/sec: 9613.6124
Response time histogram:
0.000 [1] |
0.010 [898979] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.020 [80945] |■■■■
0.029 [13945] |■
0.039 [3827] |
0.049 [1340] |
0.058 [619] |
0.068 [266] |
0.078 [75] |
0.087 [2] |
0.097 [1] |
Latency distribution:
10% in 0.0014 secs
25% in 0.0023 secs
50% in 0.0039 secs
75% in 0.0062 secs
90% in 0.0102 secs
95% in 0.0139 secs
99% in 0.0252 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0005 secs, 0.0970 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0014 secs
req write: 0.0000 secs, 0.0000 secs, 0.0144 secs
resp wait: 0.0034 secs, 0.0000 secs, 0.0759 secs
resp read: 0.0018 secs, 0.0001 secs, 0.0751 secs
Status code distribution:
[200] 1000000 responses
And here's the current implementation that uses bytes.Buffer:
+ hey -n 1000000 http://localhost:8080
Summary:
Total: 32.0821 secs
Slowest: 0.0377 secs
Fastest: 0.0002 secs
Average: 0.0016 secs
Requests/sec: 31170.0649
Response time histogram:
0.000 [1] |
0.004 [947312] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.008 [48441] |■■
0.011 [3670] |
0.015 [478] |
0.019 [76] |
0.023 [15] |
0.026 [5] |
0.030 [1] |
0.034 [0] |
0.038 [1] |
Latency distribution:
10% in 0.0004 secs
25% in 0.0007 secs
50% in 0.0012 secs
75% in 0.0021 secs
90% in 0.0032 secs
95% in 0.0040 secs
99% in 0.0063 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0002 secs, 0.0377 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0011 secs
req write: 0.0000 secs, 0.0000 secs, 0.0066 secs
resp wait: 0.0014 secs, 0.0001 secs, 0.0376 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0217 secs
Status code distribution:
[200] 1000000 responses
Here it's even more of a win for the bytes.Buffer. The resp wait time is still shorter at 0.0014 vs 0.0034 secs, and the rps is 31,170 vs 9,613.
If there's an alternative to bufio.Writer that is faster... then great.
Any ideas, @jtarchie?
I wonder if giving the buffers an initial size that's a bit bigger than default would affect the performance much?
https://pkg.go.dev/bufio#NewWriterSize
Yes. On my machine:
new(bytes.Buffer) -> 140k rps
new(bufio.Writer) (which defaults to a 4KB buffer) -> 130k rps
bufio.NewWriterSize(nil, 32*1024) -> 146k rps
bytes.NewBuffer(make([]byte, 0, 32*1024)) -> 160k rps
There are still downsides to using a buffer pool - say you have many templates with 100KB of data and a single one with 200MB. All of your pooled buffers will eventually grow to 200MB, unless you collect statistics the way bytebufferpool does: https://github.com/valyala/bytebufferpool/blob/master/pool.go
~~Also, the w.(*bytes.Buffer) assertion is not necessary, since NewWriterSize already does it:
https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/bufio/bufio.go;l=592
And it should work fine even if there is a second buffer with a smaller size, since it will be bypassed:
https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/bufio/bufio.go;l=682~~
actually Reset(w) doesn't
Have marked as needs decision, as it seems we have all the information now to decide if we want to implement anything for this.
Based on the data, I think it makes sense to use a sync buffered write pool for this. Looks like the buffered writing might result in 146k rps vs 140k rps, or around 4% improvement according to @kaey's benchmarks.
However, I'm concerned about the potential for RAM growth over time as outlined by @kaey. Having this happen to you violates the principle of least surprise.
For runtime stuff, I want to stick to the standard library as much as possible, since it benefits from the security focus of the Go project as a whole, so although the bytebufferpool sounds like a good library to use, I'd prefer to stick to the standard library for runtime if possible.
However, I don't plan to work on this in the short-ish future, because CSS media queries, improvements to LSP testing, and providing more documentation / examples would be higher up the list in priority for me at the moment.
See comment on https://github.com/a-h/templ/discussions/781#discussioncomment-9680731 - I think this could be a way forward.
@joerdav is working on something that affects the generator (automatic imports), so I don't want to implement this until he's finished, but I think it's relatively clear how to proceed.