runtime: Optimization Request: Improve TTFB by Streaming Rendered HTML
Background: The changes introduced in Issue #56 have an unintentional side effect related to a rendered template's TTFB (Time To First Byte).
Problem: With the latest update, the generated Go code first writes the rendered HTML to a buffer and only then writes to the final Writer. This means the entire template must be rendered before any data is sent to the user. As a result, the TTFB becomes the time taken to render the complete page server-side, rather than the time to generate the first portion of HTML. This delayed TTFB could negatively impact SEO scores.
Example:
templ hello(name string) {
<div>Hello, { name }</div>
}
For the above template, the first piece of HTML (i.e., <div>) could theoretically be sent immediately, allowing browsers to start rendering sooner. But with the current approach, nothing is sent until the entire template, including </div>, has been rendered and buffered.
Request: While the latest changes have been excellent from an ergonomic standpoint, it would be beneficial to either change the current behavior or provide an option to optimize the TTFB by allowing for streaming of the rendered HTML.
An example of another templating library that supports the behavior requested above is quicktemplate.
Hello there!
Originally, templ wrote straight to the io.Writer that was passed in. However, during benchmarking, we found that this caused a number of allocations. These allocations increase the amount of garbage collection that has to be done at some point in the future, reducing the overall throughput of the server at some indeterminate point.
In addition, IIRC, writing to the output stream repeatedly (as templ sends strings and other data to the network) resulted in a relatively high number of syscalls. Switching context from user space to kernel space during a syscall slows down the process, and profiling showed that a large amount of time was spent waiting on these calls.
So, the current design is based on the idea of using a pool of buffered writers. My understanding is that this is the technique used by quicktemplate, but I could also be wrong on that!
With a buffered writer, templ can write very quickly to a buffer from the pool, then flush it to the network in fewer syscalls. It can then clear the buffer and return it to the pool, reducing GC thrashing.
Overall, we found that this approach rendered HTML much faster.
There are some basic benchmarks at https://github.com/a-h/templ/tree/main/benchmarks to test the performance, and the performance of templ is pretty good.
The benchmarks don't actually deal with network requests etc, so there's a risk that they're not realistic.
However, my thinking with templ was that the biggest impact on TTFB is likely the use of JavaScript as a Server-Side Language in the first place, as per the graph comparing render speed. More speed is good though! 😁
I'd be interested in seeing load test results of alternative designs with something like Grafana's k6, or hey. The TCP MTU setting is usually much higher on localhost, so ideally the test would avoid localhost, since that's also unrealistic.
The next actions to progress this are:
- Create a load test, using https://github.com/rakyll/hey or https://k6.io/, of:
- Not buffering the whole template before writing - i.e. flushing a buffer at a specific size, e.g. 4KB. Might be faster for big HTML templates, but also might result in more complex code or edge cases.
- Not buffering at all - probably terrible!
- Carrying on as it is today - probably fine! :)
Then we'd know for sure. If someone wants to pick that up, that would be great - shout up!
In terms of roadmap, the main issues people are complaining about are things like lack of JetBrains IDE support, and rough edges on CSS and script handling, so I want to focus on that in the next few releases (previously, formatting was the major issue, but I think that's solved in the next upcoming release).
Thanks for the reply. I understand the motivations.
With the quicktemplate README, there is mention of this:
Make sure that the io.Writer passed to Write* functions is buffered. This will minimize the number of write syscalls, which may be quite expensive.
This appears to be what bytes.Buffer is solving for you. A bufio.NewWriter may solve this while still writing to the original writer, which would solve TTFB. However, it does require one final call of writer.Flush() to flush any leftover bytes not yet written.
Using your benchmark, I created the following quicktemplate:
{% package testhtml %}
{% import "github.com/a-h/templ" %}
{% func RenderQT(p Person) %}
<div>
<h1>{%s p.Name %}</h1>
<div style="font-family: 'sans-serif'" id="test" data-contents={%s `something with "quotes" and a <tag>` %}>
<div>email:<a href={%s string(templ.URL("mailto: " + p.Email)) %}>{%s p.Email %}</a></div>
</div>
</div>
<hr {%= wa("noshade", true) %}/>
<hr optionA {%= wa("optionB", true) %} optionC="other" {%= wa("optionD", false) %}/>
<hr noshade/>
{% endfunc %}
{% stripspace %}
{% func wa(name string, show bool) %}
{% if show %}
{%s name %}
{% endif %}
{% endfunc %}
{% endstripspace %}
I added benchmark tests, one writing to a strings.Builder and another to a bufio.NewWriter wrapping a strings.Builder.
func BenchmarkQuickTemplateRender(b *testing.B) {
	b.ReportAllocs()
	person := Person{
		Name:  "Luiz Bonfa",
		Email: "[email protected]",
	}
	w := new(strings.Builder)
	for i := 0; i < b.N; i++ {
		WriteRenderQT(w, person)
		w.Reset()
	}
}

func BenchmarkQuickTemplateBufioRender(b *testing.B) {
	b.ReportAllocs()
	person := Person{
		Name:  "Luiz Bonfa",
		Email: "[email protected]",
	}
	builder := new(strings.Builder)
	w := bufio.NewWriter(builder)
	for i := 0; i < b.N; i++ {
		WriteRenderQT(w, person)
		w.Flush()
		builder.Reset()
		w.Reset(builder)
	}
}
The output:
BenchmarkTemplRender-8 3501919 336.6 ns/op 536 B/op 6 allocs/op
BenchmarkQuickTemplateRender-8 3023431 394.6 ns/op 856 B/op 6 allocs/op
BenchmarkQuickTemplateBufioRender-8 3953818 299.2 ns/op 344 B/op 2 allocs/op
The bufio wrapper has fewer allocations and is faster.
Changing the logic within the templ code to use a bufio.Writer pool might accomplish the same behavior described above and provide the TTFB behavior too.
Thanks for looking into this.
If you want to progress this line of thinking, you can directly modify the generated code to test out the concept.
For example, you can take the benchmark and adjust it to use a bufio.Writer:
func BufioWriterRender(p Person) templ.Component {
	return templ.ComponentFunc(func(templ_7745c5c3_Ctx context.Context, templ_7745c5c3_W io.Writer) (templ_7745c5c3_Err error) {
		templ_7745c5c3_Buffer, templ_7745c5c3_IsBuffer := templ_7745c5c3_W.(*bufio.Writer)
		if !templ_7745c5c3_IsBuffer {
			templ_7745c5c3_Buffer = bufio.NewWriter(templ_7745c5c3_W)
		}
		templ_7745c5c3_Ctx = templ.InitializeContext(templ_7745c5c3_Ctx)
		templ_7745c5c3_Var1 := templ.GetChildren(templ_7745c5c3_Ctx)
		if templ_7745c5c3_Var1 == nil {
			templ_7745c5c3_Var1 = templ.NopComponent
		}
		templ_7745c5c3_Ctx = templ.ClearChildren(templ_7745c5c3_Ctx)
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<div><h1>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var2 string = p.Name
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var2))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</h1><div style=\"font-family: 'sans-serif'\" id=\"test\" data-contents=\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(`something with "quotes" and a <tag>`))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("\"><div>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		templ_7745c5c3_Var3 := `email:`
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ_7745c5c3_Var3)
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<a href=\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var4 templ.SafeURL = templ.URL("mailto: " + p.Email)
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(string(templ_7745c5c3_Var4)))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("\">")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var5 string = p.Email
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var5))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</a></div></div></div><hr")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if true {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" noshade")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("><hr optionA")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if true {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionB")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionC=\"other\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if false {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionD")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("><hr noshade>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		return templ_7745c5c3_Err
	})
}
func BenchmarkTemplBufioWriterRender(b *testing.B) {
	b.ReportAllocs()
	t := BufioWriterRender(Person{
		Name:  "Luiz Bonfa",
		Email: "[email protected]",
	})
	w := new(strings.Builder)
	for i := 0; i < b.N; i++ {
		err := t.Render(context.Background(), w)
		if err != nil {
			b.Errorf("failed to render: %v", err)
		}
		w.Reset()
	}
}
This turns out to be slower than what we have at the moment, probably because it causes an additional allocation. I'm not sure if it could use a pool of buffered writers instead.
PASS
ok github.com/a-h/templ 0.236s
goos: darwin
goarch: arm64
pkg: github.com/a-h/templ/benchmarks/templ
BenchmarkTemplRender-10 3312931 348.2 ns/op 536 B/op 6 allocs/op
BenchmarkTemplBufioWriterRender-10 1819821 728.2 ns/op 4376 B/op 7 allocs/op
BenchmarkTemplParser-10 25100 48599 ns/op 27276 B/op 748 allocs/op
BenchmarkGoTemplateRender-10 485796 2332 ns/op 1400 B/op 38 allocs/op
BenchmarkIOWriteString-10 22776142 53.33 ns/op 320 B/op 1 allocs/op
To put this into perspective, we're talking about shaving up to 10,000 ns off the time to first byte. But, if my maths is right, at 3Mbps it takes around 10,000 ns to transfer a single character - i.e. it would probably be a greater performance improvement to improve whitespace stripping to remove a single additional character than to focus on this.
An additional point of note is that the HTTP response writer is already buffered (4KB by default), so if the response is less than 4KB in size, it will wait until it's all rendered anyway.
That's why I think an HTTP performance benchmark is probably more useful overall than these low-level ones.
So, I couldn't help but look into this and created a little benchmark of actual web performance at https://github.com/a-h/templ/tree/http_benchmark
I updated the test benchmark template and added 1000 extra lines of stuff.
package main

import (
	"net/http"

	"github.com/a-h/templ"
)

type Person struct {
	Name  string
	Email string
}

func main() {
	t := Render(Person{
		Name:  "Luiz Bonfa",
		Email: "[email protected]",
	})
	http.ListenAndServe("localhost:8080", templ.Handler(t))
}
+ hey -n 1000000 http://localhost:8080
Summary:
Total: 8.8955 secs
Slowest: 0.0094 secs
Fastest: 0.0001 secs
Average: 0.0004 secs
Requests/sec: 112416.6618
Response time histogram:
0.000 [1] |
0.001 [895572] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.002 [89241] |■■■■
0.003 [13366] |■
0.004 [1440] |
0.005 [240] |
0.006 [68] |
0.007 [53] |
0.008 [17] |
0.008 [1] |
0.009 [1] |
Latency distribution:
10% in 0.0001 secs
25% in 0.0002 secs
50% in 0.0003 secs
75% in 0.0005 secs
90% in 0.0010 secs
95% in 0.0014 secs
99% in 0.0021 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0001 secs, 0.0094 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0013 secs
req write: 0.0000 secs, 0.0000 secs, 0.0056 secs
resp wait: 0.0004 secs, 0.0000 secs, 0.0094 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0057 secs
Status code distribution:
[200] 1000000 responses
The resp wait (the bit we care about in this issue) was:
(average, min, max)
resp wait: 0.0004 secs, 0.0000 secs, 0.0094 secs
So... I hacked the generated code to try out the concept.
First, I added some extra functions to templ:
var bufferedWriterPool = sync.Pool{
	New: func() any {
		return new(bufio.Writer)
	},
}

func GetBufferedWriter(w io.Writer) *bufio.Writer {
	bw := bufferedWriterPool.Get().(*bufio.Writer)
	bw.Reset(w)
	return bw
}

func ReleaseBufferedWriter(b *bufio.Writer) {
	b.Reset(nil)
	bufferedWriterPool.Put(b)
}
Then I updated the generated code to use it:
// Code generated by [email protected] DO NOT EDIT.
package main
//lint:file-ignore SA4006 This context is only used if a nested component is present.
import "github.com/a-h/templ"
import "context"
import "io"
import "bufio"
func Render(p Person) templ.Component {
	return templ.ComponentFunc(func(templ_7745c5c3_Ctx context.Context, templ_7745c5c3_W io.Writer) (templ_7745c5c3_Err error) {
		templ_7745c5c3_Buffer, templ_7745c5c3_IsBuffer := templ_7745c5c3_W.(*bufio.Writer)
		if !templ_7745c5c3_IsBuffer {
			templ_7745c5c3_Buffer = templ.GetBufferedWriter(templ_7745c5c3_W)
			defer templ.ReleaseBufferedWriter(templ_7745c5c3_Buffer)
		}
		templ_7745c5c3_Ctx = templ.InitializeContext(templ_7745c5c3_Ctx)
		templ_7745c5c3_Var1 := templ.GetChildren(templ_7745c5c3_Ctx)
		if templ_7745c5c3_Var1 == nil {
			templ_7745c5c3_Var1 = templ.NopComponent
		}
		templ_7745c5c3_Ctx = templ.ClearChildren(templ_7745c5c3_Ctx)
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<html><head><title>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		templ_7745c5c3_Var2 := `Test page`
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ_7745c5c3_Var2)
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</title></head><body><div><h1>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var3 string = p.Name
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var3))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</h1><div style=\"font-family: 'sans-serif'\" id=\"test\" data-contents=\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(`something with "quotes" and a <tag>`))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("\"><div>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		templ_7745c5c3_Var4 := `email:`
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ_7745c5c3_Var4)
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<a href=\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var5 templ.SafeURL = templ.URL("mailto: " + p.Email)
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(string(templ_7745c5c3_Var5)))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("\">")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		var templ_7745c5c3_Var6 string = p.Email
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var6))
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</a></div></div></div><hr")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if true {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" noshade")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("><hr optionA")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if true {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionB")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionC=\"other\"")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if false {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(" optionD")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("><hr noshade>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		for i := 0; i < 1000; i++ {
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("<p>")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
			templ_7745c5c3_Var7 := `Adding some fake content.`
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ_7745c5c3_Var7)
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
			_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</p>")
			if templ_7745c5c3_Err != nil {
				return templ_7745c5c3_Err
			}
		}
		_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString("</body></html>")
		if templ_7745c5c3_Err != nil {
			return templ_7745c5c3_Err
		}
		if !templ_7745c5c3_IsBuffer {
			templ_7745c5c3_Err = templ_7745c5c3_Buffer.Flush()
		}
		return templ_7745c5c3_Err
	})
}
And... it was slower:
+ hey -n 1000000 http://localhost:8080
Summary:
Total: 12.3900 secs
Slowest: 0.0134 secs
Fastest: 0.0001 secs
Average: 0.0006 secs
Requests/sec: 80709.9770
Response time histogram:
0.000 [1] |
0.001 [924308] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.003 [54099] |■■
0.004 [15660] |■
0.005 [3349] |
0.007 [1761] |
0.008 [662] |
0.009 [128] |
0.011 [20] |
0.012 [7] |
0.013 [5] |
Latency distribution:
10% in 0.0002 secs
25% in 0.0003 secs
50% in 0.0004 secs
75% in 0.0007 secs
90% in 0.0012 secs
95% in 0.0018 secs
99% in 0.0035 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0001 secs, 0.0134 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0021 secs
req write: 0.0000 secs, 0.0000 secs, 0.0070 secs
resp wait: 0.0005 secs, 0.0000 secs, 0.0116 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0132 secs
Status code distribution:
[200] 1000000 responses
By adding the loop to add 1000 elements, it turns it into a 32KB file, so I think it's big enough to benefit from buffering.
So, from these figures, it looks like in real-world usage, using a bufio.Writer instead of a bytes.Buffer drops performance from 112,416 requests per second to 80,709 requests per second.
For completeness, I updated the test to render 10,000 copies of the <p> tag instead of 1,000. This makes the page much bigger, so you might expect time to first byte to be much higher if you copy it all to a bytes.Buffer first.
Here are the figures for bufio.Writer:
+ hey -n 1000000 http://localhost:8080
Summary:
Total: 104.0192 secs
Slowest: 0.0970 secs
Fastest: 0.0005 secs
Average: 0.0052 secs
Requests/sec: 9613.6124
Response time histogram:
0.000 [1] |
0.010 [898979] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.020 [80945] |■■■■
0.029 [13945] |■
0.039 [3827] |
0.049 [1340] |
0.058 [619] |
0.068 [266] |
0.078 [75] |
0.087 [2] |
0.097 [1] |
Latency distribution:
10% in 0.0014 secs
25% in 0.0023 secs
50% in 0.0039 secs
75% in 0.0062 secs
90% in 0.0102 secs
95% in 0.0139 secs
99% in 0.0252 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0005 secs, 0.0970 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0014 secs
req write: 0.0000 secs, 0.0000 secs, 0.0144 secs
resp wait: 0.0034 secs, 0.0000 secs, 0.0759 secs
resp read: 0.0018 secs, 0.0001 secs, 0.0751 secs
Status code distribution:
[200] 1000000 responses
And here's the current implementation that uses bytes.Buffer:
+ hey -n 1000000 http://localhost:8080
Summary:
Total: 32.0821 secs
Slowest: 0.0377 secs
Fastest: 0.0002 secs
Average: 0.0016 secs
Requests/sec: 31170.0649
Response time histogram:
0.000 [1] |
0.004 [947312] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.008 [48441] |■■
0.011 [3670] |
0.015 [478] |
0.019 [76] |
0.023 [15] |
0.026 [5] |
0.030 [1] |
0.034 [0] |
0.038 [1] |
Latency distribution:
10% in 0.0004 secs
25% in 0.0007 secs
50% in 0.0012 secs
75% in 0.0021 secs
90% in 0.0032 secs
95% in 0.0040 secs
99% in 0.0063 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0002 secs, 0.0377 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0011 secs
req write: 0.0000 secs, 0.0000 secs, 0.0066 secs
resp wait: 0.0014 secs, 0.0001 secs, 0.0376 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0217 secs
Status code distribution:
[200] 1000000 responses
Here it's even more of a win for the bytes.Buffer. The resp wait time is still shorter at 0.0014 vs 0.0034 secs, and the rps is 31,170 vs 9,613.
If there's an alternative to bufio.Writer that is faster... then great.
Any ideas, @jtarchie?
I wonder if giving the buffers an initial size that's a bit bigger than default would affect the performance much?
https://pkg.go.dev/bufio#NewWriterSize
Yes. On my machine:
new(bytes.Buffer) -> 140k rps
new(bufio.Writer) (which defaults to a 4KB buffer) -> 130k rps
bufio.NewWriterSize(nil, 32*1024) -> 146k rps
bytes.NewBuffer(make([]byte, 0, 32*1024)) -> 160k rps
There are still downsides to using a buffer pool - say you have many templates with 100KB of data and a single one with 200MB. All of your pooled buffers will eventually grow to 200MB, unless you collect statistics the way bytebufferpool does: https://github.com/valyala/bytebufferpool/blob/master/pool.go
~~Also, the w.(*bytes.Buffer) assertion is not necessary, since NewWriterSize already does it:
https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/bufio/bufio.go;l=592
And it should work fine even if there is a second buffer with a smaller size, since it will be bypassed:
https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/bufio/bufio.go;l=682~~
actually Reset(w) doesn't
Have marked as needs decision, as it seems we have all the information now to decide if we want to implement anything for this.
Based on the data, I think it makes sense to use a sync buffered write pool for this. Looks like the buffered writing might result in 146k rps vs 140k rps, or around 4% improvement according to @kaey's benchmarks.
However, I'm concerned about the potential for RAM growth over time as outlined by @kaey. Having this happen to you violates the principle of least surprise.
For runtime stuff, I want to stick to the standard library as much as possible, since it benefits from the security focus of the Go project as a whole, so although the bytebufferpool sounds like a good library to use, I'd prefer to stick to the standard library for runtime if possible.
However, I don't plan to work on this in the short-ish future, because CSS media queries, improvements to LSP testing, and providing more documentation / examples would be higher up the list in priority for me at the moment.
See comment on https://github.com/a-h/templ/discussions/781#discussioncomment-9680731 - I think this could be a way forward.
@joerdav is working on something that affects the generator (automatic imports), so I don't want to implement this until he's finished, but I think it's relatively clear how to proceed.