echo
echo copied to clipboard
proposal: Micro-optimization for better memory utilization
Hello,
I've noticed that there's a potential for optimizing certain structs by rearranging their fields. This optimization could notably reduce the padding added by the compiler, ensuring better memory utilization and potentially enhancing performance.
It's important to consider that alignment requirements vary between 32-bit and 64-bit architectures. With a focus on optimizing for 64-bit architecture, I believe there's significant room for improvement in this regard.
It's worth emphasizing that this optimization won't alter the program's logic; rather, it involves solely changing the location of fields within the structs.
If okay, I would like to work on this.
Sorry for the late reply. If you are have interest and time for it please go ahead.
Could you provide small summary of which structs you aim, maybe small example with benchmark. Maybe related links to (blog)posts about this kind of optimizations - so people that are not that knowledgeable could get some introductory into the matter.
If there are no performance gains it would be useful to show that and reason why we see numbers we see. This is useful less experienced coders.
I am a quite ignorant at these things, I sometimes look at micro-optimizations that other frameworks do and think how much this magic actually give them benefits. It does not mean we should not do them but to have honest conversation what the benefits are in numbers.
I would really like if these kind of things come in a form that everyone interested (even not Echo users) could learn something. Having more optimized code is good, but even better is having that and more learned people for Go ecosystem.
Absolutely, sharing knowledge is key here, and I'm fully on board with that.
So let me start with a simple exlanation of what this micro optimization is.
Here we have two important concepts:
- Word Size: This refers to the number of bytes the CPU can process at one time. On a 64-bit system, the word size is 8 bytes, meaning the CPU can handle 8 bytes of data in a single operation. Ref
- Alignment: For the CPU to process data most efficiently, the data needs to be aligned in memory according to the word size. This means that data types should ideally begin at memory addresses that are multiples of their size, up to the word size. Ref
When struct fields are not naturally aligned with these boundaries, the Go compiler will automatically insert padding
(extra space) between fields. This padding ensures that each field starts at an address that aligns with its size, optimizing memory access during runtime. And as a result when a data type is naturally aligned, the CPU fetches it in minimum read cycles.
Look at this struct:
type myStruct struct {
myBool bool // 1 byte
myFloat float64 // 8 bytes
myInt int32 // 4 bytes
}
And this is how the memory allocation for this struct will look like:
The red space is the padding added by the compiler to complete a word.
If we just reorder the fields of that struct like this:
type myStructOptimized struct {
myFloat float64 // 8 bytes
myBool bool // 1 byte
myInt int32 // 4 bytes
}
We will get an allocation like this:
And finally with just a reorder we can save 8 bytes.
Here's the code if you want to run it locally:
package main
import (
"fmt"
"unsafe"
)
type myStruct struct {
myBool bool // 1 byte
myFloat float64 // 8 bytes
myInt int32 // 4 bytes
}
type myStructOptimized struct {
myFloat float64 // 8 bytes
myBool bool // 1 byte
myInt int32 // 4 bytes
}
func main() {
a := myStruct{}
b := myStructOptimized{}
fmt.Println("Size of myStruct:", unsafe.Sizeof(a)) //24
fmt.Println("Size of myStructOptimized:", unsafe.Sizeof(b)) //16
}
References: Golang struct size and memory optimisation Alignment Structure Matters: Padding, Alignment, and Memory Optimization in Go Optimize Your Struct with Alignment Trick + Great Visualization How to Speed Up Your Struct in Golang
Benchmark
I've been attempting to fine-tune the Echo framework to incorporate new benchmarks over the past few days. However, I encountered challenges due to the complexity of duplicating code required to establish a suitable benchmark environment. As a result, I decided to forgo benchmarking directly within the Echo framework itself.
I will drop this benchmark code here so you can run it locally too.
main.go
package main
type OptimizedUserStruct struct {
Email string // 16 bytes
Balance int64 // 8 bytes
Rank int32 // 4 bytes
Age uint16 // 2 bytes
IsAdmin bool // 1 byte
IsActive bool // 1 byte
}
type UserStruct struct {
IsAdmin bool // 1 byte + 7 padding
Email string // 16 bytes
IsActive bool // 1 byte + 7 padding
Balance int64 // 8 bytes
Age uint16 // 2 bytes + 2 padding
Rank int32 // 4 bytes
}
main_test.go
package main
import (
"testing"
)
var OptimizedStructList []OptimizedUserStruct
var StructList []UserStruct
func BenchmarkUserStruct(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
tmp := UserStruct{
IsAdmin: true,
Email: "[email protected]",
IsActive: true,
Balance: 42,
Age: 42,
Rank: 42,
}
StructList = append(StructList, tmp)
}
}
func BenchmarkOptimizedUserStruct(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
tmp := OptimizedUserStruct{
Email: "[email protected]",
Balance: 42,
Rank: 42,
Age: 42,
IsAdmin: true,
IsActive: true,
}
OptimizedStructList = append(OptimizedStructList, tmp)
}
}
Result
go test -bench=. ./...
goos: linux
goarch: amd64
pkg: lab/padding
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkUserStruct-8 10656973 109.0 ns/op 270 B/op 0 allocs/op
BenchmarkOptimizedUserStruct-8 46414730 53.81 ns/op 162 B/op 0 allocs/op
PASS
ok lab/padding 4.164s
It's crucial to emphasize the title of the issue, which proposes a "micro" optimization. Therefore, we shouldn't anticipate a significant impact on performance. The effects of these types of optimizations become apparent at scale.
Thank you @behnambm !