cmd/compile: inlining of range funcs should be more aggressive
Consider the following program:
package pkg
import (
"iter"
)
func trivialIterator() iter.Seq[int] {
return func(yield func(int) bool) {
yield(0)
}
}
func consumer() {
for range trivialIterator() {
foo()
foo()
}
}
//go:noinline
func foo() {}
Building it with GOEXPERIMENT=newinliner with go1.24-cbdb3545ad reports:
./foo.go:8:9: can inline trivialIterator.func1 with cost 60 as: func(func(int) bool) { yield(0) }
./foo.go:7:6: can inline trivialIterator with cost 17 as: func() iter.Seq[int] { return func literal }
./foo.go:21:6: cannot inline foo: marked go:noinline
./foo.go:13:6: cannot inline consumer: function too complex: cost 175 exceeds budget 160
./foo.go:14:2: cannot inline consumer-range1: function too complex: cost 190 exceeds budget 160
./foo.go:14:27: inlining call to trivialIterator with score -23
./foo.go:8:9: can inline consumer.trivialIterator.func1 with cost 60 as: func(func(int) bool) { yield(0) }
./foo.go:14:2: inlining call to consumer.trivialIterator.func1 with score 60
./foo.go:8:14: yield does not escape
./foo.go:8:9: func literal escapes to heap:
./foo.go:8:9: flow: ~r0 = &{storage for func literal}:
./foo.go:8:9: from func literal (spill) at ./foo.go:8:9
./foo.go:8:9: from return func literal (return) at ./foo.go:8:2
./foo.go:8:9: func literal escapes to heap
./foo.go:8:14: yield does not escape
./foo.go:14:2: consumer capturing by ref: #state1 (addr=false assign=true width=8)
./foo.go:14:2: func literal does not escape
./foo.go:14:27: func literal does not escape
Note that despite the utter trivialness of the loop body (consumer-range1), it cannot be inlined.
I think that inlining of range bodies deserves a special case and that the cost of the synthetic function is unimportant. If this were a "normal" loop, the body would inherently be part of the consumer function, no matter how complex it'd be. If we know that yield is only called in a single place syntactically, then we should be able to unconditionally inline the range func and produce code no more complex than for a normal loop (excluding the state tracking for range funcs, but that's fixed overhead).
I'm sure I'm missing something.
(The bot will find many related issues, but I don't think any of them directly mention this general case.)
Also see #69015
I've experimented with a fix for this in the way pre-range-over-func days and it was effective but it needs cleanup.
The idea is that if for some reason we know that a function is only ever called from one place, there is no downside* to inlining it, no matter how big it is, we are gonna remove it's only call point allowing it to be tree shaken away, we wont have to pay the ABI cost and having everything in one body allows all the other optimizations to be more effective.
However at least in the go compiler you can't really count and see X function is only ever called once since packages are built incrementally in reversed import graph order, the exception is function expressions, thx to scoping it's impossible to grab a reference to a function expression inside an other function. So we can analyze the current function body and if it is not passed in or leaked in anyway, we know all the callsites are in the current function.**
*in practice it relies on the compiler and linker doing a good job, you can build counter examples, mostly when block layout gets lucky without inlining and unlucky with, but in my experiments it was fine. All of this also only directly work on backward inliners like go's compiler one.
**there is one edge case where a small inlinable function (f1) returns a big function expression, anyone inlining f1 will see themselves as creating the function expression and according to the rules I've described above that would mean they would inline the big function expression duplicating the function body many times. I need to write code to account for that and but it would still solve this issue.
@rsc what did I do that dissuaded @gabyhelp from finding related issues?
CC @dr2chase
I believe this was implemented in d524c954b14c861e6a442e09abd38ba074ad376d.