go proposal: reflect: add Value.Caller

reflect.Call allocates a []reflect.Value slice of length nout for the number of return arguments from the function. This is fine for a one-off call of a function, but when repeatedly invoking a function, reflect.Call allocations can quickly add up.

As an example, I have a helper package that sorts arbitrary types, and if a type has a func (t T) Less(other T) bool (i.e., Name == "Less", NumIn() == 1, NumOut() == 1, In(0) == T, Out(0) == bool), the package calls that function for sorting. This is done within sort.Slice, meaning Less is called as many times as sort.Slice needs to compare. For an ordered slice of 1000 elements, each sort.Slice allocates 25735 times.

The only four allocations in this benchmark are (tip @ 2cf85b1fb8)

  105.50MB 68.28% 68.28%   105.50MB 68.28%  reflect.Value.call /home/twmb/go/go.tip/src/reflect/value.go:565
      41MB 26.53% 94.81%       41MB 26.53%  reflect.methodReceiver /home/twmb/go/go.tip/src/reflect/value.go:862
    4.50MB  2.91% 97.72%     4.50MB  2.91%  reflect.Value.call /home/twmb/go/go.tip/src/reflect/value.go:609
    1.50MB  0.97% 98.70%     1.50MB  0.97%  runtime.allocm /home/twmb/go/go.tip/src/runtime/proc.go:1866

70% of the allocations are from allocating the output argument slice. If possible, I could provide the input slice and reuse it for all Calls, which would effectively eliminate this alloc.

I looked into line 862 and into the runtime, I'm not sure how to get rid of that allocation: the code takes the address of an unsafe pointer, and that allocates; if possible it would be quite nice to get rid of this alloc as well but this would likely require deeper wiring into the runtime (the point here is to have a method pointer). Line 609 has a TODO that could eliminate the allocation. runtime/proc.go:1866 looks to be unrelated.

reflect.Value.call can be:

func (v Value) call(op string, in, out []Value) []Value {

If out is non-nil, then it is used rather than allocating a new slice. If the input out is too small, the code will panic. Essentially, line 565 turns into:

ret = out
if ret == nil {
    ret = make([]Value, nout)
}
if len(ret) < nout { // optional check
    panic("short output slice")
}

Nov 04 '21 06:11 twmb

Going forward would it make sense to write your package using generics?

Nov 04 '21 22:11 ianlancetaylor

I don't think so. This is doing deep sorting on struct fields (admittedly this is not a high priority package). If the field is a slice, I check the inner type for the method. With generics, even if I know the method has a Less function, I cannot type assert into Lesser[T] because T requires instantiation and I do not know the type.

Nov 04 '21 22:11 twmb

If we add CallInto or something like it, does reflect.Call go down to zero allocations? It would be good to know we were done with this particular issue, at least for Call.

Nov 10 '21 22:11 rsc

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

Nov 10 '21 23:11 rsc

This seems like a duplicate of #43732.

If we add CallInto or something like it, does reflect.Call go down to zero allocations?

According to my comment https://github.com/golang/go/issues/43732#issuecomment-761704581, we can get it down to zero allocations. While there is a run time improvement of 50%, it wasn't a great as I had hoped.

Nov 10 '21 23:11 dsnet

Interesting, yes I think this is a duplicate of #43732. Does the CL linked take care of methods? I don't see a change in methodReceiver, which was the remaining allocation I ran into.

I also tried removing the allocation locally -- I got a similar sort of speedup: not as much as I hoped, but the allocation drop is somewhat nice regardless.

Nov 11 '21 00:11 twmb

Does the CL linked take care of methods?

I didn't do anything special for method functions. I'll take a look later if something is needed there.

Nov 11 '21 00:11 dsnet

Indeed, this is a duplicate of #43732, but this one is in the active column, so I'll close that one.

Dec 01 '21 18:12 rsc

It sounds like people basically agree about the signature (even in #43732): it takes two []Value and returns nothing. But we need a name, and we need to figure out whether it handles variadic functions like Call or like CallSlice. If it handles them like Call, doesn't that allocate? So probably it should handle them like CallSlice. Maybe it would be CallSlices? Other names welcome but CallInto and CallWith both sound like they are missing something.

Dec 01 '21 18:12 rsc

What should we do about Call vs CallSlice? We probably don't want to add two new functions. And if we do this, are we down to zero allocations, so we know we won't need to add another variant in the future?

Dec 08 '21 18:12 rsc

The naming problem could be avoided by introducing a reflect.Caller type.

//Caller is a low-level type used to make 
//zero-allocation function calls.
type Caller struct {}

I suspect that adding this type has an additional advantage, it leaves the door open to a NewCaller/Caller constructor of the type being able to optimise repeat calls to the function it's calling.

echo := reflect.ValueOf(func(in string) string { return in }).Caller()

or

echo := reflect.NewCaller(reflect.ValueOf(func(in string) string { return in }))

then

results := make([]reflect.Value, 1)
args := []reflect.Value{
    reflect.ValueOf("hello"),
}

for i := 0; i < 100; i++ {
    echo.Call(args, results) //more efficient
}

What should we do about Call vs CallSlice?

reflect.Caller provides at least two options:

Have two methods on the reflect.Caller type, Call and CallSlice.
Have reflect.Caller.Call behave like CallSlice when it is constructed from a variadic function.

It can then be considered that reflect.Value.Call and reflect.Value.CallSlice are shorthand/abstractions around the use of the "low-level" Caller type.

And if we do this, are we down to zero allocations, so we know we won't need to add another variant in the future?

In my echo example above, it is my understanding that the string header returned from echo will escape to the heap when it is converted into a reflect.Value and placed in the results slice. Therefore 'zero-allocation' is only possible by providing pointers in the results slice for Call to write into. I think it should be the requirement of reflect.Caller's user to allocate all the results needed to call the function. Only reflect.Value.Call, reflect.Value.CallSlice and reflect.Caller's constructor should be allowed to allocate.

results := []reflect.Value{new(string)} //all results must be pre-allocated.
args := []reflect.Value{reflect.ValueOf("hello")}
for i := 0; i < 100; i++ {
    echo.Call(args, results) //zero-allocations.
}

Dec 09 '21 02:12 Splizard

Adding the ability to specify the output slice removes allocations for non-method function calls. For methods, there still is an allocation to create a method pointer. Since call immediately uses and then discards this method pointer, it would be good if this allocation could be removed, but I'm not sure how, and removing the allocation for output arguments does not remove the method pointer allocation.

Dec 10 '21 17:12 twmb

It sounds like we are having trouble getting to an API that will actually guarantee zero allocations and that is simple enough to be a reasonable API change. Does anyone see how to thread this needle?

Jan 05 '22 18:01 rsc

It still sounds like we don't have a final API to decide about, one that will guarantee zero allocations and is simple enough to present to users. Does anyone want to try to do that?

Jan 19 '22 18:01 rsc

@twmb Is this something that a reflect.Caller type could handle? If the reflect.Value is a method, allocate this method pointer and reuse it across sequential calls.

Jan 19 '22 21:01 Splizard

I think reflect.Caller is a nifty idea: the method pointer is allocated once, and function lookups are cached (and this work eliminated) across repeated invocation. I'm not sure what the code would look like but I'm :+1: on the idea.

Also, reflect.Caller could have one single method. Stealing some documentation from the current reflect.Value.Call,

// Call calls the function v with the input arguments in. For example, if len(in)
// == 3, v.Call(in) represents the Go call v(in[0], in[1], in[2]).  It returns the
// output results as Values. As in Go, each input argument must be assignable to
// the type of the function's corresponding input parameter. If v is a variadic
// function, Call creates the variadic slice parameter itself, copying in the
// corresponding values.
//
// The returned slice is reused across repeated invocations to call. If you wish
// to keep the argument slice itself, it must be copied.
func (c reflect.Caller) Call(args []reflect.Value) []reflect.Value

In this scheme, the returned argument slice can be allocated once on the first call, and reused across future invocations.

Alternatively, this type could have a second method. CallInto was questioned above, and I'm not so sure on CallWith myself (with what?), but perhaps CallWithReturns (still open to better names)?

In summary for new APIs,

func NewCaller(interface{}) reflect.Caller
func (v reflect.Value) Caller() reflect.Caller

type reflect.Caller struct { ... }

func (c reflect.Caller) Call(args []reflect.Value) []reflect.Value // reuses return slice across repeated invocations
func (c reflect.Caller) CallWithReturns(args, returns []reflect.Value)

I believe it should be possible to have zero allocation calls with this helper type.

Jan 20 '22 21:01 twmb

Is Caller just a cache for allocated things? If so, maybe it shouldn't embed the receiver? That is, maybe it should be

var caller reflect.Caller caller.Call(v, args)

because then you can do

caller.Call(otherV, otherArgs)

Does anyone want to try implementing this?

Jan 26 '22 18:01 rsc

Sounds like we are waiting on someone to implement this to understand whether it satisfies the performance goals for the proposal.

Feb 09 '22 18:02 rsc

Placed on hold. — rsc for the proposal review group

Feb 23 '22 19:02 rsc

I have taken a run at implementing this, which can be found here: https://github.com/nrwiersma/go/blob/caller/src/reflect/caller.go and it seems it is possible to get it down to zero allocs.

This is basically just a copy of Value.Call that caches a bunch of the values, and I am not sure I fully understand everything that happens here. It seems that caching the output of funcLayout reduces the time taken by roughly half (~80ns of ~160ns) and from what I gather from its description should be safe. Without caching it the call time is basically unchanged while the allocs are removed.

I chose, at least for this PoC, not to allow the user to provide their own output slice, as to remove the alloc from L270 a fair amount of type checking was needed, and it was starting to get really ugly. For now the values are created on construction and reused.

My benchmark numbers are as follows:

name           time/op
Value_Call-8    159ns ± 1%
Caller_Call-8  76.1ns ± 0%

name           alloc/op
Value_Call-8    32.0B ± 0%
Caller_Call-8   0.00B

name           allocs/op
Value_Call-8     2.00 ± 0%
Caller_Call-8    0.00

While I was at it I noticed that callMethod is also allocating on methodReceiver and takes a long time on funcLayout, but this is likely for another issue.

Nov 15 '22 17:11 nrwiersma

Thanks. Taking off hold.

Nov 15 '22 20:11 ianlancetaylor

The implemented API seems to be:

package reflect

type Caller struct { ... }
func NewCaller(v Value) *Caller 
func (c *Caller) Call(rcvr Value, in []Value) []Value

This lets NewCaller compute various things about v that can be reused across multiple calls to v. I'm not sure whether it should be a method on Value instead of a top-level function, and of course we'd want to deduplicate the code with reflect.Call (presumably Value.Call would use NewCaller), but it looks reasonable.

@twmb, @dsnet, any thoughts?

Nov 16 '22 18:11 rsc

(presumably Value.Call would use NewCaller)

NewCaller.Call is using CallSlice style input args, so this would be a change to the way that Value.Call behaves.

Nov 16 '22 18:11 nrwiersma

@nrwiersma thanks for the reminder about CallSlice. It seems like Caller should have both Call and CallSlice methods. Then both Value.Call and Value.CallSlice would be wrappers.

@twmb, @dsnet any thoughts?

Nov 30 '22 18:11 rsc

@rsc I have refactored as requested. Caller has moved into https://github.com/nrwiersma/go/blob/caller/src/reflect/value.go#L432 and implements both Call and CallSlice. It is now accessed via Value.Caller() and Value.Call and Value.CallSlice both now use Caller. Value.call has moved to Caller.call and I needed to add methodReceiverType to check the actual receiver type.

All tests are passing and my benchmarks look like this:

name                      time/op
Call-8                       121ns ± 7%
CallerCall-8                11.7ns ± 1%
CallMethod-8                 141ns ± 0%
CallerCallMethod-8          33.3ns ± 0%

name                      alloc/op
Call-8                        256B ± 0%
CallerCall-8                 0.00B
CallMethod-8                  256B ± 0%
CallerCallMethod-8           0.00B

name                      allocs/op
Call-8                        1.00 ± 0%
CallerCall-8                  0.00
CallMethod-8                  1.00 ± 0%
CallerCallMethod-8            0.00

Dec 01 '22 05:12 nrwiersma

+1 to the current API. How is this avoiding allocations on the return slice?

Dec 06 '22 16:12 twmb

It reuses the return slice. Now that you mentioned it, I realise I did not document it. I will fix that soon.

Dec 06 '22 16:12 nrwiersma

So the current API is:

type Caller struct { ... }
func (v Value) Caller() *Caller
func (c *Caller) Call(rcvr Value, in []Value) []Value 
func (c *Caller) CallSlice(rcvr Value, in []Value) []Value

and the current Value.Call and Value.CallSlice become wrappers:

func (v Value) Call(in []Value) []Value {
	var rcvr Value
	if v.flag&flagMethod != 0 {
		rcvr = v
	}
	return v.Caller().Call(rcvr, in)
}

The detail about the receiver not being pre-bound already in the result of v.Caller() seems wrong. v.Caller().Call(in) should be the same as v.Call(in) in all cases. That means the Call and CallSlice methods should not have a rcvr Value argument.

So the API I'm suggesting would change to:

type Caller struct { ... }
func (v Value) Caller() *Caller
func (c *Caller) Call(in []Value) []Value 
func (c *Caller) CallSlice(in []Value) []Value

and the wrapper would become

func (v Value) Call(in []Value) []Value {
	return v.Caller().Call(in)
}

Does that seem reasonable?

Dec 07 '22 18:12 rsc

Fair. Will make the change. It does simplify things a little.

Dec 07 '22 18:12 nrwiersma

My concern with this API is that I don't know how I'm suppose to use it.

I looked at two use-cases to validate this:

use of reflect.Value.Call in "github.com/google/go-cmp"
hypothetical use of reflect.Value.Calue in "encoding/json" for #5901
- (we no longer need reflect.Value.Call for this now that we have Go generics, but it's still worthwhile to consider as a case study)

In order for this to be useful, I'm responsible for pooling the reflect.Caller objects. How do I do that when I'm managing an unbounded number of discrete functions? For each function, I can't just have one pre-prepared reflect.Caller since reflect.Caller is not concurrent safe. I need a pool of them for each function. It does not scale to have a sync.Pool for each of these. The alternative is to only have one, but protect it with a sync.Mutex, which seems highly unfortunate.

Dec 12 '22 22:12 dsnet

go go copied to clipboard

proposal: reflect: add Value.Caller

go
go copied to clipboard