go icon indicating copy to clipboard operation
go copied to clipboard

proposal: io: CopyBuffer should avoid ReadFrom/WriteTo

Open dsnet opened this issue 9 years ago • 26 comments

Currently, io.CopyBuffer uses WriteTo if the src supports it and ReadFrom if the dst supports it, presumably to avoid an allocation.

The problem is that there exist certain io.ReaderFrom that are not efficient 100% of the time. For example, consider the following implementation of ReadFrom:

type myWriter struct {
    w io.Writer
}

func (mw *myWriter) ReadFrom(src io.Reader) (int64, error) {
    if r, ok := mw.w.(io.ReaderFrom); ok {
        return r.ReadFrom(src)
    }
    return io.Copy(struct{ io.Writer }{mw}, src)
}

This is unfortunate as myWriter satisfies the io.ReaderFrom interface, but it's implementation is no better than io.Copy (always incurring an allocation) in some circumstances.

In the case of io.CopyBuffer, I would argue that the user's expectation is that the buffer passed in is always used. It has the detriment of an extra copy, but I believe the cost of that is much lower than the allocation that is already saved by the user providing the buffer. It main benefit is that it avoids any shenanigans that occur because of inefficient implementations of io.ReaderFrom and io.WriterTo.

Alternative is to do anonymous struct value hack:

io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{src}, buf)

But, this hack is not intuitive to most people.

Filed on behalf of @jkji

dsnet avatar Jul 22 '16 20:07 dsnet

Why would someone implement io.ReaderFrom or io.WriterTo if they are inefficient? Seems really backwards to me.

I see what you mean with the buf. It doesn't make much sense to pass in a buffer when io.CopyBuffer will just end up calling io.ReaderFrom. That is unintuitive.

I don't think we can change this because of the Go 1 compatibility anyway.

nhooyr avatar Jul 26 '16 12:07 nhooyr

ReaderFrom does not have to be horribly inefficient, but it can be inefficient enough for the CopyBuffer caller to care. In my case, this was severely reducing the file transfer rate but it might be totally fine for some other application. Someone probably added it for a different use case, without realizing the side effects on CopyBuffer. Conceptually, it seems that ReaderFrom should have a buffer argument and ignore it if it is not needed or nil. Efficient buffer-less ReaderFrom implementation sounds like a rare beast to me.

romanl-g avatar Jul 26 '16 15:07 romanl-g

Well we can't really change this now. Perhaps in Go 2?

nhooyr avatar Jul 26 '16 19:07 nhooyr

@nhooyr, how do you see this as a Go 1 compatibility issue? Whether io.CopyBuffer chooses to use ReadFrom or WriteTo or not does not alter the correctness of the function. It's primarily a question of performance.

Part of the problem is that io.ReaderFrom and io.WriterTo doesn't document anything about needing to be "efficient" (however that is defined), so there are many implementations of them that are sub-par compared to the case where io.CopyBuffer just used the user provided buffer only.

dsnet avatar Jul 26 '16 19:07 dsnet

He said that io.ReaderFrom should have a buffer argument that should be ignored if it is not needed or nil. That cannot happen without breaking Go 1 Compatibility. Perhaps it should be better documented but the whole point of io.ReaderFrom is to not need a buffer. To write it directly like say sendfile(2).

Furthermore, a lot of code relies on theio.Copy functions using the io.ReaderFrom or io.WriterTo interfaces to optimize. Its documented behavior that io.CopyBuffer behaves identically to io.Copy except it stages through the provided buffer rather than allocating a temporary one. By changing io.CopyBuffer we break this documented behaviour that go programs could rely on.

I definitely agree though, I find it unintuitive that io.CopyBuffer doesn't always use the passed buffer, but because io.CopyBuffer using io.ReaderFrom or io.WriterTo is documented behavior, it breaks the Go 1 compatibility promise.

nhooyr avatar Jul 26 '16 20:07 nhooyr

CopyBuffer is identical to Copy except that it stages through the provided buffer (if one is required) rather than allocating a temporary one.

tv42 avatar Jul 26 '16 20:07 tv42

Its documented behavior that io.CopyBuffer behaves identically to io.Copy except it stages through the provided buffer rather than allocating a temporary one. By changing io.CopyBuffer we may break this documented behaviour that go programs could rely on.

Fair argument.

dsnet avatar Jul 26 '16 20:07 dsnet

@nhooyr, thanks for the clarification. Backwards compatibility argument makes sense, that's why I said "conceptually" ;)

romanl-g avatar Jul 26 '16 21:07 romanl-g

Just noting here that strings.Reader.WriteTo is another example of an implementation of io.WriterTo that is not efficient in all circumstances.

It relies on io.WriteString, which only calls WriteString if the underlying writer supports it. Otherwise, it causes a (potentially large) allocation to convert the string to a []byte.

Related to #13848, which is intended to help alleviate the issue.

dsnet avatar Aug 02 '16 18:08 dsnet

@dsnet Is this an actual bottleneck in something that can't be fixed without stdlib changes?

tv42 avatar Aug 02 '16 20:08 tv42

My original post contained the hack to get around this issue:

io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{src}, buf)

It's just not an intuitive hack to most people. Secondly, the need for it doesn't become obvious until a pprof indicates that io.CopyBuffer is spending significant amounts of time and allocations.

dsnet avatar Aug 02 '16 20:08 dsnet

@dsnet Was it ever an observed bottleneck in a real, otherwise well-written, system?

tv42 avatar Aug 02 '16 20:08 tv42

Yes, #13848 was filed when I observed a large string was unnecessarily being copied. This issue was filed on behalf of someone who noticed performance issues in an internal system.

dsnet avatar Aug 02 '16 20:08 dsnet

Haha, I just got bitten by this. In tlsmuxd I was copying between two TCP connections with io.CopyBuffer. I expected it to use my buffer from my bufferPool but it turns out that *net.TCPConn has the ReadFrom method for sendfile. However, since it was reading from a net.Conn sendfile did not apply and it fell back to io.Copy. Consequently, my buffer was unused and there was a buffer allocation in io.Copy.

Luckily, I read *net.TCPConn's source one day so I realized my mistake but I've retracted my original view that it a compat issue because I think that this is extremely subtle, unintuitive and I cannot imagine anyone who passes in a buffer to io.CopyBuffer but does not expect it to be used. I understand that it was documented to behave just like io.Copy, but why would you pass in a buffer if you aren't sure it will be used? It just doesn't make any sense.

nhooyr avatar Oct 11 '16 08:10 nhooyr

Not performance related, but http://golang.org/cl/31174 is another case where the behavior of io.CopyBuffer was unexpected. The test relied upon copy being done in specific chunk sizes.

dsnet avatar Oct 17 '16 23:10 dsnet

Here's a different way to look at this issue, which generalizes to magic-method APIs in general:

For a magic-method API to work with wrapper types, the method's implementation must be able to indicate that the method is not actually supported.

io.ReaderFrom violates that principle: there is no standardized way for ReadFrom to return an error that means “fall back to Write instead”.

You could imagine retrofitting one, such as io.ErrNotSupported, but it would likely break callers on a massive scale. More likely, you'd need to rename the method to something like MaybeReadFrom so that callees can determine whether to return ErrNotSupported or press ahead with a less-efficient implementation.

bcmills avatar Jan 02 '18 17:01 bcmills

the method's implementation must be able to indicate that the method is not actually supported.

This is one of the driving usages of #21904.

Part of the problem with "not implemented" is that it's obvious for cases like io.Seeker, where you either support it or you don't. It's not so obvious what you're supposed to do, when you do implement it and your implementation happens to be less efficient than if your caller had just used Write instead.

dsnet avatar Jan 02 '18 18:01 dsnet

@bcmills I don't think this works in general. In the case discussed here, it sometimes makes sense for io.CopyBuffer to call the ReadFrom method, and sometimes it doesn't. It makes sense to use ReadFrom if ReadFrom does not require a buffer at all, as for example for (*net.TCPConn).ReadFrom on systems that support sendfile. It does not make sense for io.CopyBuffer to call ReadFrom if ReadFrom uses an internal buffer. But the fact that ReadFrom uses an internal buffer doesn't mean that it is not supported; it may be a completely appropriate use in some cases, just not for io.CopyBuffer with a larger buffer. So using a "not actually supported" mechanism would require specifying "not actually supported for this specific use" which is an N-x-N problem.

More generally, if you know the types involved, you will call the methods directly. If you don't know the types involved, there is no obvious general approach to know whether to use a magic method or not, short of actual measurement.

We wind up in this situation because there is existing code that uses io.Copy, and we can't or don't want to change that code. But we have a type that has a fast copy mechanism, and we want the existing code to use our fast mechanism. So we invent a magic method, put it on the type, and change io.Copy to use that magic method. In effect we are using a magic method as a communication mechanism across code that doesn't know about it.

It seems that the best way to use this technique effectively is for each magic method to mean exactly one thing. In this case we are using it for two things: io.Copy to copy with a generic buffer, and io.CopyBuffer to copy with a specific buffer. They should not both use the same magic method. With this approach a function like io.CopyBuffer would document clearly the set of magic methods that it supports. If multiple functions support the same magic method, we wind up with an N-x-N crossbar: trying to use the magic method for multiple different purposes, when we have to know which purpose is intended to know whether it makes sense.

On the other hand, if we have a magic method that is specific to io.CopyBuffer, then the code winds up having spooky action at a distance: methods are defined solely so that they can be detected at a completely different, seemingly unrelated, part of the code.

ianlancetaylor avatar Jan 16 '18 22:01 ianlancetaylor

I'm beginning to think that this issue calls into question the value of CopyBuffer entirely. If callers want an unconditional buffer, they can wrap the Reader in a bufio.Reader of appropriate size, which implements io.WriterTo and thus always wins the magic-method contest in io.Copy.

io.CopyBuffer only adds value if it is conditional, but the fact that it is conditional is what makes it so subtle to use.

bcmills avatar Mar 02 '18 06:03 bcmills

I want to note that in 1.15 this issue is going to apply to os.File, which acquired a ReadFrom method in https://golang.org/cl/229101.

ianlancetaylor avatar Jun 19 '20 05:06 ianlancetaylor

Change https://golang.org/cl/238864 mentions this issue: doc/go1.15: mention consequence of os.File.ReadFrom

gopherbot avatar Jun 19 '20 05:06 gopherbot

Another report: #56742.

ianlancetaylor avatar Nov 16 '22 00:11 ianlancetaylor

A related proposal: #58331.

ianlancetaylor avatar Feb 06 '23 00:02 ianlancetaylor

Change https://go.dev/cl/475575 mentions this issue: io: optimize copyBuffer to make use of the user-provided buffer for fallbacks

gopherbot avatar Mar 11 '23 08:03 gopherbot

#66988 is related: Adding *os.File.WriteTo with a fast path for sendfile/splice and a fallback to io.Copy results in us missing the fast path in *net.TCPConn.WriteTo.

neild avatar Apr 26 '24 17:04 neild

Another related proposal: #67074

neild avatar Apr 26 '24 20:04 neild