proposal: io: CopierTo and CopierFrom interfaces
Proposal Details
The io.Copy and io.CopyBuffer functions use WriteTo when the source supports it and ReadFrom when the destination supports it.

There are cases when a type can efficiently support WriteTo or ReadFrom, but only some of the time. For example, CL 472475 added a WriteTo method to *os.File which performs a more efficient copy on Linux systems when the destination is a Unix or TCP socket. In all other cases, it falls back to io.Copy.
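For reference, here is a simplified sketch of the dispatch io.CopyBuffer performs today (a paraphrase, not the actual io source): as soon as either fast path fires, the caller's buffer is never consulted.

// Simplified sketch of today's dispatch (paraphrased from io.copyBuffer).
// Assumes: import "io"
func copyBufferToday(dst io.Writer, src io.Reader, buf []byte) (written int64, err error) {
	// Fast path 1: the source knows how to write itself out; buf is ignored.
	if wt, ok := src.(io.WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Fast path 2: the destination knows how to read the source in; buf is ignored.
	if rf, ok := dst.(io.ReaderFrom); ok {
		return rf.ReadFrom(src)
	}
	// Slow path: a plain read/write loop, the only place buf is used.
	if len(buf) == 0 {
		buf = make([]byte, 32*1024)
	}
	for {
		nr, rerr := src.Read(buf)
		if nr > 0 {
			nw, werr := dst.Write(buf[:nr])
			written += int64(nw)
			if werr != nil {
				return written, werr
			}
			if nw != nr {
				return written, io.ErrShortWrite
			}
		}
		if rerr == io.EOF {
			return written, nil
		}
		if rerr != nil {
			return written, rerr
		}
	}
}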
The io.Copy fallback means that io.CopyBuffer with an *os.File source will no longer use the provided buffer, because the buffer is not plumbed through to WriteTo. (It also led to https://go.dev/issue/66988, in which the fallback prevented the fast path in *net.TCPConn.ReadFrom from being taken.)
There have been previous proposals to fix this problem:

- #21904: Rejected.
- #16474: Discussion ongoing, but this wouldn't address the io.Copy case.

Those issues also contain some links to various other instances of this problem turning up, but I think the *os.File.WriteTo case is a sufficient example of a problem that needs solving, since io.CopyBuffer no longer works as expected on files.
I propose that we add two new interfaces to the io package:
package io

// CopierTo is the interface that wraps the CopyTo method.
//
// CopyTo writes data to w until there is no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
//
// When len(buf) > 0, CopyTo may use the provided buffer as temporary space.
//
// CopyTo may return an error wrapping errors.ErrUnsupported to indicate that
// it is unable to efficiently copy to w.
//
// The Copy function uses CopierTo if available.
type CopierTo interface {
	CopyTo(w Writer, buf []byte) (n int64, err error)
}

// CopierFrom is the interface that wraps the CopyFrom method.
//
// CopyFrom reads data from r until EOF or error.
// The return value n is the number of bytes read.
// Any error except EOF encountered during the read is also returned.
//
// When len(buf) > 0, CopyFrom may use the provided buffer as temporary space.
//
// CopyFrom may return an error wrapping errors.ErrUnsupported to indicate that
// it is unable to efficiently copy from r.
//
// The Copy function uses CopierFrom if available.
type CopierFrom interface {
	CopyFrom(r Reader, buf []byte) (n int64, err error)
}
The CopierTo and CopierFrom interfaces supersede WriterTo and ReaderFrom when available. They provide a way to plumb the copy buffer down through the copy operation, and permit an implementation to provide a fast path for some subset of sources or destinations while deferring to io.Copy for other cases.
We will update io.Copy and io.CopyBuffer to prefer CopierTo and CopierFrom when available.
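As a rough illustration (one possible shape, not the proposal's final implementation), the updated dispatch could look like the sketch below. Local copies of the proposed interfaces are declared so the sketch stands alone today, and the fallback simply reuses io.CopyBuffer's existing behavior; it relies on the guarantee, discussed further down, that errors.ErrUnsupported means no data was transferred.

// Local copies of the proposed interfaces so this sketch compiles today;
// under the proposal they would be io.CopierTo and io.CopierFrom.
// Assumes: import ("errors"; "io")
type copierTo interface {
	CopyTo(w io.Writer, buf []byte) (n int64, err error)
}

type copierFrom interface {
	CopyFrom(r io.Reader, buf []byte) (n int64, err error)
}

// copyBufferProposed tries the new interfaces first, treats
// errors.ErrUnsupported as "no fast path, keep going", and otherwise
// falls back to the existing WriterTo/ReadFrom/plain-loop behavior.
func copyBufferProposed(dst io.Writer, src io.Reader, buf []byte) (int64, error) {
	if ct, ok := src.(copierTo); ok {
		n, err := ct.CopyTo(dst, buf)
		if !errors.Is(err, errors.ErrUnsupported) {
			return n, err
		}
		// ErrUnsupported guarantees nothing was transferred, so it is
		// safe to continue to the next option.
	}
	if cf, ok := dst.(copierFrom); ok {
		n, err := cf.CopyFrom(src, buf)
		if !errors.Is(err, errors.ErrUnsupported) {
			return n, err
		}
	}
	// No new-style fast path applied: defer to io.CopyBuffer's current
	// behavior (inside the real io package this would be the plain loop).
	return io.CopyBuffer(dst, src, buf)
}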
We will update *os.File and *net.TCPConn (and possibly other types) to add CopyTo and CopyFrom methods.
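The real methods would sit on sendfile/splice and the poller internals, which can't be reproduced here. As a stand-in, here is a toy CopierFrom implementation showing the shape such methods could take: use the fast path only when the peer's concrete type allows it, and report errors.ErrUnsupported, before transferring anything, otherwise.

// Toy illustration of the conditional-fast-path pattern (not os or net code).
// Assumes: import ("bytes"; "errors"; "fmt"; "io")
type bufWriter struct {
	buf *bytes.Buffer
}

func (w *bufWriter) Write(p []byte) (int, error) { return w.buf.Write(p) }

// CopyFrom has a fast path only when it can learn the source's size up
// front; for any other source it reports errors.ErrUnsupported so the
// caller's generic copy (which will use buf) takes over.
func (w *bufWriter) CopyFrom(r io.Reader, buf []byte) (int64, error) {
	br, ok := r.(*bytes.Reader)
	if !ok {
		// No fast path for this source; nothing has been read yet.
		return 0, fmt.Errorf("bufWriter.CopyFrom: %w", errors.ErrUnsupported)
	}
	w.buf.Grow(br.Len()) // the "efficient" part: one allocation up front
	return w.buf.ReadFrom(br)
}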
Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(err, errors.ErrUnsupported) error?

> Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(err, errors.ErrUnsupported) error?

Yes.
Maybe I'm missing the point, but what precisely is the issue with solving this the way you opt out of any unwanted type unerasure: by forcing a barrier struct (as already mentioned in #16474)? If you are using io.CopyBuffer, chances are you specially prepared the buffer, so we are talking about a deliberate, pinpointed optimization, not the implicit system-wide optimizations that io.WriterTo does help with.
io.CopyBuffer is not alone in exhibiting behaviour, not expressible in the type system, that one needs to be wary of: compress/zlib.NewReader may discard data if the reader cannot be unerased to the exact type it looks for, and bufio.(*Reader).WriteTo can even call ReadFrom (although this is undocumented). If you want to prohibit a function from utilizing its fast paths, hide your own capabilities.
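The "barrier struct" this comment refers to is, roughly, the following idiom (illustrative names): a struct that embeds only the basic interface promotes only Read or Write, hiding any WriteTo/ReadFrom, so io.CopyBuffer cannot type-assert its way past the caller's buffer.

// Illustrative "barrier" wrappers. Assumes: import "io"
type onlyReader struct{ io.Reader }
type onlyWriter struct{ io.Writer }

// copyWithBuffer forces io.CopyBuffer onto its buffered slow path,
// guaranteeing that buf is actually used.
func copyWithBuffer(dst io.Writer, src io.Reader, buf []byte) (int64, error) {
	return io.CopyBuffer(onlyWriter{dst}, onlyReader{src}, buf)
}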
The motivation for this proposal is to fix io.Copy when used with common standard library types such as *os.File and *net.TCPConn.

It's perhaps not immediately obvious that io.Copy is broken. Let's look at a concrete example from the net/http package (https://go.googlesource.com/go/+/refs/tags/go1.22.2/src/net/http/transfer.go#412):
func (t *transferWriter) doBodyCopy(dst io.Writer, src io.Reader) (n int64, err error) {
	buf := getCopyBuf()
	defer putCopyBuf(buf)
	n, err = io.CopyBuffer(dst, src, buf)
	if err != nil && err != io.EOF {
		t.bodyReadError = err
	}
	return
}
This function copies from a user-provided io.Reader to an io.Writer. It doesn't know anything about the source and destination, although the Writer has usually been created by the net/http package. It uses io.CopyBuffer and does buffer pooling to minimize allocations. The copy operation takes advantage of sendfile when available.

Let's consider the case where we're on macOS, copying from an *os.File created by os.Pipe, and to a *net.TCPConn.
In this case:

- doBodyCopy allocates a copy buffer from its pool.
- io.CopyBuffer notices that the source supports WriteTo and calls it.
- *os.File.WriteTo doesn't have any special handling on macOS, so it wraps the source in a type that removes the WriteTo method and calls io.Copy.
- io.Copy notices that the destination supports ReadFrom and calls it.
- *net.TCPConn.ReadFrom calls sendfile, which fails, because macOS doesn't support sendfile from a pipe.
- *net.TCPConn.ReadFrom wraps the destination in a type that removes the ReadFrom method and calls io.Copy.
- io.Copy allocates a buffer and does the copy.
Note that there are three io.Copy operations in the call stack. We lost the user-allocated buffer provided to CopyBuffer after the first one. It is not at all obvious to the original caller (doBodyCopy) that the buffer provided to CopyBuffer won't be used; doBodyCopy was, after all, operating on a plain io.Reader and io.Writer.

We could change doBodyCopy to mask any ReadFrom or WriteTo methods on the reader and writer, at the cost of disabling the sendfile optimization when we do want it. Requiring every caller of io.CopyBuffer everywhere to do this would be unfortunate.

This is a mess.
At a high level, I have two goals. When using common types from the standard library (files, network sockets):

- io.Copy (and related functions) should not recursively call itself; and
- io.CopyBuffer should use the buffer provided to it.

Arguably, the recursive calls to Copy aren't necessarily a problem, but they add so much confusion to the call stack that it has become very difficult to understand what's actually going on in a copy operation. The CopyBuffer buffer getting lost is, I think, at this point a straight-up bug: as of Go 1.22, CopyBuffer with an *os.File as the source will never use the provided buffer. The fact that this changed between Go 1.21 and 1.22 seems to me to be a plain regression.
The Go compatibility promise closes off most avenues for fixing this:

- We document io.Copy as calling ReadFrom/WriteTo when available.
- Changing io.CopyBuffer to not call ReadFrom/WriteTo seems likely to cause regressions in code that expects the current behavior.
- We can't remove the WriteTo method from *os.File.
- We can't remove the ReadFrom method from *net.TCPConn.
- We can't start returning errors.ErrUnsupported from existing ReadFrom or WriteTo methods.

(Of these options, the last one seems the most feasible: if we could return ErrUnsupported from *os.File.WriteTo and *net.TCPConn.ReadFrom when the fast sendfile path is not available, then we could get by without any new APIs. That was #21904, which was rejected. Perhaps it should be reopened?)
The root of the problem is that WriteTo and ReadFrom aren't the right interface between io.Copy and an io.Reader/io.Writer:

- Support for fast-path copies may be conditional on the pairing of (source, destination), and WriteTo/ReadFrom are necessarily associated with only one or the other. There needs to be a way for a type to support fast copies, with a fallback to the usual copy path.
- If a type needs a buffer to support its copy, WriteTo/ReadFrom have no access to the buffer provided to io.CopyBuffer.
The proposed CopierTo/CopierFrom aim to fix io.Copy on os and net package types by providing an interface that does include the necessary features.

Adding yet more magic methods to dig us out of the hole caused by the existing magic methods does seem somewhat odd. It's quite possible that if we were designing io.Copy from the start, there would be a better way to do this. But I don't see another way out of this given the current constraints, I believe this proposal does address the problem, and I think that the addition of *os.File.WriteTo in Go 1.22 has put us at a point where surprising io.Copy behavior is common enough that we need to do something to address the situation.
We could constrain this by using an internal package, and by looking for methods that return types defined in that internal package. That would mean that only types in the standard library would implement the new interfaces. On the one hand that would prevent other packages from taking advantage of these new methods. On the other hand it would constrain the additional complexity.
For example:

package iofd

type FD uintptr

type SysFDer interface {
	SysFD() FD
	CopyBetween(dst, src FD, buf []byte) (int64, bool, error)
}
Then in package io:

srcfd, srcOK := src.(iofd.SysFDer)
dstfd, dstOK := dst.(iofd.SysFDer)
if srcOK && dstOK {
	len, handled, err := srcfd.CopyBetween(dstfd, srcfd, buf)
	if handled {
		return len, err
	}
}
We could constrain this to be internal-only, but I don't think it's worth it. We'd still have the complexity of the new methods; we just wouldn't be letting anyone else take advantage of the benefits.
I think the complexity of the new methods is significantly less if nobody else can implement them. In particular we can design the new methods so that they work on the pair of source and destination, which is what we need. One of the problems with the current methods is that they only work on the source or on the destination, which is how we've gotten into the position of needing to use both, and therefore needing to turn off both, and therefore increasing complexity.
#69097 is another example of things going wrong in the current tangle of ReaderFroms/WriterTos. I'm pretty sure this also broke with the addition of *os.File.WriteTo in 1.22.
> In particular we can design the new methods so that they work on the pair of source and destination, which is what we need.
I don't follow this. Any method we add will need to be called on one of the source or destination, with the other side of the copy as a parameter. I don't see the reason to pass srcfd as a parameter in the example; srcfd is already the receiver, so we're just passing it to the method twice:
len, handled, err := srcfd.CopyBetween(dstfd, srcfd, buf)
Also, requiring that both source and destination implement SysFDer is limiting. Currently, the Linux File.ReadFrom checks to see if the source is an io.LimitedReader and extracts the inner reader when available (https://go.googlesource.com/go/+/refs/tags/go1.23.0/src/os/zero_copy_linux.go#73). Either we'd have to make LimitedReader implement SysFDer or drop the ability to splice from a size-limited source.
The problem with ReaderFrom/WriterTo isn't that they operate on only the source or destination, it's that they have no access to the copy buffer and no way to fall back if the fast path isn't available. I don't think fixing those problems adds complexity; instead it removes it.
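That unwrapping step looks roughly like this (a paraphrase for illustration, not the code at the cited line): a size-limited source is reduced to its inner reader plus a byte budget before the zero-copy path is attempted.

// Paraphrased sketch of the LimitedReader unwrapping (not the exact os code).
// Assumes: import "io"
func unwrapLimit(r io.Reader) (inner io.Reader, remain int64) {
	if lr, ok := r.(*io.LimitedReader); ok {
		// The splice/sendfile path can then be capped at lr.N bytes.
		return lr.R, lr.N
	}
	return r, 1<<63 - 1 // effectively unlimited
}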
You're right, there is no reason to make CopyBetween a method of SysFDer. It should just be a top-level internal/poll.CopyBetween function that copies between two file descriptors. We can handle LimitedReader (and CopyN) by also passing a maximum length.
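A possible shape for that helper, with illustrative names (this is not existing internal/poll API): the maximum length lets LimitedReader and CopyN sources keep the fast path, and handled=false tells the caller to fall back to the ordinary buffered copy.

package poll // hypothetical sketch; not the real internal/poll API

// FD stands in for a platform file descriptor (illustrative only).
type FD uintptr

// CopyBetween copies up to max bytes directly from src to dst using
// whatever kernel primitive applies (copy_file_range, splice, sendfile).
// handled reports whether any such primitive was used; when it is false,
// the caller should fall back to the ordinary buffered copy using buf.
func CopyBetween(dst, src FD, max int64, buf []byte) (written int64, handled bool, err error) {
	// Platform-specific dispatch elided in this sketch.
	return 0, false, nil
}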
I think there is a problem with ReaderFrom and WriterTo that you aren't mentioning: in order to use them to invoke sendfile or copy_file_range, whoever implements the methods needs to dig through the argument to find out what it can do. That's the step I'm trying to avoid with SysFDer. I agree that this does not help with the io.CopyBuffer problem.