proposal: io: CopierTo and CopierFrom interfaces
Proposal Details
The io.Copy and io.CopyBuffer functions use WriteTo when the source supports it and ReadFrom when the destination supports it.

There are cases when a type can efficiently support WriteTo or ReadFrom, but only some of the time. For example, CL 472475 added a WriteTo method to *os.File which performs a more efficient copy on Linux systems when the destination is a Unix or TCP socket. In all other cases, it falls back to io.Copy.
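For reference, here is a simplified sketch of the dispatch io.CopyBuffer performs today (a paraphrase, not the actual io source): as soon as either fast path fires, the caller's buffer is never consulted.

// Simplified sketch of today's dispatch (paraphrased from io.copyBuffer).
// Assumes: import "io"
func copyBufferToday(dst io.Writer, src io.Reader, buf []byte) (written int64, err error) {
	// Fast path 1: the source knows how to write itself out; buf is ignored.
	if wt, ok := src.(io.WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Fast path 2: the destination knows how to read the source in; buf is ignored.
	if rf, ok := dst.(io.ReaderFrom); ok {
		return rf.ReadFrom(src)
	}
	// Slow path: a plain read/write loop, the only place buf is used.
	if len(buf) == 0 {
		buf = make([]byte, 32*1024)
	}
	for {
		nr, rerr := src.Read(buf)
		if nr > 0 {
			nw, werr := dst.Write(buf[:nr])
			written += int64(nw)
			if werr != nil {
				return written, werr
			}
			if nw != nr {
				return written, io.ErrShortWrite
			}
		}
		if rerr == io.EOF {
			return written, nil
		}
		if rerr != nil {
			return written, rerr
		}
	}
}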
The io.Copy fallback means that io.CopyBuffer with an *os.File source will no longer use the provided buffer, because the buffer is not plumbed through to WriteTo. (It also led to https://go.dev/issue/66988, in which the fallback prevented the fast path in *net.TCPConn.ReadFrom from being taken.)
There have been previous proposals to fix this problem:

- #21904: Rejected.
- #16474: Discussion ongoing, but this wouldn't address the io.Copy case.

Those issues also contain some links to various other instances of this problem turning up, but I think the *os.File.WriteTo case is a sufficient example of a problem that needs solving, since io.CopyBuffer no longer works as expected on files.
I propose that we add two new interfaces to the io package:
package io

// CopierTo is the interface that wraps the CopyTo method.
//
// CopyTo writes data to w until there is no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
//
// When len(buf) > 0, CopyTo may use the provided buffer as temporary space.
//
// CopyTo may return an error wrapping errors.ErrUnsupported to indicate that
// it is unable to efficiently copy to w.
//
// The Copy function uses CopierTo if available.
type CopierTo interface {
	CopyTo(w Writer, buf []byte) (n int64, err error)
}

// CopierFrom is the interface that wraps the CopyFrom method.
//
// CopyFrom reads data from r until EOF or error.
// The return value n is the number of bytes read.
// Any error except EOF encountered during the read is also returned.
//
// When len(buf) > 0, CopyFrom may use the provided buffer as temporary space.
//
// CopyFrom may return an error wrapping errors.ErrUnsupported to indicate that
// it is unable to efficiently copy from r.
//
// The Copy function uses CopierFrom if available.
type CopierFrom interface {
	CopyFrom(r Reader, buf []byte) (n int64, err error)
}
The CopierTo and CopierFrom interfaces supersede WriterTo and ReaderFrom when available. They provide a way to plumb the copy buffer down through the copy operation, and permit an implementation to provide a fast path for some subset of sources or destinations while deferring to io.Copy for other cases.
We will update io.Copy and io.CopyBuffer to prefer CopierTo and CopierFrom when available.
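As a rough illustration (one possible shape, not the proposal's final implementation), the updated dispatch could look like the sketch below. Local copies of the proposed interfaces are declared so the sketch stands alone today, and the fallback simply reuses io.CopyBuffer's existing behavior; it relies on the guarantee, discussed further down, that errors.ErrUnsupported means no data was transferred.

// Local copies of the proposed interfaces so this sketch compiles today;
// under the proposal they would be io.CopierTo and io.CopierFrom.
// Assumes: import ("errors"; "io")
type copierTo interface {
	CopyTo(w io.Writer, buf []byte) (n int64, err error)
}

type copierFrom interface {
	CopyFrom(r io.Reader, buf []byte) (n int64, err error)
}

// copyBufferProposed tries the new interfaces first, treats
// errors.ErrUnsupported as "no fast path, keep going", and otherwise
// falls back to the existing WriterTo/ReadFrom/plain-loop behavior.
func copyBufferProposed(dst io.Writer, src io.Reader, buf []byte) (int64, error) {
	if ct, ok := src.(copierTo); ok {
		n, err := ct.CopyTo(dst, buf)
		if !errors.Is(err, errors.ErrUnsupported) {
			return n, err
		}
		// ErrUnsupported guarantees nothing was transferred, so it is
		// safe to continue to the next option.
	}
	if cf, ok := dst.(copierFrom); ok {
		n, err := cf.CopyFrom(src, buf)
		if !errors.Is(err, errors.ErrUnsupported) {
			return n, err
		}
	}
	// No new-style fast path applied: defer to io.CopyBuffer's current
	// behavior (inside the real io package this would be the plain loop).
	return io.CopyBuffer(dst, src, buf)
}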
We will update *os.File and *net.TCPConn (and possibly other types) to add CopyTo and CopyFrom methods.
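The real methods would sit on sendfile/splice and the poller internals, which can't be reproduced here. As a stand-in, here is a toy CopierFrom implementation showing the shape such methods could take: use the fast path only when the peer's concrete type allows it, and report errors.ErrUnsupported, before transferring anything, otherwise.

// Toy illustration of the conditional-fast-path pattern (not os or net code).
// Assumes: import ("bytes"; "errors"; "fmt"; "io")
type bufWriter struct {
	buf *bytes.Buffer
}

func (w *bufWriter) Write(p []byte) (int, error) { return w.buf.Write(p) }

// CopyFrom has a fast path only when it can learn the source's size up
// front; for any other source it reports errors.ErrUnsupported so the
// caller's generic copy (which will use buf) takes over.
func (w *bufWriter) CopyFrom(r io.Reader, buf []byte) (int64, error) {
	br, ok := r.(*bytes.Reader)
	if !ok {
		// No fast path for this source; nothing has been read yet.
		return 0, fmt.Errorf("bufWriter.CopyFrom: %w", errors.ErrUnsupported)
	}
	w.buf.Grow(br.Len()) // the "efficient" part: one allocation up front
	return w.buf.ReadFrom(br)
}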
Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(err, errors.ErrUnsupported) error?

> Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(err, errors.ErrUnsupported) error?

Yes.
Maybe I'm missing the point, but what precisely is the issue with solving this the way you opt out of any unwanted type unerasure: by forcing a barrier struct (as already mentioned in #16474)? If you are using io.CopyBuffer, chances are you specially prepared the buffer, so we are talking about a deliberate, pinpointed optimization, not the implicit system-wide optimizations that io.WriterTo does help with.
io.CopyBuffer is not alone in exhibiting behaviour, not expressible in the type system, that one needs to be wary of: compress/zlib.NewReader may discard data if the reader cannot be unerased to the exact type it looks for, and bufio.(*Reader).WriteTo can even call ReadFrom (although this is undocumented). If you want to prohibit a function from utilizing its fast paths, hide your own capabilities.
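The "barrier struct" this comment refers to is, roughly, the following idiom (illustrative names): a struct that embeds only the basic interface promotes only Read or Write, hiding any WriteTo/ReadFrom, so io.CopyBuffer cannot type-assert its way past the caller's buffer.

// Illustrative "barrier" wrappers. Assumes: import "io"
type onlyReader struct{ io.Reader }
type onlyWriter struct{ io.Writer }

// copyWithBuffer forces io.CopyBuffer onto its buffered slow path,
// guaranteeing that buf is actually used.
func copyWithBuffer(dst io.Writer, src io.Reader, buf []byte) (int64, error) {
	return io.CopyBuffer(onlyWriter{dst}, onlyReader{src}, buf)
}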
The motivation for this proposal is to fix io.Copy when used with common standard library types such as *os.File and *net.TCPConn.

It's perhaps not immediately obvious that io.Copy is broken. Let's look at a concrete example from the net/http package (https://go.googlesource.com/go/+/refs/tags/go1.22.2/src/net/http/transfer.go#412):
func (t *transferWriter) doBodyCopy(dst io.Writer, src io.Reader) (n int64, err error) {
	buf := getCopyBuf()
	defer putCopyBuf(buf)
	n, err = io.CopyBuffer(dst, src, buf)
	if err != nil && err != io.EOF {
		t.bodyReadError = err
	}
	return
}
This function copies from a user-provided io.Reader to an io.Writer. It doesn't know anything about the source and destination, although the Writer has usually been created by the net/http package. It uses io.CopyBuffer and does buffer pooling to minimize allocations. The copy operation takes advantage of sendfile when available.

Let's consider the case where we're on macOS, copying from an *os.File created by os.Pipe, and to a *net.TCPConn.
In this case:

- doBodyCopy allocates a copy buffer from its pool.
- io.CopyBuffer notices that the source supports WriteTo and calls it.
- *os.File.WriteTo doesn't have any special handling on macOS, so it wraps the source in a type that removes the WriteTo method and calls io.Copy.
- io.Copy notices that the destination supports ReadFrom and calls it.
- *net.TCPConn.ReadFrom calls sendfile, which fails, because macOS doesn't support sendfile from a pipe.
- *net.TCPConn.ReadFrom wraps the destination in a type that removes the ReadFrom method and calls io.Copy.
- io.Copy allocates a buffer and does the copy.
Note that there are three io.Copy operations in the call stack. We lost the user-allocated buffer provided to CopyBuffer after the first one. It is not at all obvious to the original caller (doBodyCopy) that the buffer provided to CopyBuffer won't be used; doBodyCopy was, after all, operating on a plain io.Reader and io.Writer.

We could change doBodyCopy to mask any ReadFrom or WriteTo methods on the reader and writer, at the cost of disabling the sendfile optimization when we do want it. Requiring every caller of io.CopyBuffer everywhere to do this would be unfortunate.

This is a mess.
At a high level, I have two goals. When using common types from the standard library (files, network sockets):

- io.Copy (and related functions) should not recursively call itself; and
- io.CopyBuffer should use the buffer provided to it.

Arguably, the recursive calls to Copy aren't necessarily a problem, but they add so much confusion to the call stack that it has become very difficult to understand what's actually going on in a copy operation. The CopyBuffer buffer getting lost is, I think, at this point a straight-up bug: as of Go 1.22, CopyBuffer with an *os.File as the source will never use the provided buffer. The fact that this changed between Go 1.21 and 1.22 seems to me to be a plain regression.
The Go compatibility promise closes off most avenues for fixing this:

- We document io.Copy as calling ReadFrom/WriteTo when available.
- Changing io.CopyBuffer to not call ReadFrom/WriteTo seems likely to cause regressions in code that expects the current behavior.
- We can't remove the WriteTo method from *os.File.
- We can't remove the ReadFrom method from *net.TCPConn.
- We can't start returning errors.ErrUnsupported from existing ReadFrom or WriteTo methods.

(Of these options, the last one seems the most feasible: if we could return ErrUnsupported from *os.File.WriteTo and *net.TCPConn.ReadFrom when the fast sendfile path is not available, then we could get by without any new APIs. That was #21904, which was rejected. Perhaps it should be reopened?)
The root of the problem is that WriteTo and ReadFrom aren't the right interface between io.Copy and an io.Reader/io.Writer:

- Support for fast-path copies may be conditional on the pairing of (source, destination), and WriteTo/ReadFrom are necessarily associated with only one or the other. There needs to be a way for a type to support fast copies, with a fallback to the usual copy path.
- If a type needs a buffer to support its copy, WriteTo/ReadFrom have no access to the buffer provided to io.CopyBuffer.
The proposed CopierTo/CopierFrom aim to fix io.Copy on os and net package types by providing an interface that does include the necessary features.

Adding yet more magic methods to dig us out of the hole caused by the existing magic methods does seem somewhat odd. It's quite possible that if we were designing io.Copy from the start, there would be a better way to do this. But I don't see another way out of this given the current constraints, I believe this proposal does address the problem, and I think that the addition of *os.File.WriteTo in Go 1.22 has put us at a point where surprising io.Copy behavior is common enough that we need to do something to address the situation.
We could constrain this by using an internal package, and by looking for methods that return types defined in that internal package. That would mean that only types in the standard library would implement the new interfaces. On the one hand that would prevent other packages from taking advantage of these new methods. On the other hand it would constrain the additional complexity.
For example:

package iofd

type FD uintptr

type SysFDer interface {
	SysFD() FD
	CopyBetween(dst, src FD, buf []byte) (int64, bool, error)
}
Then in package io:

srcfd, srcOK := src.(iofd.SysFDer)
dstfd, dstOK := dst.(iofd.SysFDer)
if srcOK && dstOK {
	len, handled, err := srcfd.CopyBetween(dstfd, srcfd, buf)
	if handled {
		return len, err
	}
}
We could constrain this to be internal-only, but I don't think it's worth it. We'd still have the complexity of the new methods; we just wouldn't be letting anyone else take advantage of the benefits.
I think the complexity of the new methods is significantly less if nobody else can implement them. In particular we can design the new methods so that they work on the pair of source and destination, which is what we need. One of the problems with the current methods is that they only work on the source or on the destination, which is how we've gotten into the position of needing to use both, and therefore needing to turn off both, and therefore increasing complexity.
#69097 is another example of things going wrong in the current tangle of ReaderFroms/WriterTos. I'm pretty sure this also broke with the addition of *os.File.WriteTo in 1.22.
> In particular we can design the new methods so that they work on the pair of source and destination, which is what we need.
I don't follow this. Any method we add will need to be called on one of the source or destination, with the other side of the copy as a parameter. I don't see the reason to pass srcfd as a parameter in the example; srcfd is already the receiver, so we're just passing it to the method twice:
len, handled, err := srcfd.CopyBetween(dstfd, srcfd, buf)
Also, requiring that both source and destination implement SysFDer is limiting. Currently, the Linux File.ReadFrom checks to see if the source is an io.LimitedReader and extracts the inner reader when available (https://go.googlesource.com/go/+/refs/tags/go1.23.0/src/os/zero_copy_linux.go#73). Either we'd have to make LimitedReader implement SysFDer or drop the ability to splice from a size-limited source.
The problem with ReaderFrom/WriterTo isn't that they operate on only the source or destination, it's that they have no access to the copy buffer and no way to fall back if the fast path isn't available. I don't think fixing those problems adds complexity; instead it removes it.
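That unwrapping step looks roughly like this (a paraphrase for illustration, not the code at the cited line): a size-limited source is reduced to its inner reader plus a byte budget before the zero-copy path is attempted.

// Paraphrased sketch of the LimitedReader unwrapping (not the exact os code).
// Assumes: import "io"
func unwrapLimit(r io.Reader) (inner io.Reader, remain int64) {
	if lr, ok := r.(*io.LimitedReader); ok {
		// The splice/sendfile path can then be capped at lr.N bytes.
		return lr.R, lr.N
	}
	return r, 1<<63 - 1 // effectively unlimited
}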
You're right, there is no reason to make CopyBetween a method of SysFDer. It should just be a top-level internal/poll.CopyBetween function that copies between two file descriptors. We can handle LimitedReader (and CopyN) by also passing a maximum length.
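A possible shape for that helper, with illustrative names (this is not existing internal/poll API): the maximum length lets LimitedReader and CopyN sources keep the fast path, and handled=false tells the caller to fall back to the ordinary buffered copy.

package poll // hypothetical sketch; not the real internal/poll API

// FD stands in for a platform file descriptor (illustrative only).
type FD uintptr

// CopyBetween copies up to max bytes directly from src to dst using
// whatever kernel primitive applies (copy_file_range, splice, sendfile).
// handled reports whether any such primitive was used; when it is false,
// the caller should fall back to the ordinary buffered copy using buf.
func CopyBetween(dst, src FD, max int64, buf []byte) (written int64, handled bool, err error) {
	// Platform-specific dispatch elided in this sketch.
	return 0, false, nil
}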
I think there is a problem with ReaderFrom and WriterTo that you aren't mentioning: in order to use them to invoke sendfile or copy_file_range, whoever implements the methods needs to dig through the argument to find out what it can do. That's the step I'm trying to avoid with SysFDer. I agree that this does not help with the io.CopyBuffer problem.