pgx icon indicating copy to clipboard operation
pgx copied to clipboard

Many copies in large object implementation

Open mitar opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe.

I have not measured any of this, but while working on #1875, I noticed that there is a lot of copying/allocation in the code:

  • Read creates a new res []byte slice and first scans into it, and then copies into destination p.
  • Scan ignores any allocated buffer in res (that is OK, because Read does not allocate any) and allocates buffer.
  • Allocation in Scan does not use any pool of buffers.
  • Read uses QueryRow instead of Query which I think providing preallocated buffers?

Describe the solution you'd like

I think there could be some potential optimizations done here. I think Scan could allocate only if src does not already have enough space. This is currently done in PreallocBytes's ScanBytes, but I do not understand why it is not done also in Scan:

func (scanPlanBinaryBytesToBytes) Scan(src []byte, dst any) error {
	dstBuf := dst.(*[]byte)
	if src == nil {
		*dstBuf = nil
		return nil
	}

	if len(src) <= len(*dstBuf) {
		*dstBuf = (*dstBuf)[:len(src)]
	} else {
		*dstBuf = make([]byte, len(src))
	}
	copy(*dstBuf, src)
	return nil
}

That would then allow us to do in Read:

		res := p[nTotal:]
		err := o.tx.QueryRow(o.ctx, "select loread($1, $2)", o.fd, expected).Scan(&res)
		copy(p[nTotal:], res)

So instead of having res be unallocated slice, we could reuse p. In that case if p is large enough (and in this case it would always be large enough because we computed expected value accordingly) copy has an optimization to not really copy (by my understanding).

Describe alternatives you've considered Additional context

I think a similar thing could be achieved by wrapping p with pgtype.PreallocBytes inside Read, but I just do not get why Scan itself does not do a similar thing to pgtype.PreallocBytes's ScanBytes?

Additional context

With large objects I would like to minimize number of copying. Ideally, provided p buffer to Read would be directly used to store data obtained over the connection without additional (large) buffer for the connection data. I am not sure if that is achievable at all? The suggestions above still do at least one copy, from connection buffer into destination buffer. Is it possible to somehow improve that? My understanding is that it is hard to prevent one copy in Reader standard interface? That maybe large objects implementation should also implement WriterTo interface (and ReaderTo if we are already at it) to optimize that? Do you think we should add them? Also, I am not sure if pgx really provides a way to do zero-copy from connection into a Writer?

mitar avatar Jan 14 '24 11:01 mitar

There's less chance of misuse when making a allocating a new buffer by default. And while that may or may not have been a wrong decision it would be far too risky to existing code to make that change now.

Using pbtype.PreallocBytes inside of Read does seem like a good idea though.

jackc avatar Jan 15 '24 14:01 jackc

Using pbtype.PreallocBytes inside of Read does seem like a good idea though.

I can try to do that. But I am curious, you think it would not be possible to somehow get the reader to get bytes directly from the connection? So that as connection is being read it is directly written out to the buffer given to the Read? My understanding is that there is a binary stream already so there is no conversion needed? Once large object data starts on the connection it could be just directly read of into user provided buffer?

mitar avatar Jan 19 '24 13:01 mitar

But I am curious, you think it would not be possible to somehow get the reader to get bytes directly from the connection?

Not easily. Each message is read and parsed in its entirety and the entire response is a single message. I'm also doubtful it would make a significant difference. There are already unavoidable layers and copies in reading bytes from a connection - especially with TLS. I've experimented a bit in the past with streaming the reading of individual messages and I did not find enough performance improvement to be worth the additional complication.

jackc avatar Jan 20 '24 15:01 jackc