Use hash as key for statement cache

Open Webbmekanikern opened this issue 1 year ago • 1 comments

Currently, the LRU cache of prepared statements in pgx.Conn is using the query string as cache key. This makes it tricky in our project where a query builder is reusing a byte buffer and sending query strings to pgx directly from the allocated buffer.

I have two suggestions, and I'm happy to contribute with both.

1. Use a hash of the query as key for the statement cache

Instead of using the query string directly, hash it (just like stmtcache.StatementName) and use the hash as the key, effectively keeping no reference to the original string. A query might also be rather long, so this should decrease the memory footprint.
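To make the idea concrete, here is a minimal sketch of a cache keyed by a digest of the query rather than by the query text. It is not pgx's implementation: `queryKey`, `keyFor` and `stmtCache` are hypothetical names, and SHA-256 is used only because it is in the standard library.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// queryKey is a fixed-size digest of the SQL text (hypothetical type).
type queryKey [sha256.Size]byte

// keyFor hashes the query bytes; the returned array holds no reference to sql.
func keyFor(sql []byte) queryKey {
	return sha256.Sum256(sql)
}

// stmtCache maps a digest to a prepared statement name (hypothetical type).
type stmtCache map[queryKey]string

func main() {
	cache := stmtCache{}

	buf := []byte("select id from users where id = $1")
	cache[keyFor(buf)] = "stmt_1"

	// Reuse the same buffer for a different query: the cache entry is unaffected.
	buf = append(buf[:0], "select 1"...)
	_, hit := cache[keyFor(buf)]
	fmt.Println(hit) // false

	// Writing the first query into the buffer again finds the entry.
	buf = append(buf[:0], "select id from users where id = $1"...)
	name, hit := cache[keyFor(buf)]
	fmt.Println(name, hit) // stmt_1 true
}
```

Because the key is an array value, the map never holds on to the builder's buffer, which is the property the suggestion is after.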

2. Switch hashing algorithm to xxhash

For both stmtcache.StatementName and cache key, switch from sha256 to xxhash. This should:

  • reduce allocations (not measured yet, but looks like e.g. stmtcache.StatementName does three allocations, which in this case would be reduced to one allocation)
  • reduce memory footprint (when going from a 24-byte slice to an 8-byte integer)
  • increase the performance (as xxhash is much faster than sha256)
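The size difference between the two kinds of key can be sketched as follows. This is an illustration under assumptions: `sha256Name` imitates the hash-24-bytes-and-hex-encode approach described above with a hypothetical `stmtcache_` prefix, and `fnvKey` uses FNV-1a from the standard library as a stand-in for xxhash (which is an external package), so it says nothing about xxhash's actual speed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// sha256Name hashes the query with SHA-256, keeps the first 24 bytes and
// hex-encodes them behind a prefix (the prefix is an assumption).
func sha256Name(sql string) string {
	digest := sha256.Sum256([]byte(sql))
	return "stmtcache_" + hex.EncodeToString(digest[:24])
}

// fnvKey returns a 64-bit key. FNV-1a stands in for xxhash here.
func fnvKey(sql string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(sql)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	q := "select id from users where id = $1"

	// 10 prefix characters plus 48 hex characters.
	fmt.Println(len(sha256Name(q))) // 58

	// The integer key always occupies 8 bytes.
	fmt.Printf("%016x\n", fnvKey(q))
}
```

The allocation and speed claims in the list would still need a benchmark against the real xxhash package to confirm.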

Alternatively: Add support for metadata on a pgx.Conn

If you don't agree with the suggestions above, it would at least be nice to have support for some arbitrary metadata on the pgx.Conn (either any or unsafe.Pointer) so that we can roll our own statement cache, without the need for synchronisation that an outside map would require.
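As a rough sketch of what this would enable: `fakeConn` below is a hypothetical stand-in and not pgx.Conn, and its `meta` field only imitates the requested metadata slot. The point is that a cache stored on the connection needs no mutex, since a connection is used by one goroutine at a time.

```go
package main

import "fmt"

// fakeConn is a hypothetical stand-in for a connection type; it is NOT pgx.Conn.
type fakeConn struct {
	meta map[string]any // arbitrary per-connection metadata
}

// stmtNames returns the statement cache stored on the connection,
// creating it on first use. No locking is needed because the cache
// is only reachable through the connection that owns it.
func stmtNames(c *fakeConn) map[string]string {
	if cache, ok := c.meta["stmtNames"].(map[string]string); ok {
		return cache
	}
	cache := map[string]string{}
	c.meta["stmtNames"] = cache
	return cache
}

func main() {
	c := &fakeConn{meta: map[string]any{}}
	stmtNames(c)["select 1"] = "stmt_1"
	fmt.Println(stmtNames(c)["select 1"]) // stmt_1
}
```

An outside map keyed by connection would instead be shared between goroutines and would have to be guarded by a lock.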

Webbmekanikern, Mar 25 '24 09:03

Currently, the LRU cache of prepared statements in pgx.Conn is using the query string as cache key. This makes it tricky in our project where a query builder is reusing a byte buffer and sending query strings to pgx directly from the allocated buffer.

I'm not sure how this would make a difference. A string will get allocated and the data copied from the []byte regardless of what pgx does later.
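A small sketch of that copy, assuming a plain `string(buf)` conversion rather than an unsafe one (`snapshot` is a hypothetical helper name):

```go
package main

import "fmt"

// snapshot converts the buffer to a string. The conversion allocates
// new memory and copies the bytes, so the result is independent of buf.
func snapshot(buf []byte) string {
	return string(buf)
}

func main() {
	buf := []byte("select 1")
	query := snapshot(buf)

	// Overwrite the buffer in place, as a reusing query builder would.
	copy(buf, "select 2")

	fmt.Println(query)       // select 1
	fmt.Println(string(buf)) // select 2
}
```

The string handed to the cache keeps its original contents even after the buffer is reused.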

Instead of using the query string directly, hash it (just like stmtcache.StatementName) and use the hash as the key, effectively keeping no reference to the original string. A query might also be rather long, so this should decrease the memory footprint.

I don't think this will reduce memory usage. The statement cache still uses the normal pgx/pgconn prepared statement system and that keeps a reference to the original SQL. See https://pkg.go.dev/github.com/jackc/pgx/[email protected]/pgconn#StatementDescription.

2. Switch hashing algorithm to xxhash

This would involve adding an external dependency. There would need to be a very significant performance increase to justify that.

Alternatively: Add support for metadata on a pgx.Conn

I think we can do this. See https://github.com/jackc/pgx/issues/1896 for the current proposal.

jackc, Apr 14 '24 01:04

Alternatively: Add support for metadata on a pgx.Conn

This has just been added in 6f0deff0156a7ffcd557eac1011ebb5ce73739d3.

jackc, May 09 '24 20:05