pgcat PgCat Gets Confused With Multiple Unnamed Prepared Statements

Describe the bug There's a bug in PgCat around parsing and binding of multiple unnamed prepared statements.

PgCat saves prepared statements in a HashMap<String, (Arc<Parse>, u64)>, where the key is the name of the prepared statement. It also parses and buffers all extended queries until it receives a Sync, at which point the queries are sent down to the server.

Because PgCat renames and dedupes the prepared statements it sends to the server, it first must make sure the server has all the prepared statements it needs. This is problematic in the case of unnamed prepared statements.

Take the following extended query:

Parse("", "CREATE TABLE my_table")
Bind("")
Execute
Parse("", "SELECT * FROM my_table")
Bind("")
Execute
Sync

When PgCat receives this query it renames and buffers the first parse into its hashmap: { "": ("PGCAT_0", "CREATE TABLE my_table") }.

On the second parse it renames the parse, buffers it, and overwrites its map: { "": ("PGCAT_1", "SELECT * FROM my_table") }.

Now we get the sync and start working through the buffer.

On the first parse: we check to see if the server already has PGCAT_0. In our case it does not and buffers a Parse("PGCAT_0", "CREATE TABLE my_table") to send down.

On the first bind: we need to make sure the prepared statement we're using is already on the server. We look up "" in our hashmap and find ("PGCAT_1", "SELECT * FROM my_table"). PGCAT_1 is not on the server, so we first do a Parse("PGCAT_1", "SELECT * FROM my_table").

But wait! We just sent down the second prepared statement when we were still working on the first. Now the server will fail to parse PGCAT_1 (the table doesn't exist yet!), and the whole extended query will fail.

Now if the second query wasn't dependent on the first (or just didn't fail to parse in general), we should still get correct results. This is because we still buffer the correct Bind("PGCAT_1") and Bind("PGCAT_1") messages to send down to the server.

To Reproduce Send an extended protocol query with the following:

Parse("", "CREATE TABLE my_table")
Bind("")
Execute
Parse("", "SELECT * FROM my_table")
Bind("")
Execute
Sync

Read the response and see that it fails.

Expected behavior PgCat handles unnamed prepared statements correctly and parses / binds them in the correct order.

Screenshots N/A

Desktop (please complete the following information):

OS: aarch64 / AL2
Browser: N/A
Version: Postgres 16 / PgCat 1.2.0

Additional context Add any other context about the problem here.

Nov 21 '24 19:11 tneely

Hi Taylor, this pattern of using multiple Parse statements before a single sync is not something I've seen in the wild. Is this something your ORM is doing? or are you specifically sending? are you able to send syncs in between the parses?

Dec 12 '24 19:12 zainkabani

I've seen it used by JDBC specifically in the context of Liquibase update. This is with autocommit on:

PostgreSQL
    Type: Parse
    Length: 13
    Statement: 
    Query: BEGIN
    Parameters: 0
PostgreSQL
    Type: Bind
    Length: 12
    Portal: 
    Statement: 
    Parameter formats: 0
    Parameter values: 0
    Result formats: 0
PostgreSQL
    Type: Execute
    Length: 9
    Portal: 
    Returns: all rows
PostgreSQL
    Type: Parse
    Length: 4542
    Statement: 
    Query [truncated]: select string_agg(word, ',') from pg_catalog.pg_get_keywords() where word <> ALL ('{a,abs,absolute,action,ada,add,admin,after,all,allocate,alter,always,and,any,are,array,as,asc,asensitive,assertion,assignment,asymmetric
    Parameters: 0
PostgreSQL
    Type: Bind
    Length: 12
    Portal: 
    Statement: 
    Parameter formats: 0
    Parameter values: 0
    Result formats: 0
PostgreSQL
    Type: Describe
    Length: 6
    Portal: 
PostgreSQL
    Type: Execute
    Length: 9
    Portal: 
    Returns: all rows
PostgreSQL
    Type: Sync
    Length: 4

You can see it does an unnamed parse for BEGIN followed by the select.

Dec 13 '24 05:12 tneely