doltgresql icon indicating copy to clipboard operation
doltgresql copied to clipboard

SQLAlchemy unable to fetch type information

Open Xe138 opened this issue 7 months ago • 12 comments

Issue

When using SQLAlchemy with Doltgres, the connection fails with the following error:

psycopg.errors.InternalError_: could not determine OID for placeholder v1: unsupported type: EXPRESSION

This error occurs during SQLAlchemy's dialect initialization phase when it tries to fetch type information for PostgreSQL types.

Test Script

The following script was used to test the issue:

#!/usr/bin/env python
import logging
import sys
import psycopg
from psycopg import sql

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
    stream=sys.stdout
)
logger = logging.getLogger(__name__)

# Connection parameters
DB_PARAMS = {
    "user": "postgres",
    "password": "password",
    "host": "localhost",
    "port": "5433",
    "database": "postgres"
}

def test_parameter_binding():
    conn_string = f"host={DB_PARAMS['host']} port={DB_PARAMS['port']} dbname={DB_PARAMS['database']} user={DB_PARAMS['user']} password={DB_PARAMS['password']}"
    
    try:
        with psycopg.connect(conn_string) as conn:
            logger.info("Connected to Doltgres database")
            
            # Test 1: Simple query with no parameters
            logger.info("Test 1: Simple query with no parameters")
            with conn.cursor() as cur:
                try:
                    cur.execute("SELECT 1 as test")
                    result = cur.fetchone()
                    logger.info(f"Result: {result}")
                except Exception as e:
                    logger.error(f"Error: {e}")
            
            # Test 2: Query with positional parameters
            logger.info("Test 2: Query with positional parameters")
            with conn.cursor() as cur:
                try:
                    cur.execute("SELECT %s as test", (1,))
                    result = cur.fetchone()
                    logger.info(f"Result: {result}")
                except Exception as e:
                    logger.error(f"Error: {e}")
            
            # Test 3: Query with named parameters
            logger.info("Test 3: Query with named parameters")
            with conn.cursor() as cur:
                try:
                    cur.execute("SELECT %(value)s as test", {"value": 1})
                    result = cur.fetchone()
                    logger.info(f"Result: {result}")
                except Exception as e:
                    logger.error(f"Error: {e}")
            
            # Test 4: The exact problematic query from SQLAlchemy
            logger.info("Test 4: The exact problematic query from SQLAlchemy")
            with conn.cursor() as cur:
                try:
                    query = """
                    SELECT ns.nspname, t.typname, t.oid, t.typarray
                    FROM pg_type t
                    JOIN pg_namespace ns ON t.typnamespace = ns.oid
                    WHERE t.typname = %(name)s
                    """
                    cur.execute(query, {"name": "hstore"})
                    result = cur.fetchall()
                    logger.info(f"Result: {result}")
                except Exception as e:
                    logger.error(f"Error: {e}")
                    logger.error(f"Error type: {type(e)}")
                    logger.error(f"Error details: {e.args}")
            
            # Test 5: Same query with direct string interpolation (not recommended for production)
            logger.info("Test 5: Same query with direct string interpolation")
            with conn.cursor() as cur:
                try:
                    query = """
                    SELECT ns.nspname, t.typname, t.oid, t.typarray
                    FROM pg_type t
                    JOIN pg_namespace ns ON t.typnamespace = ns.oid
                    WHERE t.typname = 'hstore'
                    """
                    cur.execute(query)
                    result = cur.fetchall()
                    logger.info(f"Result: {result}")
                except Exception as e:
                    logger.error(f"Error: {e}")
            
            # Test 6: Using psycopg's sql module for safe composition
            logger.info("Test 6: Using psycopg's sql module for safe composition")
            with conn.cursor() as cur:
                try:
                    query = sql.SQL("""
                    SELECT ns.nspname, t.typname, t.oid, t.typarray
                    FROM pg_type t
                    JOIN pg_namespace ns ON t.typnamespace = ns.oid
                    WHERE t.typname = {}
                    """).format(sql.Literal("hstore"))
                    cur.execute(query)
                    result = cur.fetchall()
                    logger.info(f"Result: {result}")
                except Exception as e:
                    logger.error(f"Error: {e}")
            
            # Test 7: Query to check if the pg_type table exists
            logger.info("Test 7: Check if pg_type table exists")
            with conn.cursor() as cur:
                try:
                    cur.execute("""
                    SELECT EXISTS (
                        SELECT FROM information_schema.tables 
                        WHERE table_schema = 'pg_catalog'
                        AND table_name = 'pg_type'
                    )
                    """)
                    result = cur.fetchone()
                    logger.info(f"pg_type table exists: {result[0]}")
                except Exception as e:
                    logger.error(f"Error: {e}")
            
            # Test 8: Try to list all tables in pg_catalog
            logger.info("Test 8: List tables in pg_catalog")
            with conn.cursor() as cur:
                try:
                    cur.execute("""
                    SELECT table_name 
                    FROM information_schema.tables 
                    WHERE table_schema = 'pg_catalog'
                    """)
                    tables = [row[0] for row in cur.fetchall()]
                    logger.info(f"Tables in pg_catalog: {tables}")
                except Exception as e:
                    logger.error(f"Error: {e}")
            
            # Test 9: Try to get database version
            logger.info("Test 9: Get database version")
            with conn.cursor() as cur:
                try:
                    cur.execute("SELECT version()")
                    version = cur.fetchone()[0]
                    logger.info(f"Database version: {version}")
                except Exception as e:
                    logger.error(f"Error: {e}")
        
        return True
    except Exception as e:
        logger.error(f"Connection failed: {e}", exc_info=True)
        return False

if __name__ == "__main__":
    logger.info("Starting Doltgres parameter binding tests")
    success = test_parameter_binding()
    if success:
        logger.info("Tests completed (some may have failed, check logs)")
    else:
        logger.error("Failed to connect to database")
    sys.exit(0 if success else 1)

Test Results

The issue seems to be related to parameter binding in SQL queries:

  1. Simple queries without parameters work fine:

    SELECT 1 as test
    
  2. Queries with positional parameters fail:

    SELECT %s as test
    

    Error: unsupported type: EXPRESSION

  3. Queries with named parameters fail:

    SELECT %(value)s as test
    

    Error: unsupported type: EXPRESSION

  4. The specific query that SQLAlchemy uses during initialization (with parameters) returns empty results but doesn't error:

    SELECT ns.nspname, t.typname, t.oid, t.typarray
    FROM pg_type t
    JOIN pg_namespace ns ON t.typnamespace = ns.oid
    WHERE t.typname = %(name)s
    
  5. The same query with direct string interpolation works and returns empty results:

    SELECT ns.nspname, t.typname, t.oid, t.typarray
    FROM pg_type t
    JOIN pg_namespace ns ON t.typnamespace = ns.oid
    WHERE t.typname = 'hstore'
    

Technical Details

The issue occurs in the following call stack:

  1. SQLAlchemy initializes the PostgreSQL dialect
  2. It calls dialect.initialize(c) in sqlalchemy/engine/create.py
  3. This calls info = self._type_info_fetch(connection, "hstore") in sqlalchemy/dialects/postgresql/psycopg.py
  4. Which calls return TypeInfo.fetch(connection.connection.driver_connection, name)
  5. In psycopg/_typeinfo.py, it executes cur.execute(cls._get_info_query(conn), {"name": name})
  6. This fails with the error: could not determine OID for placeholder v1: unsupported type: EXPRESSION

Additional Information

  • Doltgres version: PostgreSQL 15.5 (as reported by SELECT version())
  • SQLAlchemy version: Latest with Python 3.12
  • psycopg version: Latest with Python 3.12
  • The pg_type table exists in the database (confirmed via information_schema.tables)
  • Direct SQL queries without parameter binding work correctly

Xe138 avatar May 09 '25 02:05 Xe138

Hi @Xe138,

Thanks for the detailed bug report, this is great.

I have a fix working for your repro, but I want to do some more testing of the rest of sqlalchemy for more incompatibilities before I call this done so you don't immediately run into another issue. I'll get back to you early next week with additional updates.

zachmu avatar May 10 '25 01:05 zachmu

Hey @zachmu - thanks for the quick response on this. Any update on the rest of the sqlalchemy issues? Anything I can do to help troubleshoot?

Xe138 avatar May 25 '25 22:05 Xe138

Thanks for checking in!

Getting SQL alchemy working required implementing postgres only aggregate functions like array_agg, which was relatively complicated to get working an our SQL execution engine. That's working now, but I don't know what's left. I'll do a release on Tuesday regardless and let you know what the next sticking point is.

zachmu avatar May 25 '25 23:05 zachmu

Update on this: we got farther in our sql alchemy demo app, but the tool also requires ORDER BY support in the array_agg function, which we didn't implement on the first pass. Working on that next.

zachmu avatar May 29 '25 00:05 zachmu

Another update: we're solved several issues, such as need array_agg to work with an ORDER BY clause. Now we're tackling a couple more incompatibilities in the pg_catalog table -- right now we're not exposing system tables like dolt_log to those tables, which breaks compatibility with the tool. We'll keep providing updates as we continue getting this working.

zachmu avatar May 30 '25 22:05 zachmu

@zachmu I appreciate the updates!

Xe138 avatar May 30 '25 23:05 Xe138

Another update: we think we have addressed the problems with dolt_ system tables in pg_catalog, which were causing a lot of compatibility issues. Now we're working on implementing support for functions that return sets (e.g. generate_subscripts), which is the next blocker. We'll give another update when that work is complete, probably early next week.

zachmu avatar Jun 20 '25 20:06 zachmu

Try this now. We think we got SQLAlchemy to work with Doltgres.

timsehn avatar Jul 17 '25 23:07 timsehn

I'm still working on getting everything implemented, but I'm running into an issue retrieving the commit log using sqlalchemy reflection. I'm running Doltgres version 0.51.0

The dolt-sqlalchemy demo code does the following:

def print_commit_log(engine):
    # Examine a dolt system table, dolt_log, using reflection
    metadata_obj = MetaData()
    print("Commit Log:")

    dolt_log = Table("dolt_log", metadata_obj, autoload_with=engine)
    stmt = select(dolt_log.c.commit_hash,
                  dolt_log.c.committer,
                  dolt_log.c.message
                  ).order_by(dolt_log.c.date.desc())

    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            commit_hash = row[0]
            author      = row[1]
            message     = row[2]
            print("\t" + commit_hash + ": " + message + " by " + author)

When running this, the operation fails with a Postgres catalog error:

(psycopg.errors.InternalError_) invalid input syntax for type int2: "\x00\x00" (errno 1105) (sqlstate HY000)

Let me know if I should open this as a separate issue.

Xe138 avatar Aug 05 '25 21:08 Xe138

Hi @Xe138, sorry for the slow response here. I just tested the doltgres-sqlalchemy-getting-started project with Doltgres 0.51.1 and was able to run the demo script without any errors.

I noticed in your last comment you linked to the Dolt SQLAlchemy demo project, and not the Doltgres SQLAlchemy demo project. Can you confirm that you're using https://github.com/dolthub/doltgres-sqlalchemy-getting-started and still having a problem?

fulghum avatar Aug 21 '25 19:08 fulghum

Hi @Xe138, sorry for the slow response here. I just tested the doltgres-sqlalchemy-getting-started project with Doltgres 0.51.1 and was able to run the demo script without any errors.

I noticed in your last comment you linked to the Dolt SQLAlchemy demo project, and not the Doltgres SQLAlchemy demo project. Can you confirm that you're using https://github.com/dolthub/doltgres-sqlalchemy-getting-started and still having a problem?

These links are not accessible to me (github 404 page). Is that project publicly accessible? The documentation page does not list sqlalchemy yet: https://docs.doltgres.com/reference/supported-clients.

Xe138 avatar Sep 07 '25 00:09 Xe138

Sorry about that, the repo wasn't public. Can you try now?

zachmu avatar Sep 09 '25 21:09 zachmu