ducklake icon indicating copy to clipboard operation
ducklake copied to clipboard

ducklake fails to update table stats on insert into ducklake table on mysql

Open ededovic opened this issue 5 months ago • 6 comments

Here is a min viable example problem. ducklake is set up on mysql database. Based on OpenAI, it is related to ducklake update of stats:

Invalid Input Error: Attempting to fetch from an unsuccessful query result
Error: TransactionContext Error: Failed to commit: Failed to commit DuckLake transaction: Failed to update stats information in DuckLake: Unsupported operator type HASH_JOIN in UPDATE statement - only simple deletes (e.g. UPDATE FROM tbl WHERE x=y) are supported in the MySQL connector

Example when this happens:

Image

chatgpt insight links this to issue #210. There is not much detail in that issue. I'm already using 1.3.2 and can not tell which version of ducklake, but I think it is 0.2 based on some function names that did not exit in the prior version.

extension_name	extension_version
ducklake	9cc2d90

ChatGPT response: Image

ededovic avatar Jul 14 '25 21:07 ededovic

The ChatGPT response is garbage and should be rmoved. ducklake.set_option does not seem to exist. IMHO no solution here so far. Also still occurring in 1.3.1/0.2

alexanderhupfer avatar Aug 02 '25 14:08 alexanderhupfer

Thanks for opening this issue in the DuckLake issue tracker! To resolve this issue, our team needs a reproducible example. This includes:

  • A source code snippet which reproduces the issue.
  • The snippet should be self-contained, i.e., it should contain all imports and should use relative paths instead of hard coded paths (please avoid /Users/JohnDoe/...).
  • A lot of issues can be reproduced with plain SQL code executed in the DuckDB command line client. If you can provide such an example, it greatly simplifies the reproduction process and likely results in a faster fix.
  • If the script needs additional data, please share the data as a CSV, JSON, or Parquet file. Unfortunately, we cannot fix issues that can only be reproduced with a confidential data set. Support contracts allow sharing confidential data with the core DuckDB team under NDA.

For more detailed guidelines on how to create reproducible examples, please visit Stack Overflow's “Minimal, Reproducible Example” page.

duckdblabs-bot avatar Aug 04 '25 09:08 duckdblabs-bot

Here's a minimal reproducible example that demonstrates the issue:

#!/usr/bin/env python3
"""
Minimal reproducible example for DuckLake MySQL UPDATE FROM issue.
Demonstrates the "Failed to update stats information in DuckLake" error.
"""

import duckdb
import os

# MySQL configuration
MYSQL_CONFIG = {
    'host': 'localhost',
    'user': 'root',
    'password': 'your_password',
    'database': 'ducklake_metadata',
    'port': '3306'
}

LOCAL_DATA_PATH = './ducklake_data'

# Create local data directory
os.makedirs(LOCAL_DATA_PATH, exist_ok=True)

# Create DuckDB connection
conn = duckdb.connect(':memory:')

# Install and load required extensions
conn.execute("INSTALL mysql")
conn.execute("LOAD mysql")
conn.execute("INSTALL ducklake")
conn.execute("LOAD ducklake")

# Attach DuckLake with MySQL metadata store
# Note: METADATA_SCHEMA is required to avoid initialization errors
mysql_connection = f"db={MYSQL_CONFIG['database']} host={MYSQL_CONFIG['host']} port={MYSQL_CONFIG['port']} user={MYSQL_CONFIG['user']} password={MYSQL_CONFIG['password']}"
attach_query = f"ATTACH 'ducklake:mysql:{mysql_connection}' AS ducklake_db (DATA_PATH '{os.path.abspath(LOCAL_DATA_PATH)}', METADATA_SCHEMA '{MYSQL_CONFIG['database']}')"

print("Attaching DuckLake with MySQL metadata store...")
conn.execute(attach_query)

# Create table (or use existing)
try:
    conn.execute("""
    CREATE TABLE ducklake_db.test_table (
        id INTEGER,
        name VARCHAR,
        value DOUBLE,
        created_date DATE
    )
    """)
except Exception as e:
    if "already exists" in str(e):
        print("Table already exists, continuing...")
        conn.execute("DELETE FROM ducklake_db.test_table")

print("Inserting data...")
# This INSERT will fail during commit when DuckLake tries to update statistics
conn.execute("""
INSERT INTO ducklake_db.test_table VALUES 
    (1, 'Alice', 100.5, '2024-01-01'),
    (2, 'Bob', 200.0, '2024-01-02'),
    (3, 'Charlie', NULL, '2024-01-03'),
    (4, 'David', 150.75, '2024-01-04'),
    (5, NULL, 300.25, '2024-01-05')
""")

# The error occurs here during the automatic commit

Error Output

TransactionContext Error: Failed to commit: Failed to commit DuckLake transaction: 
Failed to update stats information in DuckLake: MySQL Update not supported - 
Expected the child of an update to be a projection

Environment

  • DuckDB version: 1.3.2
  • MySQL version: 9.3.0
  • OS: macOS (Darwin)

Notes

  • The error only occurs when statistics need to be updated (typically on INSERT operations)
  • First-time table creation and inserts may work, but subsequent operations fail
  • Using METADATA_SCHEMA option is required to avoid initialization conflicts with existing metadata

alexanderhupfer avatar Aug 04 '25 20:08 alexanderhupfer

The minimal example was already provided as an image, here it as text:

ATTACH 'ducklake:mysql:database=issues_lake' AS ilake
       (DATA_PATH 'duckdb_data_lake');  

USE ilake;

create or replace table ilake.main.insert_issues  (filename varchar, dt date);
insert into ilake.main.insert_issues values ('hello.csv','2027-07-10');

-- it fails on this line
insert into ilake.main.insert_issues 
  values ('hello2.csv', '2025-07-29');

Invalid Input Error: Attempting to fetch from an unsuccessful query result
Error: TransactionContext Error: Failed to commit: Failed to commit DuckLake transaction: Failed to update stats information in DuckLake: Unsupported operator type HASH_JOIN in UPDATE statement - only simple deletes (e.g. UPDATE FROM tbl WHERE x=y) are supported in the MySQL connector

duckdb version 1.3.2, ducklake version 0.2, mysql 8.

ededovic avatar Aug 04 '25 20:08 ededovic

@guillesd - please let me know if the sample was reproducible or something more is needed. Would be highly interested in getting this resolved.

alexanderhupfer avatar Aug 17 '25 20:08 alexanderhupfer

Hi @alexanderhupfer @ededovic unfortunately we have seen that there are several limitations regarding the DuckDB MySQL connector that are limiting for the DuckDB Ducklake extension. We have placed a warning in the website that for now MySQL is not recommended as a catalog database for Ducklake. https://ducklake.select/docs/stable/duckdb/usage/choosing_a_catalog_database#mysql

I recommend you use Postgres if that is available for you! It has much better support and we plan to add more features to it (like Data Inlining) soon.

Sorry for the inconvenience and thanks for filing the issue!

guillesd avatar Aug 19 '25 13:08 guillesd