duckdb_mysql icon indicating copy to clipboard operation
duckdb_mysql copied to clipboard

MySQL -> DuckDB export causes segfault

Open benjamingr opened this issue 1 year ago • 1 comments

What happens?

When running MySQL -> DuckDB import on a simple setup (testcontainers) a segfault occurs.

To Reproduce

When setting up test containers with duckdb to evaluate MySQL export as such (full fixture setup code):

def populate_mysql(mysql: MySqlContainer):
    connection = pymysql.connect(
        host=mysql.get_container_host_ip(),
        port=int(mysql.get_exposed_port(3306)),
        user=mysql.username,
        password=mysql.password,
        database=f"test",
        autocommit=True
    )

    with connection.cursor() as cursor:
        for table_idx in range(TABLE_COUNT):
            # Create users table with columns "id", "email", "password_hash", "credit_card_number" with fake PII
            cursor.execute(
                f"CREATE TABLE users{table_idx} (id INT AUTO_INCREMENT PRIMARY KEY, email TEXT, password_hash TEXT, credit_card_number TEXT);")
            # Insert some fake data
            for i in range(100):
                cursor.execute(
                    f"INSERT INTO users{table_idx} (email, password_hash, credit_card_number) VALUES ('user{i}@example.com', 'hash{i}', '1111-2222-3333-4444');")
            # Create orders table with columns "id", "user_id", "order_date", "order_total", "address" with fake PII
            cursor.execute(
                f"CREATE TABLE orders{table_idx} (id INT AUTO_INCREMENT PRIMARY KEY, user_id INT, order_date TIMESTAMP, order_total DECIMAL(10,2), address TEXT);")
            # Insert some fake transactions
            for i in range(100):
                cursor.execute(
                    f"INSERT INTO orders{table_idx} (user_id, order_date, order_total, address) VALUES ({(i + 10) % 10}, '2022-01-01 00:00:00', 100.00, '1234 Main St, Springfield, IL 62701');")
    connection.close()

And then connecting to the testcontainer and performing "export from" on it (similar PostgreSQL code works):

INSTALL mysql;
LOAD mysql;

ATTACH 'database=test user=test password=test host=localhost port=60057' AS d (TYPE MYSQL, READ_ONLY);

--- Executing this line causes a segfault, crashes DBeaver (or Python when running Python code)
COPY FROM DATABASE d TO memory;

OS:

macOS 14.3 (23D56)

DuckDB Version:

latest ( "1.1.1" )

DuckDB Client:

DBeaver (also with DBeaver)

Hardware:

M3 max

Full Name:

Benjamin Gruenbaum

Affiliation:

Eon.io

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • [X] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • [X] Yes, I have

benjamingr avatar Oct 06 '24 10:10 benjamingr

(Just for reference, running regular queries against the MySQL code works, e.g. SELECT * from d.users0 works and returns the correct results)

benjamingr avatar Oct 06 '24 10:10 benjamingr