ducklake icon indicating copy to clipboard operation
ducklake copied to clipboard

Reattach a ducklake with MySQL catalog

Open adrien-ferrara opened this issue 6 months ago • 12 comments

Hello,

Maybe I missed some information somewhere.

I set up a DuckLake with a MySQL catalog, everything seemed fine, but when I wan't to reattach it I run into those errors:

  • Without specifying DATA_PATH:

ATTACH 'ducklake:mysql:db=ducklake_catalog host=192.168.1.100 user=root port=3306 password=xxx' AS myDucklake ; Invalid Input Error: Attempting to create a new ducklake instance but data_path is not set - set the DATA_PATH parameter to the desired location of the data files

  • Specifiying DATA_PATH

ATTACH 'ducklake:mysql:db=ducklake_catalog host=192.168.1.100 user=root port=3306 password=xxx' AS myDucklake (DATA_PATH 'data_files/'); IO Error: Failed to initialize DuckLake:Failed to run query "CREATE TABLEducklake_catalog.ducklake_metadata(keyTEXT NOT NULL,value TEXT NOT NULL);": Table 'ducklake_metadata' already exists

adrien-ferrara avatar May 28 '25 14:05 adrien-ferrara

Are you sure that the database is not attached already?

You could try DETACH or SHOW databases; to find out.

arjenpdevries avatar May 28 '25 15:05 arjenpdevries

Are you sure that the database is not attached already?

You could try DETACH or SHOW databases; to find out.

Yes pretty much, also tried to close and open a new client before attaching the existing ducklake.

adrien-ferrara avatar May 28 '25 15:05 adrien-ferrara

Make sure you don't have another duckdb session open in another terminal. I noticed that attachments are shared across all open :memory instances, which can definitely lead to some unexpected results. For example I was testing mysql and postgres catalogs in two separate terminals but used the same AS my_ducklake name for both attachments and they kept stepping on each other.

ryanschneider avatar May 28 '25 20:05 ryanschneider

Hello, I have the same problem with DuckLake. Test spec:

  1. DuckDB CLI on Win11,
  2. DB Engine: MySQL(MariaDB) - 10.3.39-MariaDB-0ubuntu0.20.04.2 Ubuntu 20.04 Database Collation: utf8mb4_general_ci and Engine Type: InnoDB.
  3. Data files local on Win11

Image

kjkit avatar May 28 '25 22:05 kjkit

So after seeing this issue: https://github.com/duckdb/ducklake/issues/112

It seems that adding the METADATA_SCHEMA option solves the problem:

ATTACH 'ducklake:mysql:db=ducklake_catalog host=192.168.1.100 user=root port=3306 password=xxx' AS ducklake_catalog (DATA_PATH 'data_files/' , METADATA_SCHEMA 'ducklake_catalog');

I'm just wondering if this is normal and the relevant way to do this ?

adrien-ferrara avatar Jun 02 '25 06:06 adrien-ferrara

It seems that the schema_name is not well resolve when mysql or postgresql is used in the following query : https://github.com/duckdb/ducklake/blob/main/src/storage/ducklake_initializer.cpp#L76

Indeed if the option METADATA_SCHEMA is specified it works, If it's the expected behaviour, the documentation would be updated....

fb64 avatar Jun 06 '25 06:06 fb64

using METADATA_SCHEMA 'ducklake_catalog' doesn't fix the ATTACH a second time issue for me I just now get

Failed to create DuckLake connection: IO Error: Failed to initialize DuckLake:Failed to run query "CREATE SCHEMA `ducklake_catalog`": Can't create database 'ducklake_catalog'; database exists

instead the of table already exists error

Any other ideas for a workaround ?

cameronbraid avatar Jul 09 '25 17:07 cameronbraid

iled to initialize DuckLake:Failed to run query "CREATE SCHEMA ducklake_catalog": Can't create database 'ducklake_catalog'; database exists

I did some tries and it's working well on my side (DuckDB v1.3.2, MySQL 8.0.42)

  1. Start a fresh cli and Init DuckLake with MySQL:

ATTACH 'ducklake:mysql:db=ducklake host=localhost' AS my_ducklake (DATA_PATH 'ducklake_data/')

  1. Stop Current cli and start a new one and attach existing ducklake :

ATTACH 'ducklake:mysql:db=ducklake host=localhost' AS my_ducklake (METADATA_SCHEMA 'ducklake')

fb64 avatar Jul 15 '25 15:07 fb64

Using METADATA_SCHEMA works, but how to specify METADATA_SCHEMA when directing attaching ducklake using cli?

duckdb ducklake:...

guitcastro avatar Jul 16 '25 12:07 guitcastro

I'm using https://crates.io/crates/duckdb with an in memory duckdb. My first attach to mysql works, the second doesn't

Failed to create DuckLake connection: IO Error: Failed to initialize DuckLake:Failed to run query "CREATE SCHEMA `ducklake_catalog`": Can't create database 'ducklake_catalog'; database exists

I even used detach and drop of the connection before attempting a re-attach.

cameronbraid avatar Jul 16 '25 12:07 cameronbraid

Using METADATA_SCHEMA works, but how to specify METADATA_SCHEMA when directing attaching ducklake using cli?

duckdb ducklake:...

For CLI, you can use the cmd option: duckdb -cmd "ATTACH 'ducklake:mysql:db=ducklake host=localhost' AS my_ducklake (METADATA_SCHEMA 'ducklake')"

fb64 avatar Jul 16 '25 15:07 fb64

Is there a way to specify this when working with SQLMesh ? I tried adding metadata_schema key in the ducklake catalog but it just throws error:

Error: Invalid 'duckdb' connection config:
  Invalid field 'catalogs.ducklake.str':
    Input should be a valid string
  Invalid field 'catalogs.ducklake.DuckDBAttachOptions.metadata_schema':
    Extra inputs are not permitted

atTheShikhar avatar Jul 25 '25 05:07 atTheShikhar