`COPY`ing an increasing amount of data from the Postgres metadata store as the DB grows

Open vilterp opened this issue 9 months ago • 3 comments

I have a large number (~250) of writers connecting to my Postgres+S3 DuckLake instance and inserting rows.

Looking at my RDS Postgres metrics, I see an increasing amount of data being scanned:

Image

The statement stats show it mostly being a COPY statement:

Image

Is it necessary to copy this many rows out to attach and insert data? Thanks!

vilterp avatar May 30 '25 04:05 vilterp

If we create indexes on those metadata tables, the scan I/O should be much lower.

YuweiXiao avatar May 30 '25 06:05 YuweiXiao

Thanks for the report!

This happens because the extension does not yet send the queries to Postgres for execution. Instead, it fetches the table contents and runs the queries in DuckDB; see my comment here. Copying the tables out is not required, and we plan to add support for running these queries directly in Postgres in the near future.
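
To illustrate the difference between the two strategies, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the metadata store. It is not DuckLake's code: the table `snapshot_meta`, its columns, and both helper functions are hypothetical and chosen only for the example. The first helper copies the whole table to the client and evaluates the query there; the second lets the database evaluate the query and return only the result.

```python
import sqlite3

# Stand-in for the metadata store. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot_meta (snapshot_id INTEGER PRIMARY KEY, note TEXT)"
)
conn.executemany(
    "INSERT INTO snapshot_meta VALUES (?, ?)",
    [(i, f"snapshot {i}") for i in range(1, 10001)],
)


def latest_by_fetching_everything(conn):
    """Copy the whole table to the client, then pick the row locally."""
    rows = conn.execute("SELECT snapshot_id, note FROM snapshot_meta").fetchall()
    # Tuples compare by their first element, so max() picks the highest id.
    return max(rows), len(rows)


def latest_by_pushdown(conn):
    """Let the database run the query and return only the answer."""
    rows = conn.execute(
        "SELECT snapshot_id, note FROM snapshot_meta "
        "ORDER BY snapshot_id DESC LIMIT 1"
    ).fetchall()
    return rows[0], len(rows)


full_row, full_transferred = latest_by_fetching_everything(conn)
push_row, push_transferred = latest_by_pushdown(conn)

# Same answer either way; only the number of rows leaving the database differs.
print(full_row == push_row, full_transferred, push_transferred)
```

Both helpers return the same row, but the first transfers all 10,000 rows while the second transfers one. In the first approach the transferred volume grows with the table, which is consistent with the scan volume increasing as the metadata store grows.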

Mytherin avatar May 30 '25 07:05 Mytherin

I see, thanks!

vilterp avatar May 30 '25 13:05 vilterp