
Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet

Open · vineetver opened this issue 10 months ago · 0 comments

What happens?

Memory usage keeps increasing as the export progresses. It looks as if the results of the PostgreSQL scans are never released or cleaned up between tables.
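To quantify the growth, here is a rough sketch of how I observe it, logging the process RSS after each COPY with psutil (the table names below are placeholders, and the DSN is redacted as in the repro script):

import duckdb
import psutil

con = duckdb.connect()
con.install_extension("postgres_scanner")
con.load_extension("postgres_scanner")
# Redacted placeholder DSN, same shape as in the repro below.
con.sql("ATTACH 'dbname=** user=** host=127.0.0.1 password=**' AS db (TYPE POSTGRES, READ_ONLY);")

proc = psutil.Process()
for table in ["orders", "events", "metrics"]:  # placeholder table names
    con.execute(f"COPY db.public.{table} TO '{table}.parquet' (FORMAT PARQUET);")
    # RSS should plateau once a table's scan buffers are freed; instead it keeps climbing.
    print(f"{table}: RSS = {proc.memory_info().rss / 2**20:.0f} MiB")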

To Reproduce

I am using DuckDB to export tables from my local PostgreSQL database to Parquet files. However, I am noticing a significant memory increase during the process, suggesting a potential memory leak. Below is the code I am using:

import duckdb

con = duckdb.connect(database='my_database.duckdb')

con.install_extension("postgres_scanner")
con.load_extension("postgres_scanner")
con.sql("SET memory_limit = '20GB';")
con.sql("SET threads TO 3;")
con.sql("SET enable_progress_bar = true;")

# Attach the local PostgreSQL database read-only.
con.sql("""
    ATTACH 'dbname=** user=** host=127.0.0.1 password=**' AS db (TYPE POSTGRES, READ_ONLY);
""")

# Collect the table names from the attached database.
all_tables = con.sql("SHOW ALL TABLES;").fetchdf()
tables = all_tables['name'].to_list()

# Export each table to Parquet; memory grows with every iteration.
for table in tables:
    con.execute(f"COPY db.public.{table} TO '{table}.parquet' (FORMAT PARQUET);")
    print(f"Table {table} copied to {table}.parquet")

con.close()
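As a possible mitigation while this is investigated (untested assumption on my part), opening a fresh connection per table should release whatever memory the attachment is holding when the connection closes:

import duckdb

def export_table(table: str) -> None:
    # Fresh connection per table, so memory held by the postgres
    # attachment is released when the connection is closed.
    con = duckdb.connect()
    con.load_extension("postgres_scanner")
    con.sql("ATTACH 'dbname=** user=** host=127.0.0.1 password=**' AS db (TYPE POSTGRES, READ_ONLY);")
    con.execute(f"COPY db.public.{table} TO '{table}.parquet' (FORMAT PARQUET);")
    con.close()

for table in tables:  # 'tables' gathered as in the script above
    export_table(table)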

OS:

Ubuntu, x86_64

DuckDB Version:

1.1.3

DuckDB Client:

Python

Hardware:

VM: 32 GB RAM, 8 Core

Full Name:

Vineet Verma

Affiliation:

Harvard University

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - I cannot easily share my data sets due to their large size

Did you include all code required to reproduce the issue?

  • [X] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • [X] Yes, I have

vineetver · Dec 06 '24 23:12