Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet
What happens?
Memory usage keeps increasing as the export progresses. It seems that the results of the PostgreSQL queries are never released or cleaned up once each table has been copied.
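For reference, here is a minimal sketch of how the growth can be observed (this uses psutil, which is not part of the script below). Printing the process RSS after each COPY shows it climbing rather than returning to a baseline:

import os
import psutil

proc = psutil.Process(os.getpid())

def log_rss(label):
    # Resident set size of this Python process, in MiB.
    rss_mib = proc.memory_info().rss / (1024 * 1024)
    print(f"{label}: RSS = {rss_mib:.0f} MiB")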
To Reproduce
I am using DuckDB to export tables from my local PostgreSQL database to Parquet files. However, I am noticing a significant memory increase during the process, suggesting a potential memory leak. Below is the code I am using:
import duckdb

con = duckdb.connect(database='my_database.duckdb')
con.install_extension("postgres_scanner")
con.load_extension("postgres_scanner")

# Cap DuckDB's memory and limit parallelism.
con.sql("SET memory_limit = '20GB';")
con.sql("SET threads TO 3;")
con.sql("SET enable_progress_bar = true;")

# Attach the local PostgreSQL database in read-only mode.
con.sql("""
ATTACH 'dbname=** user=** host=127.0.0.1 password=**' AS db (TYPE POSTGRES, READ_ONLY);
""")

# Collect the names of the tables visible through the attached catalog.
all_tables = con.sql("SHOW ALL TABLES;").fetchdf()
tables = all_tables['name'].to_list()

# Export each table to a Parquet file; memory grows with every iteration.
for table in tables:
    con.execute(f"COPY db.public.{table} TO '{table}.parquet' (FORMAT PARQUET);")
    print(f"Table {table} copied to {table}.parquet")

con.close()
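If it helps with diagnosis, DuckDB's duckdb_memory() table function should show where the buffer manager is holding memory between iterations. A sketch of such a check (assuming duckdb_memory() is available in 1.1.3; it is not part of the script above):

# Run between COPY statements to see per-tag buffer manager usage.
mem = con.sql(
    "SELECT tag, memory_usage_bytes "
    "FROM duckdb_memory() "
    "ORDER BY memory_usage_bytes DESC;"
).fetchdf()
print(mem.head(10))

If the reported buffer usage stays flat while the process RSS keeps growing, the leaked memory is presumably being allocated outside DuckDB's buffer manager (e.g., in the extension's handling of PostgreSQL query results).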
OS:
Ubuntu, x86_64
DuckDB Version:
1.1.3
DuckDB Client:
Python
Hardware:
VM: 32 GB RAM, 8 Core
Full Name:
Vineet Verma
Affiliation:
Harvard University
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot easily share my data sets due to their large size
Did you include all code required to reproduce the issue?
- [X] Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- [X] Yes, I have