duckdb_engine
duckdb_engine copied to clipboard
[Bug]: Potential Thread Safety Issue With DuckDBIdentifierPreparer
What happened?
When instantiating a bunch of memory engines in different threads, I can somewhat reliably get crashes/deadlocks on Windows when parallelism goes past 5 threads or so. It seems to be the global cursor object being used in the identifier preparer (in that if I remove that, everything is fine up to arbitrarily high thread counts (100+)).
I'm not sure if this is really a core use case, given that the Engines are probably created once typically, but wanted to flag just in case.
class DuckDBIdentifierPreparer(PGIdentifierPreparer):
def __init__(self, dialect: "Dialect", **kwargs: Any) -> None:
super().__init__(dialect, **kwargs)
self.reserved_words.update(
{
keyword_name
for (keyword_name,) in duckdb.cursor()
.execute(
"select keyword_name from duckdb_keywords() where keyword_category == 'reserved'"
)
.fetchall()
}
)
Repo is just with a query to create an engine and do a select 1 and return.
def main():
"""
Main function to initialize and run the threads.
"""
num_threads = 100
threads = []
print(f"Starting execution with {num_threads} threads...")
for i in range(1, num_threads + 1):
# Create a Thread object, target is the function, args are the arguments for the function
thread = threading.Thread(target=run_duckdb_query, args=(i,))
threads.append(thread)
thread.start()
print(f"Main: Thread {i} started.")
# Wait for all threads to complete
for thread in threads:
thread.join()
print("\nAll threads have completed their execution.")
if __name__ == "__main__":
main()```
### DuckDB Engine Version
duckdb-engine-0.17.0
### DuckDB Version
1.3.2
### SQLAlchemy Version
_No response_
### Relevant log output
```shell
Code of Conduct
- [x] I agree to follow this project's Code of Conduct