Concurrently querying the information schema tables causes intermittent segmentation faults in a Python app
What happens?
We are using the latest DuckDB (1.4.1) with DuckLake, with Postgres as the catalog server. Our app is multi-threaded, and each thread uses a cursor created from the connection object to the DuckDB database that has the DuckLake catalog attached.
We get an intermittent segmentation fault from time to time. It only happens on Debian Linux and does not seem to happen on a MacBook. Here is the bt output from gdb after we captured the core dump:
#4 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#5 0x00007f67e5b7bf4f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#6 0x00007f67e5b2cfb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#7 0x00007f67e5b17472 in __GI_abort () at ./stdlib/abort.c:79
#8 0x00007f67e5b7042f in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f67e5c8b459 "%s\n") at ../sysdeps/posix/libc_fatal.c:156
#9 0x00007f67e5b8586a in malloc_printerr (str=str@entry=0x7f67e5c890b1 "free(): invalid pointer") at ./malloc/malloc.c:5660
#10 0x00007f67e5b875f4 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=have_lock@entry=0) at ./malloc/malloc.c:4435
#11 0x00007f67e5b89f4f in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3385
#12 0x00007f670daf5981 in std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator=(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) [clone .isra.0] () from /tmp/app/.ascend/.duckdb/extensions/v1.4.1/linux_amd64/ducklake.duckdb_extension
#13 0x00007f670daf6787 in duckdb::DuckLakeViewEntry::Bind(duckdb::ClientContext&) () from /tmp/app/.ascend/.duckdb/extensions/v1.4.1/linux_amd64/ducklake.duckdb_extension
#14 0x00007f670da5077f in duckdb::DuckLakeSchemaEntry::Scan(duckdb::ClientContext&, duckdb::CatalogType, std::function<void (duckdb::CatalogEntry&)> const&) () from /tmp/app/.ascend/.duckdb/extensions/v1.4.1/linux_amd64/ducklake.duckdb_extension
#15 0x00007f677d33c02f in duckdb::DuckDBTablesInit(duckdb::ClientContext&, duckdb::TableFunctionInitInput&) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#16 0x00007f677d664e5b in duckdb::TableScanGlobalSourceState::TableScanGlobalSourceState(duckdb::ClientContext&, duckdb::PhysicalTableScan const&) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#17 0x00007f677d65d75e in duckdb::PhysicalTableScan::GetGlobalSourceState(duckdb::ClientContext&) const () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#18 0x00007f677ce362cc in duckdb::Pipeline::ResetSource(bool) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#19 0x00007f677ce3654a in duckdb::Pipeline::Reset() () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#20 0x00007f677ce43091 in duckdb::Pipeline::Schedule(duckdb::shared_ptr<duckdb::Event, true>&) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#21 0x00007f677ce43160 in duckdb::PipelineEvent::Schedule() () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#22 0x00007f677ce356ba in duckdb::Event::CompleteDependency() () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#23 0x00007f677ce355ae in duckdb::Event::Finish() () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#24 0x00007f677ce45ec2 in duckdb::PipelineInitializeTask::ExecuteTask(duckdb::TaskExecutionMode) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#25 0x00007f677ce393d6 in duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#26 0x00007f677ce41ed4 in duckdb::Executor::ExecuteTask(bool) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#27 0x00007f677ce00af0 in duckdb::ClientContext::ExecuteTaskInternal(duckdb::ClientContextLock&, duckdb::BaseQueryResult&, bool) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#28 0x00007f677ce00cb3 in duckdb::PendingQueryResult::ExecuteTask() () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#29 0x00007f677c4bffd2 in duckdb::DuckDBPyConnection::CompletePendingQuery(duckdb::PendingQueryResult&) () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#30 0x00007f677c4ce3bd in ?? () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#31 0x00007f677c4d55df in ?? () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#32 0x00007f677c4f86d7 in ?? () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#33 0x00007f677c467cd5 in ?? () from /app/.venv/lib/python3.12/site-packages/_duckdb.cpython-312-x86_64-linux-gnu.so
#34 0x00007f67e5ee96f2 in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#35 0x00007f67e5ece312 in _PyObject_MakeTpCall () from /usr/local/bin/../lib/libpython3.12.so.1.0
#36 0x00007f67e5f08234 in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#37 0x00007f67e5de779f in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#38 0x00007f67e5ed0708 in _PyObject_FastCallDictTstate () from /usr/local/bin/../lib/libpython3.12.so.1.0
#39 0x00007f67e5ef277e in _PyObject_Call_Prepend () from /usr/local/bin/../lib/libpython3.12.so.1.0
#40 0x00007f67e5fa4c4c in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#41 0x00007f67e5ef5899 in _PyObject_Call () from /usr/local/bin/../lib/libpython3.12.so.1.0
#42 0x00007f67e5de779f in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#43 0x00007f67e5f081a6 in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#44 0x00007f67e5fd43e8 in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#45 0x00007f67e5f8f9b8 in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#46 0x00007f67e5b7a1f5 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#47 0x00007f67e5bf9b40 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
From the Python stack trace, the segfault seems to happen when we execute this query:
SELECT table_name, table_type
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = '{table_name}' AND TABLE_SCHEMA = '{table_schema}' AND TABLE_CATALOG = '{table_catalog}'
To Reproduce
In a multi-threaded setup, execute the query above concurrently from several threads to look up table names.
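A minimal sketch of the setup described here, not a confirmed reproducer: several Python threads each take a cursor from one shared connection and run the lookup query above in a loop. The helper name `lookup_concurrently`, the thread and iteration counts, and the use of `?` bound parameters in place of string formatting are assumptions for illustration. The backtrace passes through `DuckLakeViewEntry::Bind`, so the attached catalog presumably needs to contain at least one view for the same code path to be reached.

```python
import threading

# Same lookup as in the report, with bound parameters instead of string formatting.
QUERY = """
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_name = ? AND table_schema = ? AND table_catalog = ?
"""


def lookup_concurrently(con, params, n_threads=8, iterations=200):
    """Run QUERY from n_threads threads, each on its own cursor of `con`.

    `con` is assumed to be a DuckDB connection that already has the
    DuckLake catalog attached; `params` is (table_name, schema, catalog).
    Returns (results, errors) collected from all threads.
    """
    results = []
    errors = []
    lock = threading.Lock()

    def worker():
        # One cursor per thread, all derived from the shared connection.
        cur = con.cursor()
        try:
            for _ in range(iterations):
                rows = cur.execute(QUERY, params).fetchall()
                with lock:
                    results.append(rows)
        except Exception as exc:  # Python-level errors only; a crash kills the process
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors
```

To try it against the reported setup, pass a connection from `duckdb.connect()` that already has the DuckLake catalog attached, for example `lookup_concurrently(con, ("my_table", "main", "my_lake"))`, where the three names are placeholders.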
OS:
Debian Stable
DuckDB Version:
1.4.1
DuckLake Version:
using the stable version
DuckDB Client:
python
Hardware:
No response
Full Name:
rui yang
Affiliation:
ascend.io
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have not tested with any build
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
- [x] Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- [x] Yes, I have
Hey @ruiyang2015, thanks for the report! However, without more context and a reproducer, it is going to be very hard to work on this. Let us know if you can find a self-contained reproducer and we'll take a look!
Thanks for opening this issue in the DuckLake issue tracker! To resolve this issue, our team needs a reproducible example. This includes:
- A source code snippet which reproduces the issue.
- The snippet should be self-contained, i.e., it should contain all imports and should use relative paths instead of hard coded paths (please avoid /Users/JohnDoe/...).
- A lot of issues can be reproduced with plain SQL code executed in the DuckDB command line client. If you can provide such an example, it greatly simplifies the reproduction process and likely results in a faster fix.
- If the script needs additional data, please share the data as a CSV, JSON, or Parquet file. Unfortunately, we cannot fix issues that can only be reproduced with a confidential data set. Support contracts allow sharing confidential data with the core DuckDB team under NDA.
For more detailed guidelines on how to create reproducible examples, please visit Stack Overflow's “Minimal, Reproducible Example” page.