duckdb
duckdb copied to clipboard
Python commits from same thread, but different connections make unexpected results.
What happens?
When two python connections, form the same thread and process, commit two new tables, one is overwritten. My expectation is that both tables appear, since there is no conflict.
To Reproduce
# make a new db
duckdb /tmp/tmp.ddb -c "create table X (a int)";
Given an existing /tmp/tmp.ddb
with no tables.
nursa@Nicholass-MacBook-Air ~ % python
Python 3.9.11 (main, Apr 10 2022, 10:19:17)
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import duckdb
>>> con1 = duckdb.connect("/tmp/tmp.ddb")
>>> con2 = duckdb.connect("/tmp/tmp.ddb")
>>> con1.execute("CREATE TABLE foo1 as SELECT 1 as a, 2 as b")
<duckdb.DuckDBPyConnection object at 0x1008f3b70>
>>> con1.commit()
<duckdb.DuckDBPyConnection object at 0x1008f3b70>
>>> con2.execute("CREATE TABLE bar1 as SELECT 2 as a, 3 as b")
<duckdb.DuckDBPyConnection object at 0x1008f5bf0>
>>> con2.commit()
<duckdb.DuckDBPyConnection object at 0x1008f5bf0>
>>> con2.close()
>>> con1.close()
>>> con3 = duckdb.connect("/tmp/tmp.ddb")
>>> con3.execute("select * from information_schema.tables").fetchall()
[(None, 'main', 'bar1', 'BASE TABLE', None, None, None, None, None, 'YES', 'NO', None)]
>>>
This shows that con2
has overwritten the table created by con1
even though con1
committed.
~~My expectation is that the second connection, which was opened writeable, should throw an exception as soon as the connection attempt is made to the same path.~~ My expectation is to see both tables.
Environment (please complete the following information):
- OS: OS X 12.3.1
- DuckDB Version: v0.4.0
- DuckDB Client: python
Identity Disclosure:
- Full Name: Nicholas Ursa
- Affiliation: MotherDuck Corp.
Before Submitting
- [X] Have you tried this on the latest
master
branch?
-
Python:
pip install duckdb --upgrade --pre
Downloading duckdb-0.4.1.dev188-cp39-cp39-macosx_11_0_arm64.whl (13.5 MB)
- [X] Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
cc @jwills
@pdet Actually, let me correct my expectations.
I think what I'm expecting from the above, where one process and one thread makes two connections, is that both tables appear after commit. The creation of the two tables should not conflict.
btw: When I create connections from two different python process, I get the expected error.
Hey @nicku33, I've coded a solution for this issue already (will patch it tomorrow), the idea is to cache database instances for connections created over the same db file. That means that different write connections should be able to connect to a file, with the MVCC taking care of any updates underneath. The only downside of sharing a database instance is that both connections must have the same configuration, if they have different configurations, then I throw an error. (This is ofc for a single process, I guess multi-process should always just error out).
Super, @pdet ! Can't wait. Yes I agree multiple processes should error out, since you can't manage the concurrency.
Any updates on this?
#4414 should solve this, but it is still a work in progress. In the meantime you can use multiple cursors spawned from the same connection instead of opening multiple connections instead.
Ooh can I really? I wonder if I can make that work with https://github.com/jwills/dbt-duckdb; @sungchun12 I assume that was why you were asking?
@jwills you know my heart too well. That's exactly why I asked for an update ;)
yah okay let me see if i can make what @Mytherin said work... 👀
Hrm this isn't my prettiest code, but this works correctly in all of my local testing (e.g. trying to create 10 different dbt models using 4 threads): https://github.com/jwills/dbt-duckdb/pull/16/files
#4414 should solve this, but it is still a work in progress. In the meantime you can use multiple cursors spawned from the same connection instead of opening multiple connections instead.
Is this a WIP? I thought it was in a feature-freeze for v0.5, no?
I assumed it was feature-freeze for 0.6
#4414 should solve this, but it is still a work in progress. In the meantime you can use multiple cursors spawned from the same connection instead of opening multiple connections instead.
Is this a WIP? I thought it was in a feature-freeze for v0.5, no?
At least the merge conflicts will need to be resolved, but I will have another pass as I haven't looked at it since I got back from VLDB.