duckdb icon indicating copy to clipboard operation
duckdb copied to clipboard

Python commits from same thread, but different connections make unexpected results.

Open nicku33 opened this issue 2 years ago • 4 comments

What happens?

When two python connections, form the same thread and process, commit two new tables, one is overwritten. My expectation is that both tables appear, since there is no conflict.

To Reproduce

# make a new db
duckdb /tmp/tmp.ddb -c "create table X (a int)";

Given an existing /tmp/tmp.ddb with no tables.

nursa@Nicholass-MacBook-Air ~ % python
Python 3.9.11 (main, Apr 10 2022, 10:19:17)
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import duckdb
>>> con1 = duckdb.connect("/tmp/tmp.ddb")
>>> con2 = duckdb.connect("/tmp/tmp.ddb")
>>> con1.execute("CREATE TABLE foo1 as SELECT 1 as a, 2 as b")
<duckdb.DuckDBPyConnection object at 0x1008f3b70>
>>> con1.commit()
<duckdb.DuckDBPyConnection object at 0x1008f3b70>
>>> con2.execute("CREATE TABLE bar1 as SELECT 2 as a, 3 as b")
<duckdb.DuckDBPyConnection object at 0x1008f5bf0>
>>> con2.commit()
<duckdb.DuckDBPyConnection object at 0x1008f5bf0>
>>> con2.close()
>>> con1.close()
>>> con3 = duckdb.connect("/tmp/tmp.ddb")
>>> con3.execute("select * from information_schema.tables").fetchall()
[(None, 'main', 'bar1', 'BASE TABLE', None, None, None, None, None, 'YES', 'NO', None)]
>>>

This shows that con2 has overwritten the table created by con1 even though con1 committed.

~~My expectation is that the second connection, which was opened writeable, should throw an exception as soon as the connection attempt is made to the same path.~~ My expectation is to see both tables.

Environment (please complete the following information):

  • OS: OS X 12.3.1
  • DuckDB Version: v0.4.0
  • DuckDB Client: python

Identity Disclosure:

  • Full Name: Nicholas Ursa
  • Affiliation: MotherDuck Corp.

Before Submitting

  • [X] Have you tried this on the latest master branch?
  • Python: pip install duckdb --upgrade --pre Downloading duckdb-0.4.1.dev188-cp39-cp39-macosx_11_0_arm64.whl (13.5 MB)
  • [X] Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

nicku33 avatar Jul 01 '22 19:07 nicku33

cc @jwills

nicku33 avatar Jul 01 '22 19:07 nicku33

@pdet Actually, let me correct my expectations.

I think what I'm expecting from the above, where one process and one thread makes two connections, is that both tables appear after commit. The creation of the two tables should not conflict.

btw: When I create connections from two different python process, I get the expected error.

nicku33 avatar Jul 06 '22 17:07 nicku33

Hey @nicku33, I've coded a solution for this issue already (will patch it tomorrow), the idea is to cache database instances for connections created over the same db file. That means that different write connections should be able to connect to a file, with the MVCC taking care of any updates underneath. The only downside of sharing a database instance is that both connections must have the same configuration, if they have different configurations, then I throw an error. (This is ofc for a single process, I guess multi-process should always just error out).

pdet avatar Jul 06 '22 19:07 pdet

Super, @pdet ! Can't wait. Yes I agree multiple processes should error out, since you can't manage the concurrency.

nicku33 avatar Jul 06 '22 21:07 nicku33

Any updates on this?

sungchun12 avatar Sep 30 '22 18:09 sungchun12

#4414 should solve this, but it is still a work in progress. In the meantime you can use multiple cursors spawned from the same connection instead of opening multiple connections instead.

Mytherin avatar Sep 30 '22 18:09 Mytherin

Ooh can I really? I wonder if I can make that work with https://github.com/jwills/dbt-duckdb; @sungchun12 I assume that was why you were asking?

jwills avatar Sep 30 '22 19:09 jwills

@jwills you know my heart too well. That's exactly why I asked for an update ;)

sungchun12 avatar Sep 30 '22 19:09 sungchun12

yah okay let me see if i can make what @Mytherin said work... 👀

jwills avatar Sep 30 '22 19:09 jwills

Hrm this isn't my prettiest code, but this works correctly in all of my local testing (e.g. trying to create 10 different dbt models using 4 threads): https://github.com/jwills/dbt-duckdb/pull/16/files

jwills avatar Sep 30 '22 20:09 jwills

#4414 should solve this, but it is still a work in progress. In the meantime you can use multiple cursors spawned from the same connection instead of opening multiple connections instead.

Is this a WIP? I thought it was in a feature-freeze for v0.5, no?

pdet avatar Sep 30 '22 20:09 pdet

I assumed it was feature-freeze for 0.6

jwills avatar Sep 30 '22 20:09 jwills

#4414 should solve this, but it is still a work in progress. In the meantime you can use multiple cursors spawned from the same connection instead of opening multiple connections instead.

Is this a WIP? I thought it was in a feature-freeze for v0.5, no?

At least the merge conflicts will need to be resolved, but I will have another pass as I haven't looked at it since I got back from VLDB.

Mytherin avatar Oct 01 '22 08:10 Mytherin