kuzu icon indicating copy to clipboard operation
kuzu copied to clipboard

Add `IF NOT EXISTS` option to node and edge table creation.

Open prrao87 opened this issue 2 years ago • 8 comments

SQL databases provide the convenient clause IF NOT EXIST when creating tables to avoid the user having to handle errors when a table with the same name already exists.

Creating a node table in Kùzu is a bit tedious in comparison. Currently the user has to perform custom exception handling (e.g., in Python, using if/else statements or try/except blocks to handle failure when the table already exists), which slows down experimentation and requires more boilerplate.

It would be great to have a feature that allows users to create tables using this syntax:

CREATE NODE TABLE IF NOT EXISTS
Person(
    person_id INT64,
    name STRING,
    age INT64,
    PRIMARY KEY (person_id)
);
CREATE REL TABLE IF NOT EXISTS
Follows(
    FROM Person TO Person
);

prrao87 avatar Feb 13 '24 21:02 prrao87

Per the discussion with Semih, this might be on hold till we have evidence that it's a useful feature.

prrao87 avatar Feb 13 '24 21:02 prrao87

slows down experimentation and requires more boilerplate

Indeed, I've come across Kuzu about 30 minutes ago and already stumbled upon this issue. I wanted to re-run the script that created the DB:

conn.execute(
    "CREATE NODE TABLE Person(name STRING, PRIMARY KEY(name))")
conn.execute(
    "CREATE REL TABLE Parent(FROM Person TO Person)"
)

...but it failed saying:

Traceback (most recent call last):
  File "./venv/lib/python3.12/site-packages/marimo/_runtime/cell_runner.py", line 238, in run
    return_value = execute_cell(cell, self.glbls)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./venv/lib/python3.12/site-packages/marimo/_ast/cell.py", line 444, in execute_cell
    exec(cell.body, glbls)
  File "/var/folders/61/p7f15sln0gxd7lwbr58cy6f00000gn/T/marimo_8105/__marimo__cell_vblA_.py", line 1, in <module>
    conn.execute(
  File "./venv/lib/python3.12/site-packages/kuzu/connection.py", line 92, in execute
    _query_result = self._connection.execute(prepared_statement._prepared_statement, parameters)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Binder exception: Person already exists in catalog.

Makes sense, but what can I do to get rid of this? Should I just catch the RuntimeError? What if it's a different kind of RuntimeError? I want exactly the one related to attempting to create a NODE TABLE that already exists.

How should I handle this?

ForceBru avatar Apr 29 '24 16:04 ForceBru

Hi @ForceBru, welcome, and hope you enjoy using Kùzu!

There's a relatively simple way to handle this in Python that requires just a couple extra lines of code:

import kuzu
import shutil

db_path = './my_kuzu_db'
shutil.rmtree(db_path, ignore_errors=True)

# db = kuzu.Database(db_path)
# conn = kuzu.Connection(db)
# ...

You can basically overwrite the database directory by using shutil.rmtree() - that way the next time you run the script, it will recreate the node/rel tables.

Note that this isn't recommended in any pipeline that runs in production, as you may inadvertently delete data when it was not intended. This approach shown only makes sense in early stages of experimentation, and you'd ideally remove the shutil.rmtree() line when productionizing a workflow to properly catch and handle exceptions for cases where the tables already exist. Hope this helps!

EDIT: An alternate way to handle this in the experimentation stage is to wrap the DDL lines (where you create the tables) in a try/except block. That way you can explicitly handle the RuntimeError as you wish and the script can be rerun as many times as needed.

prrao87 avatar Apr 29 '24 16:04 prrao87

To answer your other question about "different kind of RuntimeError", we don't have a separate error type for catching existing node/rel tables that already exist. But at that stage of the pipeline, the "table already exists" error is the most likely error one might encounter, so if you come across any cases where you need more fine-grained or explicit error handling, let us know with more details and code snippets, where applicable. Thanks!

prrao87 avatar Apr 29 '24 16:04 prrao87

Okay, got it, delete directory with the DB or handle the RuntimeError. Thanks!

It might also be useful (mainly for early experimentation and new user onboarding) to support in-memory databases like in sqlite, such that one could write db = kuzu.Database(":memory:") and mess with it without polluting the filesystem.

ForceBru avatar Apr 29 '24 16:04 ForceBru

Yup, in-memory by default is one of the main areas where Kùzu is different from OLAP embedded systems like DuckDB - see #1816 for the existing issue and discussion on this. It's on the longer term roadmap but Kùzu's query engine was built to run on on-disk storage, so this feature requires additional effort to implement from the system internals perspective.

prrao87 avatar Apr 29 '24 16:04 prrao87

I've just run into this issue, and ended up going with a rather un-pythonic "ask permission" approach:

if not conn.execute("CALL SHOW_TABLES() WHERE name = '<table-name-here>' RETURN name;").has_next():
    conn.execute("CREATE NODE TABLE <table-name-here>(<fields-go-here>)")

alanmeeson avatar May 05 '24 14:05 alanmeeson

Yes please! I'm creating migration scripts now for longer living dbs, so it'll be extremely useful.

ShravanSunder avatar May 16 '24 21:05 ShravanSunder

This feature is added in #3610 and should be available in the nightly build tmr.

andyfengHKU avatar Jun 11 '24 02:06 andyfengHKU

Thank you 🙏

On Mon, Jun 10, 2024 at 20:45 andyfeng @.***> wrote:

This feature is added in #3601 https://github.com/kuzudb/kuzu/pull/3601 and should be available in the nightly build tmr.

— Reply to this email directly, view it on GitHub https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2159666116, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6N5M6WANTPPIGKOYKDZGZQDVAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJZGY3DMMJRGY . You are receiving this because you are subscribed to this thread.Message ID: @.***>

stugorf avatar Jun 11 '24 04:06 stugorf

@andyfengHKU I just tried this: conn.execute("CREATE NODE TABLE IF NOT EXISTS Person(name STRING, age INT64, PRIMARY KEY (name))")

Any idea why I get this error back: ERROR - An error occurred while loading data into the database: Failed to create table in Kuzu database: Parser exception: mismatched input 'NOT' expecting '(' (line: 1, offset: 21) "CREATE NODE TABLE IF NOT EXISTS Person(name STRING, age INT64, PRIMARY KEY (name))"

stugorf avatar Jul 06 '24 16:07 stugorf

Hi @stugorf did you get this error after installing the latest nightly build? Could you confirm which dev release you're working with?

prrao87 avatar Jul 06 '24 18:07 prrao87

Hi Prashanth,

I have v0.4.2:latest running.

Regards,

David

On Sat, Jul 6, 2024 at 11:24 AM Prashanth Rao @.***> wrote:

Hi @stugorf https://github.com/stugorf did you get this error after installing the latest nightly build? Could you confirm which dev release you're working with?

— Reply to this email directly, view it on GitHub https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2211841510, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6NVJDEBV5PZTWQH5CDZLAY7FAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJRHA2DCNJRGA . You are receiving this because you were mentioned.Message ID: @.***>

stugorf avatar Jul 06 '24 22:07 stugorf

Ah, I don't believe this is supported in 0.4.2 yet. We will be releasing 0.5.0 very soon and it will be supported there - till then could you run on the dev version and let us know if that works?

prrao87 avatar Jul 06 '24 23:07 prrao87

Sure! What tag should I pull?

On Sat, Jul 6, 2024 at 4:00 PM Prashanth Rao @.***> wrote:

Ah, I don't believe this is supported in 0.4.2 yet. We will be releasing 0.5.0 very soon and it will be supported there - till then could you run on the dev version and let us know if that works?

— Reply to this email directly, view it on GitHub https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2212036443, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6NOWYQUU7U46NKQJOTZLBZKBAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGAZTMNBUGM . You are receiving this because you were mentioned.Message ID: @.***>

stugorf avatar Jul 06 '24 23:07 stugorf

pip install --pre kuzu

you may need to uninstall your existing version before you run that command tho.

prrao87 avatar Jul 06 '24 23:07 prrao87

Hi Prashanth,

I am using Poetry so poetry add kuzu --allow-prereleases worked and installed kuzu = {version = "^0.4.3.dev47", allow-prereleases = true}. My code now correctly executes conn.execute("CREATE NODE TABLE IF NOT EXISTS Person(name STRING, age INT64, PRIMARY KEY (name))")

On Sat, Jul 6, 2024 at 4:05 PM Prashanth Rao @.***> wrote:

pip install --pre kuzu

— Reply to this email directly, view it on GitHub https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2212037716, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6PMX2VFGJ7I7SNHQODZLBZ3VAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGAZTONZRGY . You are receiving this because you were mentioned.Message ID: @.***>

stugorf avatar Jul 06 '24 23:07 stugorf

Thank you for the help!

On Sat, Jul 6, 2024 at 16:09 David Hughes @.***> wrote:

Hi Prashanth,

I am using Poetry so poetry add kuzu --allow-prereleases worked and installed kuzu = {version = "^0.4.3.dev47", allow-prereleases = true}. My code now correctly executes conn.execute("CREATE NODE TABLE IF NOT EXISTS Person(name STRING, age INT64, PRIMARY KEY (name))")

On Sat, Jul 6, 2024 at 4:05 PM Prashanth Rao @.***> wrote:

pip install --pre kuzu

— Reply to this email directly, view it on GitHub https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2212037716, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6PMX2VFGJ7I7SNHQODZLBZ3VAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGAZTONZRGY . You are receiving this because you were mentioned.Message ID: @.***>

stugorf avatar Jul 06 '24 23:07 stugorf

Great, glad that worked!

prrao87 avatar Jul 06 '24 23:07 prrao87

Prashanth,

Will something like IF EXISTS be implemented for DROP so that we can execute a command like DROP IF EXISTS Person

Regards,

David

On Sat, Jul 6, 2024 at 4:23 PM Prashanth Rao @.***> wrote:

Great, glad that worked!

— Reply to this email directly, view it on GitHub https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2212046778, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6MMWXNE2YEV2B6A2DDZLB35JAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGA2DMNZXHA . You are receiving this because you were mentioned.Message ID: @.***>

stugorf avatar Jul 07 '24 18:07 stugorf

Hi @stugorf, we haven't prioritized this yet as we normally wait to see if there's need from the community on specific keywords that require changes to our grammar. Though it seems like the DROP TABLE [IF EXISTS] syntax exists in Postgres SQL.

I'll create an issue for this, and someone can pick it up in due course. However, in the interim (assuming you're working in Python), you can easily work around this with just one additional line of code:

# Open kuzu connection
import kuzu

db = kuzu.Database("mydb")
conn = kuzu.Connection(db)

# Drop node table
node_table_name = "MyNodeTable"
if node_table_name in conn._get_node_table_names():
    conn.execute(f"DROP {node_table_name}")

# Drop rel table
rel_table_name = "MyRelTable"
rel_tables = [tbl["name"] for tbl in conn._get_rel_table_names()]
if rel_table_name in rel_tables:
    conn.execute(f"DROP {rel_table_name}")

Not ideal, but it works well if you're using Python. You can also apply the not operator to negate the above check, and perform specific actions only if the table doesn't exist. Hope this helps!

prrao87 avatar Jul 07 '24 19:07 prrao87

Thank you for the workaround in the meantime. With the -dev version of kuzu does the explorer work? I am getting this error: [21:26:57.994] INFO (1): Access mode: READ_WRITE [21:26:58.019] ERROR (1): Error getting version of Kùzu: Error: std::bad_alloc

On Sun, Jul 7, 2024 at 12:27 PM Prashanth Rao @.***> wrote:

Hi @stugorf https://github.com/stugorf, we haven't prioritized this yet as we normally wait to see if there's need from the community on specific keywords that require changes to our grammar. Though it seems like the DROP TABLE [IF EXISTS] syntax exists in Postgres https://www.postgresql.org/docs/current/sql-droptable.html SQL.

I'll create an issue for this, and someone can pick it up in due course. However, in the interim (assuming you're working in Python), you can easily work around this with just one additional line of code:

Open kuzu connectionimport kuzu

db = kuzu.Database("mydb")conn = kuzu.Connection(db)

Drop node tablenode_table_name = "MyNodeTable"if node_table_name in conn._get_node_table_names():

conn.execute("DROP MyNodeTable")

Drop rel tablerel_table_name = "MyRelTable"if rel_table_name in conn._get_rel_table_names():

conn.execute("DROP MyRelTable")

Not ideal, but it works well if you're using Python. You can also apply the not operator to negate the above check, and perform specific actions only if the table doesn't exist. Hope this helps!

— Reply to this email directly, view it on GitHub https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2212545629, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6M5ZA633XZ3W2UPSATZLGJBXAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGU2DKNRSHE . You are receiving this because you were mentioned.Message ID: @.***>

stugorf avatar Jul 07 '24 21:07 stugorf

Thank you for the workaround in the meantime. With the -dev version of kuzu does the explorer work? I am getting this error: [21:26:57.994] INFO (1): Access mode: READ_WRITE [21:26:58.019] ERROR (1): Error getting version of Kùzu: Error: std::bad_alloc

This happens most likely because the DB version you created is different than the DB version that's running in Kuzu Explorer. So if you created the db with a nightly build, then try launching the nightly Explorer. For example use this docker-compose.yml:

services:
  explorer:
    image: kuzudb/explorer:dev
    restart: unless-stopped
    environment:
      - MODE=READ_ONLY
    ports:
      - 8000:8000
    volumes:
      - ./ex_db_kuzu:/database

and then do:

docker compose up

semihsalihoglu-uw avatar Jul 07 '24 21:07 semihsalihoglu-uw

Perfect; I was just reading in the repo to use the 'dev' tag. Thank you.

On Sun, Jul 7, 2024 at 2:42 PM Semih Salihoglu @.***> wrote:

Thank you for the workaround in the meantime. With the -dev version of kuzu does the explorer work? I am getting this error: [21:26:57.994] INFO (1): Access mode: READ_WRITE [21:26:58.019] ERROR (1): Error getting version of Kùzu: Error: std::bad_alloc … <#m_7732110577369242028_> On Sun, Jul 7, 2024 at 12:27 PM Prashanth Rao @.*> wrote: Hi @stugorf https://github.com/stugorf https://github.com/stugorf https://github.com/stugorf, we haven't prioritized this yet as we normally wait to see if there's need from the community on specific keywords that require changes to our grammar. Though it seems like the DROP TABLE [IF EXISTS] syntax exists in Postgres https://www.postgresql.org/docs/current/sql-droptable.html https://www.postgresql.org/docs/current/sql-droptable.html SQL. I'll create an issue for this, and someone can pick it up in due course. However, in the interim (assuming you're working in Python), you can easily work around this with just one additional line of code: # Open kuzu connectionimport kuzu db = kuzu.Database("mydb")conn = kuzu.Connection(db)

Drop node tablenode_table_name = "MyNodeTable"if node_table_name in

conn._get_node_table_names(): conn.execute("DROP MyNodeTable") # Drop rel tablerel_table_name = "MyRelTable"if rel_table_name in conn._get_rel_table_names(): conn.execute("DROP MyRelTable") Not ideal, but it works well if you're using Python. You can also apply the not operator to negate the above check, and perform specific actions only if the table doesn't exist. Hope this helps! — Reply to this email directly, view it on GitHub <#2878 (comment) https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2212545629>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6M5ZA633XZ3W2UPSATZLGJBXAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGU2DKNRSHE https://github.com/notifications/unsubscribe-auth/ABKDS6M5ZA633XZ3W2UPSATZLGJBXAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGU2DKNRSHE . You are receiving this because you were mentioned.Message ID: @.*>

This happens most likely because the DB version you created is different than the DB version that's running in Kuzu Explorer. So if you created the db with a nightly build, then try launching the nightly Explorer. For example use this docker-compose.tml:

services: explorer: image: kuzudb/explorer:dev restart: unless-stopped environment: - MODE=READ_ONLY ports: - 8000:8000 volumes: - ./ex_db_kuzu:/database

and then do:

docker compose pull docker compose up

— Reply to this email directly, view it on GitHub https://github.com/kuzudb/kuzu/issues/2878#issuecomment-2212585292, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6JUOIMVOM7NT2CZPIDZLGY2ZAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGU4DKMRZGI . You are receiving this because you were mentioned.Message ID: @.***>

stugorf avatar Jul 07 '24 21:07 stugorf

Yes, if this is indeed the case, let's open an issue to give a better error message. I'll ask @mewim to take a look.

semihsalihoglu-uw avatar Jul 07 '24 21:07 semihsalihoglu-uw

Perfect; I was just reading in the repo to use the 'dev' tag. Thank you. On Sun, Jul 7, 2024 at 2:42 PM Semih Salihoglu @.> wrote: Thank you for the workaround in the meantime. With the -dev version of kuzu does the explorer work? I am getting this error: [21:26:57.994] INFO (1): Access mode: READ_WRITE [21:26:58.019] ERROR (1): Error getting version of Kùzu: Error: std::bad_alloc … <#m_7732110577369242028_> On Sun, Jul 7, 2024 at 12:27 PM Prashanth Rao @.> wrote: Hi @stugorf https://github.com/stugorf https://github.com/stugorf https://github.com/stugorf, we haven't prioritized this yet as we normally wait to see if there's need from the community on specific keywords that require changes to our grammar. Though it seems like the DROP TABLE [IF EXISTS] syntax exists in Postgres https://www.postgresql.org/docs/current/sql-droptable.html https://www.postgresql.org/docs/current/sql-droptable.html SQL. I'll create an issue for this, and someone can pick it up in due course. However, in the interim (assuming you're working in Python), you can easily work around this with just one additional line of code: # Open kuzu connectionimport kuzu db = kuzu.Database("mydb")conn = kuzu.Connection(db) # Drop node tablenode_table_name = "MyNodeTable"if node_table_name in conn._get_node_table_names(): conn.execute("DROP MyNodeTable") # Drop rel tablerel_table_name = "MyRelTable"if rel_table_name in conn._get_rel_table_names(): conn.execute("DROP MyRelTable") Not ideal, but it works well if you're using Python. You can also apply the not operator to negate the above check, and perform specific actions only if the table doesn't exist. Hope this helps! — Reply to this email directly, view it on GitHub <#2878 (comment) <#2878 (comment)>>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6M5ZA633XZ3W2UPSATZLGJBXAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGU2DKNRSHE https://github.com/notifications/unsubscribe-auth/ABKDS6M5ZA633XZ3W2UPSATZLGJBXAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGU2DKNRSHE . You are receiving this because you were mentioned.Message ID: @.> This happens most likely because the DB version you created is different than the DB version that's running in Kuzu Explorer. So if you created the db with a nightly build, then try launching the nightly Explorer. For example use this docker-compose.tml: services: explorer: image: kuzudb/explorer:dev restart: unless-stopped environment: - MODE=READ_ONLY ports: - 8000:8000 volumes: - ./ex_db_kuzu:/database and then do: docker compose pull docker compose up — Reply to this email directly, view it on GitHub <#2878 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKDS6JUOIMVOM7NT2CZPIDZLGY2ZAVCNFSM6AAAAABDHHAUOKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJSGU4DKMRZGI . You are receiving this because you were mentioned.Message ID: @.**>

Explorer itself has the correct infrastructure to handle a storage version mismatch, as shown below:

Screenshot 2024-07-08 at 8 52 12 AM

However, I think the issue was that the dev builds do not actively maintain the storage version number. The storage version number is defined at: https://github.com/kuzudb/kuzu/blob/9df0a994535829df598dbfdf135795513554850c/src/include/storage/storage_version_info.h#L15-L20

For each stable release, we manually update this dictionary to add the version number of kuzu and storage. But since each dev build is automatically built and deployed, it does not get its own storage version number. Instead, it simply assumes that it has the latest storage version number in the dictionary.

mewim avatar Jul 08 '24 12:07 mewim