[META] Make CrateDB work with `dataset`

Open amotl opened this issue 3 years ago • 1 comments

Hi there,

A while back, I tried to use the sweet dataset package with CrateDB.

Being built on top of SQLAlchemy, dataset works with all major databases, such as SQLite, PostgreSQL and MySQL.

To exercise it, and to provide common ground for others to experiment with, I've created the cratedb-dataset-demo.py gist.

This meta issue tracks all related issues that need to be resolved to make the demo program work completely.

With kind regards, Andreas.

References

  • https://github.com/crate/crate/issues/11020
  • https://github.com/crate/crate/issues/11039
  • https://github.com/crate/crate/pull/11165
  • https://github.com/crate/crate/issues/13102
  • https://github.com/crate/crate/issues/13104
  • https://github.com/crate/crate-python/issues/453
  • https://github.com/crate/crate-python/issues/454
  • https://github.com/crate/crate-python/issues/455

amotl, Oct 06 '22 12:10

With recent improvements, most notably https://github.com/crate/crate/pull/11165, which added the gen_random_text_uuid() scalar function, primary key values can now be generated automatically when inserting new records. This was essential to make INSERT operations such as table.insert(dict(name="John Doe", age=37)) work.

The corresponding SQL DDL statement looks like:

CREATE TABLE IF NOT EXISTS "doc"."testdrive" (
    "id" TEXT DEFAULT gen_random_text_uuid() NOT NULL,
    "name" TEXT,
    "age" INTEGER,
    "gender" TEXT,
    PRIMARY KEY ("id")
);
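For illustration, the manual workflow could look roughly like the sketch below. It assumes a CrateDB server reachable at crate://localhost:4200 and the dataset and crate[sqlalchemy] packages being installed. The connection URL, the run_demo name, and issuing the DDL through db.query() are assumptions made for this sketch, not taken from the gist.

```python
# Sketch only: needs a running CrateDB server plus the `dataset` and
# `crate[sqlalchemy]` packages. Nothing is executed at import time.

DDL = """
CREATE TABLE IF NOT EXISTS "doc"."testdrive" (
    "id" TEXT DEFAULT gen_random_text_uuid() NOT NULL,
    "name" TEXT,
    "age" INTEGER,
    "gender" TEXT,
    PRIMARY KEY ("id")
)
"""


def run_demo(url="crate://localhost:4200"):
    """Provision the table by hand, then insert one record through dataset."""
    import dataset  # imported lazily so this module loads without it

    # ensure_schema=False: dataset should not try to create tables or columns.
    db = dataset.connect(url, ensure_schema=False)

    # Create the table manually, with the server-side default for "id".
    db.query(DDL)

    # No "id" is passed here; CrateDB fills it via gen_random_text_uuid().
    table = db.load_table("testdrive")
    table.insert(dict(name="John Doe", age=37))
```

Calling run_demo() against a local instance should leave one record in "doc"."testdrive" whose id was generated by the database.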

Currently, the schema has to be provided manually, possibly because dataset itself only handles automatic provisioning of autoincrement-like columns ^1. It would be nice to have automatic schema creation work with CrateDB, as it does with other databases.

One of the main features of dataset is to automatically create tables and columns as data is inserted. This behaviour can optionally be disabled via the ensure_schema argument. It can also be overridden in a lot of the data manipulation methods using the ensure flag.

-- https://dataset.readthedocs.io/en/latest/api.html#connecting
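To make the goal more concrete, here is a minimal sketch of what automatic schema creation would have to emit for CrateDB: a CREATE TABLE statement derived from the first record, with a text primary key defaulting to gen_random_text_uuid(). The ddl_for_record and quote_ident helpers and the type mapping are hypothetical. Neither dataset nor crate-python provides them, and falling back to TEXT for unknown or None values is an arbitrary choice.

```python
# Hypothetical helper, for illustration only. Not part of dataset or crate-python.

# Assumed mapping from Python value types to CrateDB column types.
SQL_TYPES = {bool: "BOOLEAN", int: "INTEGER", float: "DOUBLE PRECISION", str: "TEXT"}


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ddl_for_record(schema, table, record, primary_id="id"):
    """Derive a CREATE TABLE statement from one example record."""
    columns = [
        f"{quote_ident(primary_id)} TEXT DEFAULT gen_random_text_uuid() NOT NULL"
    ]
    for key, value in record.items():
        if key == primary_id:
            continue  # the generated key column is already declared
        # Exact type lookup, so True maps to BOOLEAN rather than INTEGER.
        # Unknown types and None fall back to TEXT (an assumption).
        sql_type = SQL_TYPES.get(type(value), "TEXT")
        columns.append(f"{quote_ident(key)} {sql_type}")
    columns.append(f"PRIMARY KEY ({quote_ident(primary_id)})")
    body = ",\n    ".join(columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {quote_ident(schema)}.{quote_ident(table)} (\n"
        f"    {body}\n"
        f");"
    )


print(ddl_for_record("doc", "testdrive", {"name": "John Doe", "age": 37, "gender": None}))
```

For this record, the helper reproduces the hand-written DDL for "doc"."testdrive" shown earlier in this thread.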

amotl, Oct 06 '22 12:10