First-class CockroachDB support.
Friction-free CockroachDB support in Piccolo ORM.
PR: https://github.com/piccolo-orm/piccolo/pull/608
We'd like to see Cockroach Labs promote Piccolo as one of, if not the, recommended async ORMs. asyncpg has some built-in support to help us get there.
In the meantime I will be adding notes on how to use CockroachDB with Piccolo, and on general progress.
Quick local setup for testing

- Download/extract (single binary): https://www.cockroachlabs.com/docs/stable/install-cockroachdb-linux.html
- Start Cockroach: `./cockroach start-single-node --insecure --store=node1 --listen-addr=localhost:26257 --http-addr=localhost:8080 --background`
- Enter the SQL client: `./cockroach sql --insecure`
- You can `create database ...;` and `use ...;` from here.
- `CTRL-C` or `killall cockroach` sends a graceful termination signal.
- The default account is `root`, without a password. Fancy stats dashboard at http://localhost:8080
Both Piccolo Examples and Piccolo Admin are working.
What we still need

- [x] ~~Get build/test suite running on Github Actions for Piccolo QA tooling.~~
- [x] ~~Some way to identify we're running under CockroachDB.~~
  - ~~Added CockroachDB Engine.~~
- [x] ~~Get test suite mostly / fully passing.~~
- [x] ~~Basic documentation~~
- [x] ~~Time Travel queries using AS OF SYSTEM TIME~~
  - ~~Some way to do Time Travel SELECT queries (`SELECT ... AS OF SYSTEM TIME`)~~ - ~~https://www.cockroachlabs.com/docs/stable/as-of-system-time.html~~
  - :heavy_check_mark: See: https://github.com/piccolo-orm/piccolo/issues/607#issuecomment-1251330590
- [x] Sharding for Sequential Indexes
  - :warning: Done, but will go into its own PR after this one: https://github.com/piccolo-orm/piccolo/issues/622
  - Some way to do "Hash Sharded" Indexes (`INDEX ... USING HASH`) - https://www.cockroachlabs.com/docs/stable/hash-sharded-indexes.html
:bug: Known issues in CockroachDB or Asyncpg

- [ ] ARRAY support between asyncpg ↔️ cockroach.
  - Currently affected by: https://github.com/cockroachdb/cockroach/issues/71908
  - See: `piccolo/tests/columns/test_array.py`
  - M2M works, but `as_list=True` does not because it uses ARRAY
- [ ] `cannot drop UNIQUE constraint "..." using ALTER TABLE DROP CONSTRAINT, use DROP INDEX CASCADE instead`
  - Tracked: https://github.com/cockroachdb/cockroach/issues/42840
  - Affecting `test_alter_column_unique()` in `test_migrations.py`
- [ ] `column ... is of type int[] and thus is not indexable` / `column ... is of type varchar[] and thus is not indexable`
  - Tracked: https://github.com/cockroachdb/cockroach/issues/35730
  - Affecting both `test_array_column_integer()` and `test_array_column_varchar()` in `test_migration_manager.py`
:moon: Future nice to haves

CockroachDB executes `ALTER COLUMN` queries asynchronously.
- In the short term this can be mitigated by keeping migrations small.
- At odds with Piccolo's assumption that the database is altered before the next migration operation begins. CockroachDB will give a warning: `table <...> is currently undergoing a schema change` if a later operation tries to modify the table before the asynchronous query finishes. CockroachDB may fix this in the future.
:heavy_check_mark: Historical Issues / Solutions of Note

:heavy_check_mark: All issues are resolved in https://github.com/piccolo-orm/piccolo/pull/608 but are listed here if you need to apply them manually.

:heavy_check_mark: :bug: `expected DEFAULT expression to have type timestamp, but 'current_timestamp() + '01:00:00'' has type timestamptz`

https://github.com/piccolo-orm/piccolo/blob/ed93feb85050895cc54e7db9650eddc69ab94fa0/piccolo/columns/defaults/timestamp.py#L24

```python
return f"CURRENT_TIMESTAMP::timestamp + INTERVAL '{interval_string}'"
```

https://github.com/piccolo-orm/piccolo/blob/ed93feb85050895cc54e7db9650eddc69ab94fa0/piccolo/columns/defaults/timestamp.py#L45

```python
return "current_timestamp::timestamp"
```
:heavy_check_mark: :bug: `asyncpg.exceptions.FeatureNotSupportedError: unimplemented: extension "uuid-ossp" is not yet supported`

https://github.com/piccolo-orm/piccolo/blob/d01066002e9932ff7b7ea7e2e678b8a54faf666c/piccolo/engine/postgres.py#L248

```python
# extensions = ["uuid-ossp"]
```

Or add `extensions=[]` in `piccolo_config.py` when creating `DB = PostgresEngine(...)`.
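For example, a hypothetical `piccolo_conf.py` along these lines (the connection details here are assumptions based on the local setup above, not defaults):

```python
# Hypothetical piccolo_conf.py sketch: passing extensions=[] so that
# PostgresEngine doesn't attempt CREATE EXTENSION "uuid-ossp", which
# CockroachDB rejects.
from piccolo.engine.postgres import PostgresEngine

DB = PostgresEngine(
    config={
        "host": "localhost",
        "port": 26257,  # CockroachDB's default SQL port
        "database": "piccolo",
        "user": "root",  # the insecure single-node default account
        "password": "",
    },
    extensions=[],  # skip the unsupported "uuid-ossp" extension
)
```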
:heavy_check_mark: :bug: `ALTER COLUMN TYPE is not supported inside a transaction`

```sql
SET CLUSTER SETTING sql.defaults.experimental_alter_column_type.enabled = true;
```

Same strategy employed by django-cockroach.
- Affecting `test_alter_column_digits()` in `test_migration_manager.py`
- Tracked: https://github.com/cockroachdb/cockroach/issues/49351?version=v22.1
- Related: https://github.com/cockroachdb/cockroach/issues/49329
No worries, we'll use this issue instead as it's more detailed.
I experimented with this a bit and did some research.
Tests / CI

- Add Cockroach to the Piccolo QA tooling.

For GitHub Actions I only found an example of how the Peewee ORM does it (basically downloading CockroachDB, starting it with `start-single-node`, and creating a test database using the CockroachDB `sql` client - the same as a local installation, but I don't know how it works in practice). Peewee also has special field types for CockroachDB specifics (primary key and uuid).
I was able to run the tests locally using `PostgresEngine`, and created a few abstract methods, e.g.

```python
# in columns/base.py
from abc import ABC, abstractproperty


class Default(ABC):
    @abstractproperty
    def postgres(self) -> str:
        pass

    @abstractproperty
    def sqlite(self) -> str:
        pass

    @abstractproperty
    def cockroach(self) -> str:
        pass


# and then in columns/timestamp.py
class TimestampNow(Default):
    @property
    def postgres(self):
        return "current_timestamp"

    @property
    def sqlite(self):
        return "current_timestamp"

    @property
    def cockroach(self):
        return "current_timestamp::timestamp"
```
for the CockroachDB specifics (because without that the Postgres tests fail). The result is that a lot of the tests failed (`DuplicateTableError`, `InterfaceError`, etc.) and there were also a few errors. All tests that return a serial ID in the response also fail, because CockroachDB has a different primary key implementation.

Tests run 10 times slower than Postgres (even with `start-single-node`), or I don't know how to fine-tune it (I use this command to start a single node: `cockroach start-single-node --insecure --store=node1 --store=type=mem,size=0.5 --listen-addr=localhost:26257` on my local machine with 4 cores and 4GB RAM + 2GB swap). After the tests completed, I had to use `DROP DATABASE piccolo CASCADE` and then `CREATE DATABASE piccolo` again, because the database is not cleaned up after the test suite finishes.
Sorry for the long comment, I hope this is useful.
For reference, in my experience, django-cockroach is also a fairly fleshed out postgresql :arrow_right: cockroachdb layer, and may also serve as a good reference point for common solutions.

> Peewee also has special field types for CockroachDB specifics (primary key and uuid)

Yeah, I do believe this is to set the correct default value (e.g. `DEFAULT unique_rowid()` and `DEFAULT gen_random_uuid()`). Here it is in django-cockroach: https://github.com/cockroachdb/django-cockroachdb/blob/0d44aaed3b414e263ee9f9ffb7ed71f585fdb9a1/django_cockroachdb/base.py#L34
I think we could do something like this, but Daniel would know better. First create a new `CockroachEngine`:

```python
from __future__ import annotations

import typing as t

from piccolo.engine.postgres import PostgresEngine


class CockroachEngine(PostgresEngine):
    engine_type = "cockroach"

    def __init__(
        self,
        config: t.Dict[str, t.Any],
        extensions: t.Sequence[str] = (),
    ) -> None:
        self.config = config
        self.extensions = extensions
        super().__init__(config=config, extensions=extensions)

    ...
```
and then change the `Serial` column to this:

```python
class Serial(Column):
    """
    An alias to an autoincrementing integer column in Postgres.
    """

    @property
    def column_type(self):
        engine_type = self._meta.engine_type
        if engine_type == "postgres":
            return "SERIAL"
        elif engine_type == "sqlite":
            return "INTEGER"
        elif engine_type == "cockroach":
            return "DEFAULT unique_rowid()"
        raise Exception("Unrecognized engine type")
```
And then do the same for the `UUID` column.
I have no experience with CockroachDB, but do you know why the tests take so long? I think this is a problem for GH actions.
@sinisaos Yeah, something like that will work.

This needs changing too:
https://github.com/piccolo-orm/piccolo/blob/ed93feb85050895cc54e7db9650eddc69ab94fa0/piccolo/query/base.py#L272-L294

So the logic for cockroach is:
- If a cockroach query exists, use it
- Otherwise, if a postgres query exists, use it
- Otherwise, use the default query
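That fallback order could be sketched like this (the attribute names such as `cockroach_querystrings` are illustrative, not Piccolo's actual API):

```python
# Illustrative sketch of the per-engine query fallback described above.
# The *_querystrings attribute names are hypothetical.
def resolve_querystrings(query, engine_type: str):
    """Return the most specific querystrings available for the engine."""
    if engine_type == "cockroach":
        cockroach = getattr(query, "cockroach_querystrings", None)
        if cockroach:
            return cockroach
    if engine_type in ("cockroach", "postgres"):
        postgres = getattr(query, "postgres_querystrings", None)
        if postgres:
            return postgres
    return query.default_querystrings


class FakeQuery:
    # A stand-in query with a postgres variant but no cockroach variant.
    default_querystrings = ["DEFAULT SQL"]
    postgres_querystrings = ["POSTGRES SQL"]


print(resolve_querystrings(FakeQuery(), "cockroach"))  # falls back to postgres
```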
> `--store=type=mem,size=0.5`

Would bump that to at least 2GB. CockroachDB tends to choke up on anything less.
Tests are running fairly quickly here, but I'm currently hanging on `tests/table/instance/test_save.py`.

On the bright side, the tests are running here in 2:13.

Currently running into this issue of Cockroach sending back `cannot perform operation: another operation is in progress`. Think this may be because Cockroach is performing operations asynchronously and we aren't retrying.

EDIT: Actually, probably running into the "multiple active portals" issue: https://github.com/MagicStack/asyncpg/issues/738 https://github.com/MagicStack/asyncpg/issues/580 https://github.com/cockroachdb/cockroach/pull/83164
@dantownsend You're right. I just gave an example of what we could do. Maybe you should install CockroachDB when you have time and try to run the tests.
@gnat You're right. If I run with just `cockroach start-single-node --insecure --store=node1` the result is much better - 8 min - but that's probably because of my old and slow laptop :)
I currently "work around" the `cannot perform operation: another operation is in progress` error in my own internal ORM by replacing `fetchval` with `fetch` -- but I do not see `fetchval` being used in Piccolo, so it's something else triggering the "multiple active portals" issue.
@dantownsend any ideas?
@gnat I browsed some issues on cockroachdb (like this https://github.com/cockroachdb/cockroach/issues/40195). I don't have a definite answer, just some ideas:
- Could be something to do with transactions.
- Or maybe Piccolo isn't closing a connection somewhere.
- A bug with asyncpg?
I need to play around with it a bit more.
So, some tests are problematic for Cockroach (anything testing sequential keys starting at 1). Even though we can run Cockroach in "compatibility mode", most Cockroach users will never want to do this, because it's terrible for performance: the cluster would have to perform a very expensive "full cluster" consensus on INSERT, essentially taking a cluster-wide lock :scream:
Notably `"id": 1` and `"manager": 1` here:

```python
band = Band(name="Pythonistas", popularity=1000, manager=manager)
band.save().run_sync()
self.assertEqual(
    Band.select().run_sync(),
    [
        {
            "id": 1,
            "name": "Pythonistas",
            "manager": 1,
            "popularity": 1000,
        }
    ],
)
```
So we have a few options:

1. Add to the current test suite.
   - Perhaps some decorators like `@cockroach_only` and `@cockroach_never` (needed to exclude tests guaranteed to fail), and have extra tests just for Cockroach.
   - I think this is the way to go, but our test suite will get bigger.
   - Any cleaner ways to approach this?
2. Run compatibility mode.
   - Add `DEFAULT nextval()` for `Serial` at `Table` creation time, just for the Cockroach tests.
   - Really not sure how to do this just at `Table` creation time without the new `DEFAULT` being used everywhere else and breaking other tests. @dantownsend
   - As explained above, most people are going to want `DEFAULT unique_rowid()` or UUID, but neither of these will pass the current test suite.
   - Explained here: https://www.cockroachlabs.com/docs/stable/serial.html
3. Remove sequential keys from the current assert checks.
   - Kinda easy but not very desirable. Mangles current tests.
4. Ignore large swaths of tests for Cockroach.
   - Easiest but not very desirable.
Major work in progress, but figured I'd start pushing up progress so people can try things out as we go.
> Notably `"id": 1` and `"manager": 1` here:
Yes, that's a tricky problem.
We currently have an approach for just running certain tests for certain databases:
https://github.com/piccolo-orm/piccolo/blob/ed93feb85050895cc54e7db9650eddc69ab94fa0/tests/base.py#L33-L39
We can create a `cockroach_only` decorator. It might not be sufficient though: if we have loads of tests which don't run on Cockroach, we would also need to mark a bunch of tests as `not_cockroach`. A cleaner approach might be to create a new decorator called `for_engines`. For example:

```python
@for_engines('sqlite', 'postgres')
class Test1(TestCase):
    ...
```

But as you said, ignoring large numbers of tests isn't ideal.
But as you said, ignoring large numbers of tests isn't ideal.
> Run compatibility mode. Add DEFAULT nextval() for Serial at Table creation time just for Cockroach tests. Really not sure how to do this just at Table creation time without the new DEFAULT being used everywhere else and breaking other tests. @dantownsend As explained above, most people are going to want DEFAULT unique_rowid() or UUID, but neither of these will pass the current test suite. Explained here: https://www.cockroachlabs.com/docs/stable/serial.html

It's a smart idea. I'm not sure how best to implement it though. Whenever we create tables in tests, we could do it via a custom function which sets the tables up correctly when using Cockroach.

```python
class Test1(TestCase):
    def setUp(self):
        create_test_tables(Band)
```

I need to think about it a bit more ...
> ```python
> @for_engines('sqlite', 'postgres')
> class Test1(TestCase):
>     ...
> ```

This is smart. It allows us to pick and choose specific tests per engine, and easily add other engines in the future. It would also allow us to repurpose the Cockroach tests for other massively distributed databases.

If you don't mind, what does the implementation of this `@for_engines` decorator look like? I am having trouble with it. :sweat_smile:
Here's an example of what extending our test suite with `@for_engines` would look like. It keeps the test suite comprehensive, and the division of tests is clear.

I've re-worked that test for Cockroach (it just ignores `"id"` and `"manager"`) but of course, feedback and ideas on how to do it better are much appreciated.
`test_save.py`

```python
from unittest import TestCase

from piccolo.table import create_db_tables_sync, drop_db_tables_sync
from tests.base import for_engines
from tests.example_apps.music.tables import Band, Manager


class TestSave(TestCase):
    def setUp(self):
        create_db_tables_sync(Manager, Band)

    def tearDown(self):
        drop_db_tables_sync(Manager, Band)

    def test_save_new(self):
        """
        Make sure that saving a new instance works.
        """
        manager = Manager(name="Maz")
        query = manager.save()
        print(query)
        self.assertTrue("INSERT" in query.__str__())
        query.run_sync()

        names = [i["name"] for i in Manager.select(Manager.name).run_sync()]
        self.assertTrue("Maz" in names)

        manager.name = "Maz2"
        query = manager.save()
        print(query)
        self.assertTrue("UPDATE" in query.__str__())
        query.run_sync()

        names = [i["name"] for i in Manager.select(Manager.name).run_sync()]
        self.assertTrue("Maz2" in names)
        self.assertTrue("Maz" not in names)

    @for_engines('sqlite', 'postgres')
    def test_save_specific_columns(self):
        """
        Make sure that we can save a subset of columns.
        """
        manager = Manager(name="Guido")
        manager.save().run_sync()

        band = Band(name="Pythonistas", popularity=1000, manager=manager)
        band.save().run_sync()

        self.assertEqual(
            Band.select().run_sync(),
            [
                {
                    "id": 1,
                    "name": "Pythonistas",
                    "manager": 1,
                    "popularity": 1000,
                }
            ],
        )

        band.name = "Pythonistas 2"
        band.popularity = 2000
        band.save(columns=[Band.name]).run_sync()

        # Only the name should update, and not the popularity:
        self.assertEqual(
            Band.select().run_sync(),
            [
                {
                    "id": 1,
                    "name": "Pythonistas 2",
                    "manager": 1,
                    "popularity": 1000,
                }
            ],
        )

        # Also test it using strings to identify columns
        band.name = "Pythonistas 3"
        band.popularity = 3000
        band.save(columns=["popularity"]).run_sync()

        # Only the popularity should update, and not the name:
        self.assertEqual(
            Band.select().run_sync(),
            [
                {
                    "id": 1,
                    "name": "Pythonistas 2",
                    "manager": 1,
                    "popularity": 3000,
                }
            ],
        )

    # Needs a distinct name, otherwise it would shadow the method above.
    @for_engines('cockroach')
    def test_save_specific_columns_cockroach(self):
        """
        Make sure that we can save a subset of columns.
        """
        manager = Manager(name="Guido")
        manager.save().run_sync()

        band = Band(name="Pythonistas", popularity=1000, manager=manager)
        band.save().run_sync()

        self.assertEqual(
            Band.select().run_sync()[0] | {
                "name": "Pythonistas",
                "popularity": 1000,
            },
            Band.select().run_sync()[0],
        )

        band.name = "Pythonistas 2"
        band.popularity = 2000
        band.save(columns=[Band.name]).run_sync()

        # Only the name should update, and not the popularity:
        self.assertEqual(
            Band.select().run_sync()[0] | {
                "name": "Pythonistas 2",
                "popularity": 1000,
            },
            Band.select().run_sync()[0],
        )

        # Also test it using strings to identify columns
        band.name = "Pythonistas 3"
        band.popularity = 3000
        band.save(columns=["popularity"]).run_sync()

        # Only the popularity should update, and not the name:
        self.assertEqual(
            Band.select().run_sync()[0] | {
                "name": "Pythonistas 2",
                "popularity": 3000,
            },
            Band.select().run_sync()[0],
        )
```
Another alternative: we can use `if for_engines('sqlite', 'postgres'):` and `if for_engines('cockroach'):` on the individual `self.assert...` calls. A direct comparison to the previous `test_save.py`. Your thoughts, ideas?
```python
from unittest import TestCase

from piccolo.table import create_db_tables_sync, drop_db_tables_sync
from tests.base import for_engines
from tests.example_apps.music.tables import Band, Manager


class TestSave(TestCase):
    def setUp(self):
        create_db_tables_sync(Manager, Band)

    def tearDown(self):
        drop_db_tables_sync(Manager, Band)

    def test_save_new(self):
        """
        Make sure that saving a new instance works.
        """
        manager = Manager(name="Maz")
        query = manager.save()
        print(query)
        self.assertTrue("INSERT" in query.__str__())
        query.run_sync()

        names = [i["name"] for i in Manager.select(Manager.name).run_sync()]
        self.assertTrue("Maz" in names)

        manager.name = "Maz2"
        query = manager.save()
        print(query)
        self.assertTrue("UPDATE" in query.__str__())
        query.run_sync()

        names = [i["name"] for i in Manager.select(Manager.name).run_sync()]
        self.assertTrue("Maz2" in names)
        self.assertTrue("Maz" not in names)

    def test_save_specific_columns(self):
        """
        Make sure that we can save a subset of columns.
        """
        manager = Manager(name="Guido")
        manager.save().run_sync()

        band = Band(name="Pythonistas", popularity=1000, manager=manager)
        band.save().run_sync()

        if for_engines('sqlite', 'postgres'):
            self.assertEqual(
                Band.select().run_sync(),
                [
                    {
                        "id": 1,
                        "name": "Pythonistas",
                        "manager": 1,
                        "popularity": 1000,
                    }
                ],
            )
        if for_engines('cockroach'):
            self.assertEqual(
                Band.select().run_sync()[0] | {
                    "name": "Pythonistas",
                    "popularity": 1000,
                },
                Band.select().run_sync()[0],
            )

        band.name = "Pythonistas 2"
        band.popularity = 2000
        band.save(columns=[Band.name]).run_sync()

        # Only the name should update, and not the popularity:
        if for_engines('sqlite', 'postgres'):
            self.assertEqual(
                Band.select().run_sync(),
                [
                    {
                        "id": 1,
                        "name": "Pythonistas 2",
                        "manager": 1,
                        "popularity": 1000,
                    }
                ],
            )
        if for_engines('cockroach'):
            self.assertEqual(
                Band.select().run_sync()[0] | {
                    "name": "Pythonistas 2",
                    "popularity": 1000,
                },
                Band.select().run_sync()[0],
            )

        # Also test it using strings to identify columns
        band.name = "Pythonistas 3"
        band.popularity = 3000
        band.save(columns=["popularity"]).run_sync()

        # Only the popularity should update, and not the name:
        if for_engines('sqlite', 'postgres'):
            self.assertEqual(
                Band.select().run_sync(),
                [
                    {
                        "id": 1,
                        "name": "Pythonistas 2",
                        "manager": 1,
                        "popularity": 3000,
                    }
                ],
            )
        if for_engines('cockroach'):
            self.assertEqual(
                Band.select().run_sync()[0] | {
                    "name": "Pythonistas 2",
                    "popularity": 3000,
                },
                Band.select().run_sync()[0],
            )
```
Sorry to interrupt, but we may have additional issues until asyncpg is fully supported in Cockroach. I started running individual tests, and a few exceptions are currently not supported. For example, tests for array columns throw the exception `asyncpg.exceptions._base.UnknownPostgresError: unable to decorrelate subquery`. I don't think it's implemented yet.

I also haven't found an alternative to the Postgres `FORMAT()` function used for readable columns. https://github.com/piccolo-orm/piccolo/blob/ed93feb85050895cc54e7db9650eddc69ab94fa0/piccolo/columns/readable.py#L40-L42

There are probably more such situations that we haven't discovered yet.
Yeah, I encountered that the other day as well - thought I'd come back to it as it seems non-essential. Maybe @dantownsend can comment. We may be able to just stub it out for Cockroach for now, or put in a placeholder.

I don't see many viable alternatives to Postgres `format()` or SQLite `printf()` in the function reference, unless I'm missing something. Odd, since there are tons of other string formatting options.
@knz Any thoughts?
Instead of `format` we might be able to use `||`, which concatenates strings and values.

We would have to refactor this:
https://github.com/piccolo-orm/piccolo/blob/ed93feb85050895cc54e7db9650eddc69ab94fa0/piccolo/columns/readable.py#L13

But I think it's possible.
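As a quick sanity check that `||` can stand in for `format()`, here's the idea demonstrated against an in-memory SQLite database (on CockroachDB the integer would be cast with `::string` instead of `CAST(... AS TEXT)`):

```python
import sqlite3

# Demonstrates replacing format('%s (%s)', name, popularity) with the
# || concatenation operator, which SQLite, Postgres and CockroachDB all
# support. The table and columns here are just illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE band (name TEXT, popularity INTEGER)")
conn.execute("INSERT INTO band VALUES ('Pythonistas', 1000)")
row = conn.execute(
    "SELECT name || ' (' || CAST(popularity AS TEXT) || ')' FROM band"
).fetchone()
print(row[0])  # Pythonistas (1000)
```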
If there are lots of problems with `asyncpg`, another solution is to build the CockroachDB engine based off `psycopg3` instead, which seems to be highly compatible with CockroachDB.

The engines in Piccolo aren't super complex, so it's possible.
> If you don't mind, what does the implementation of this @for_engines decorator look like? I am having trouble with it. 😅

I think it looks something like this:

```python
ENGINE = engine_finder()


def for_engines(*engine_names: str):
    if ENGINE:
        current_engine_name = ENGINE.engine_type
        if current_engine_name not in engine_names:

            def wrapper(func):
                return pytest.mark.skip(
                    f"Not running for {current_engine_name}"
                )(func)

            return wrapper
        else:

            def wrapper(func):
                return func

            return wrapper
    else:
        raise ValueError("Engine not found")
```
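A stripped-down, self-contained illustration of the same skip logic (using a hard-coded engine name instead of `engine_finder()`, and a no-op function instead of `pytest.mark.skip`):

```python
# Self-contained sketch of the for_engines skip logic. CURRENT_ENGINE
# stands in for ENGINE.engine_type, and "skipping" is simulated by
# returning a no-op function.
CURRENT_ENGINE = "cockroach"


def for_engines(*engine_names: str):
    def wrapper(func):
        if CURRENT_ENGINE in engine_names:
            return func

        def skipped(*args, **kwargs):
            return None  # stands in for a pytest skip

        return skipped

    return wrapper


@for_engines("sqlite", "postgres")
def test_postgres_only():
    return "ran"


@for_engines("cockroach")
def test_cockroach_only():
    return "ran"


print(test_postgres_only())   # skipped under cockroach, so None
print(test_cockroach_only())  # ran
```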
> We can use if for_engines('sqlite', 'postgres'): and if for_engines('cockroach'): on the individual self.assert..

We could do that. We would end up with quite a bit of branching in the tests, but in the end it might be the best solution.

I'm still hopeful that there's a super clean solution we haven't thought of yet ... 🤔
> We would end up with quite a bit of branching in the tests

This is what I'm experiencing. Your decorator creates much flatter tests, and the functions themselves are shorter, which is nice.

Also, I'm realising I need to do `RETURNING id` on many of the `INSERT`s, because the `id` is semi-random, which clutters the `postgres` and `sqlite` tests a bit too much for my taste.

Thankfully, many tests pass fine unmodified (the ideal situation).
> If there are lots of problems with `asyncpg`, another solution is to build the CockroachDB engine based off `psycopg3` instead, which seems to be highly compatible with CockroachDB. The engines in Piccolo aren't super complex, so it's possible.

Interesting. On that note, it seems like @dvarrazzo has committed extra time into ensuring a smooth CRDB story in psycopg: https://github.com/psycopg/psycopg/issues/313
> @knz Any thoughts?

- "cannot decorrelate subquery": we're extremely interested to get that fixed. CockroachDB is supposed to be able to decorrelate a lot nowadays, so your use case is probably a bug. Can you tell us more and then file an issue on the crdb repo?
- format: see here https://github.com/cockroachdb/cockroach/issues/87260 and indeed using `||` and the `::string` cast conversion will help as a workaround.
@dantownsend Any advice on how to handle the default type check tests in `tests/apps/migrations/auto/integration/test_migrations.py`?
Example failing test:

```python
def test_bigint_column(self):
    self._test_migrations(
        table_snapshots=[
            [self.table(column)]
            for column in [
                BigInt(),
                BigInt(default=1),
                BigInt(default=integer_default),
                BigInt(null=True),
                BigInt(null=False),
                BigInt(index=True),
                BigInt(index=False),
            ]
        ],
        test_function=lambda x: all(
            [
                x.data_type == "bigint",
                x.is_nullable == "NO",
                x.column_default == "0",
            ]
        ),
    )
```
Fails with:

```
AssertionError: False is not true : Meta is incorrect: RowMeta(column_default='0:::INT8', column_name='my_column', is_nullable='NO', table_name='my_table', character_maximum_length=None, data_type='bigint', numeric_precision=64, numeric_scale=0, numeric_precision_radix=2)
```

CockroachDB is adding a type cast, i.e. `'0:::INT8'`.

Do I really need to write all new tests with the casts included? Or is there a way to override or work around this? Suggestions appreciated.
Can we do something like:

`x.column_default in ("0", "0:::INT8")`

Maybe with a comment next to it saying that `0:::INT8` is for CockroachDB.
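Another possible approach: a tiny helper that strips Cockroach's `:::TYPE` annotation before comparing, so the Postgres expectations can stay untouched (the helper name here is made up):

```python
import re


# Hypothetical helper: strip CockroachDB's ':::TYPE' cast annotation
# (e.g. '0:::INT8' -> '0') so column defaults compare like Postgres.
def normalize_column_default(value: str) -> str:
    return re.sub(r":::[A-Za-z0-9_\[\]]+$", "", value)


print(normalize_column_default("0:::INT8"))  # 0
print(normalize_column_default("0"))         # 0 (Postgres values pass through)
```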
Very elegant. Thank you!
Also, I keep bumping into `ALTER COLUMN TYPE is not supported inside a transaction`:
https://github.com/cockroachdb/cockroach/issues/49351

Apparently there's work being done on this at Cockroach HQ, but is there any way to turn off transactions for these `ALTER COLUMN TYPE` operations, just for Cockroach, in the short term? Thoughts?

`ALTER COLUMN TYPE` is generally something one wants to avoid in CockroachDB anyway, as it's terrible for performance across a cluster, so perhaps disabling those for Cockroach is the way to go.