django-cockroachdb
django-cockroachdb copied to clipboard
Best practices for index USING HASH (...or USING HASH WITH BUCKET_COUNT) ?
This is about CRDB's hash sharded index feature for sequential indexes / primary keys: https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/
Is there any way to use this feature?
I've tried passing various arguments to _create_index_sql()
:
condition=
include=
db_tablespace=
expressions=
With no success.
It may require an extra entry in https://github.com/cockroachdb/django-cockroachdb/blob/master/django_cockroachdb/schema.py to handle this use case?
Any comments, thoughts, appreciated.
I wonder if https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/indexes/#hashindex would work, at least for the secondary indexes.
Yeah, interesting find @ajwerner HashIndex
won't work for us here, but that may be the direction we need to go in. New HashShardedIndex
type, perhaps.
I've noticed any modifications I do to add_index
seems to be for migrations only, and won't apply during model creation time, so that may be a dead end.
Ideal situation is this becomes a Primary Key option- the most common use case.
Although I'd be okay with this just being default as discussed in the ticket over on the main cockroach repo.. The default here is already DEFAULT unique_rowid()
. Regardless, this is a pretty important feature.
Temporary solution that works:
#BigAutoField='DEFAULT unique_rowid()',
BigAutoField='USING HASH DEFAULT unique_rowid()',
@timgraham This will use the default hash bucket count of 16, which is a nice, sane default.
I'm too new to Django to dive into more advanced topics such as new Field types (if it is necessary?), and am hoping you can take this the rest of the way.
Yes, a new HashShardedIndex
class (with bucket_count
an optional parameter) is doable. Usage might look like:
class Customer(models.Model):
first_name = models.CharField(max_length=100)
last_name = models.CharField(max_length=100)
class Meta:
indexes = [
HashShardedIndex(fields=['id']),
]
The downsides are:
- It's a boilerplate to have to add to every model.
- Models in
django.contrib
and third-party Django packages can't be adjusted.