django-cockroachdb icon indicating copy to clipboard operation
django-cockroachdb copied to clipboard

Best practices for index USING HASH (...or USING HASH WITH BUCKET_COUNT) ?

Open gnat opened this issue 2 years ago • 5 comments

This is about CRDB's hash sharded index feature for sequential indexes / primary keys: https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/

Is there any way to use this feature?

I've tried passing various arguments to _create_index_sql():

condition= include= db_tablespace= expressions=

With no success.

It may require an extra entry in https://github.com/cockroachdb/django-cockroachdb/blob/master/django_cockroachdb/schema.py to handle this use case?

Any comments, thoughts, appreciated.

gnat avatar Aug 19 '22 02:08 gnat

I wonder if https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/indexes/#hashindex would work, at least for the secondary indexes.

ajwerner avatar Aug 19 '22 04:08 ajwerner

Yeah, interesting find @ajwerner HashIndex won't work for us here, but that may be the direction we need to go in. New HashShardedIndex type, perhaps.

I've noticed any modifications I do to add_index seems to be for migrations only, and won't apply during model creation time, so that may be a dead end.

gnat avatar Aug 19 '22 05:08 gnat

Ideal situation is this becomes a Primary Key option- the most common use case.

Although I'd be okay with this just being default as discussed in the ticket over on the main cockroach repo.. The default here is already DEFAULT unique_rowid(). Regardless, this is a pretty important feature.

gnat avatar Aug 19 '22 05:08 gnat

Temporary solution that works:

#BigAutoField='DEFAULT unique_rowid()',
BigAutoField='USING HASH DEFAULT unique_rowid()',

@timgraham This will use the default hash bucket count of 16, which is a nice, sane default.

I'm too new to Django to dive into more advanced topics such as new Field types (if it is necessary?), and am hoping you can take this the rest of the way.

gnat avatar Aug 19 '22 05:08 gnat

Yes, a new HashShardedIndex class (with bucket_count an optional parameter) is doable. Usage might look like:

class Customer(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        indexes = [
            HashShardedIndex(fields=['id']),
        ]

The downsides are:

  1. It's a boilerplate to have to add to every model.
  2. Models in django.contrib and third-party Django packages can't be adjusted.

timgraham avatar Aug 25 '22 22:08 timgraham