marten icon indicating copy to clipboard operation
marten copied to clipboard

Primary key indexes in multitenant dbs are (probably) ordered incorrectly

Open elexisvenator opened this issue 2 years ago • 9 comments

Currently when multi tenancy is enabled for a database, indexes are created with the columns arranged with id first, tenant_id next, anything else after. As an example see the sql below:

CREATE UNIQUE INDEX pkey_mt_streams_id_tenant_id ON my_schema.mt_streams (id, tenant_id);
CREATE UNIQUE INDEX pk_mt_events_stream_and_version ON my_schema.mt_events (stream_id, tenant_id, version);

CREATE UNIQUE INDEX pkey_mt_doc_deadletterevent_id_tenant_id ON my_schema.mt_doc_deadletterevent (id, tenant_id);

-- single stream and multi stream aggregates
CREATE UNIQUE INDEX pkey_mt_doc_customprojection_id_tenant_id ON my_schema.mt_doc_customprojection (id, tenant_id);

This works for general event usage, and if this is the only use case then it is probably good - putting the high cardinality column first is (sometimes) ever so slightly more performant. Where this causes problems is running queries to manage the tenant. Tenant-specific queries range from cross-stream queries (statistics, analytics), BAU tasks (redactions, data cleanup, restreaming) and the really big one which is offboarding tenants. Without these indexes starting by tenant the only options are to either do slow table scans or creating a duplicate of these indexes on every table in marten.

This is already an issue I have encountered using marten and for now the volumes of data we have is ok to table scan, but it very quickly wont be.

Maybe this could be a 6.0 breaking change? Worth discussing if its something to be concerned about or if its a matter of making the additional index creation documented (which i can help with).

Another thought is if support for partitioned tables will one day be added, the indexes would need to be flipped for that anyway.

elexisvenator avatar Dec 12 '22 05:12 elexisvenator

Let's knock this out in 7, because, yes, it's a breaking change. Does make sense though.

jeremydmiller avatar Sep 09 '23 21:09 jeremydmiller

So a few thoughts on this finally:

  • I definitely don't think you want to reverse the order for normal usage because of how often the app in normal issue would be querying against the stream id
  • Even if and when we add sharding, you won't be able to shard by tenant because of the high water detection. I think any sharding will have to happen by archived / not archived and maybe by sequence number, so sharding likely won't help for tenanting
  • I think that a database per tenant model might be advantageous for these once in awhile needs
  • I'd be more inclined to add additional indexes on just tenant id

jeremydmiller avatar Dec 04 '23 18:12 jeremydmiller

Saw this yesterday and agree with your arguments.

Since then my only additional thought is do you want to keep the high water detection working the way it is now? If high water marks were moved to be per-tenant then that would ensure more consistency between how the different tenancy types work (conjoined and multi-db). It would come with the downside of cross-tenant projections not working anymore, but that is sort of already the case with multi-db tenancy.

^acknowledge that this is a really big change due to how sequence ids are currently used.

elexisvenator avatar Dec 06 '23 04:12 elexisvenator

@elexisvenator I just re-read the thing and I'm back to your original point. If conjoined tenancy is on, we're always going to be querying by both the tenant and the stream id even if you're querying by the stream id. So put this back on for v7.

jeremydmiller avatar Jan 03 '24 00:01 jeremydmiller

@elexisvenator And to your question about the high water detection, we're never touching that unless there's a very good reason to do so because life is short. You can't do it per-tenant anyway because some projections will cross tenants.

There's a separate event sequence for each database at least

jeremydmiller avatar Jan 03 '24 00:01 jeremydmiller

I don't think there's a single place in the code today that depends on the column ordering in the table, but we're about to find out! I'm moving the tenant_id column first in every table where it appears.

jeremydmiller avatar Jan 03 '24 00:01 jeremydmiller

I think you are right, I tested switching the index ordering around a while back and did not notice any issues pop up in the test suites. The real catch is migrating the indexes for existing tables.

elexisvenator avatar Jan 03 '24 00:01 elexisvenator

(Moving the tenant column on the table will make no difference, its only the index order that matters)

elexisvenator avatar Jan 03 '24 00:01 elexisvenator

The index delta detection is good enough to get that, but I might ask for some folks to test that manually just because.

jeremydmiller avatar Jan 03 '24 00:01 jeremydmiller