electric icon indicating copy to clipboard operation
electric copied to clipboard

Add multi-database support

Open KyleAMathews opened this issue 1 year ago • 5 comments

I.e. each Electric server can be serving arbitrary number of Postgres databases (though we'll probably want to have a default limit).

Background

There are many use cases for multi-tenancy. A self-hosted Electric server might have a staging and prod database that it fronts. A company might have multiple production databases they sync from. Electric Cloud will need this to efficiently host large numbers of customers.

Proposal

Right now we start the server with a DATABASE_URL environment variable to set the backend. This will still be supported but we'll create an admin API for CRUDing backing databases.

Each database needs a Postgres connection string and an id. If an ID is not included, a UUID will be generated for the id. E.g. the original DATABASE_URL would have a UUID generated for it.

Managing databases

You can start the database with an ADMIN_TOKEN to authenticate admin API calls. If one is provided, API calls must include it in the Authorization header.

To create an instance:

curl -X POST http://localhost:3000/admin/database \
-H "Content-Type: application/json" \
-d '{
  "UUID": "...",
  "DATABASE_URL": "postgresql://username:password@host:port/database"
}' \
-H "Accept: application/json"

UUID again is optional.

Can we complete setting up the new backend db within the span of the request? Ideally we just hold the connection until it's setup. If there's a timeout, the caller should be able to GET the instance to see the status (or follow the shape for database updates).

To read an instance:

curl -X GET "http://localhost:3000/admin/database/[UUID]"

To delete an instance:

curl -X DELETE "http://localhost:3000/admin/database/[UUID]"

Can we complete cleaning up the deleted backend db within the span of the request?

Shape API change

Right now the path is /v1/shape/${baseTable} with query parameters.

With multi-tenancy, it'll be /v1/shape/${baseTable}?database_id=${id}. So this is a non-breaking change. To further make it non-breaking, if an Electric server only has one database instance, then database_id isn't required (though allowed).

We'll eventually want per-tenant controls e.g. max shapes, max shape log size, etc. that'll probably come along with https://github.com/electric-sql/electric/issues/1529

Logs & telemetry

We'll need to add the database_id to the start of logs and as attributes on spans. This would also be a good time to add a JSON log option for easier parsing.

Internal state shape

Not required for this issue but eventuallyl we'll want to expose admin shapes for monitoring various parts of the internal state. That way you can build monitoring/admin tools on top of standard Electric primitives and see live updates on what's happening.

KyleAMathews avatar Aug 28 '24 21:08 KyleAMathews

We'll need to add the db_id to the start of logs. This would also be a good time to add a JSON log option for easier parsing.

Another option is to stream logs to a different "log sink" for each database and thus keep their logs separate at the cloud provider level.

alco avatar Sep 04 '24 23:09 alco

Each database needs a Postgres connection string and an id. If an ID is not included, a UUID will be generated for the id.

Every PG DB has a unique ID that we’re already reading to detect when Electric is restarted with a different PG DB. We could use that ID instead of generating a UUID per URL?

  • pro: we can detect when different URLs point to the same PG DB
  • con: potential clashes in PG IDs as they are derived from the system’s time at creation


More information about that unique ID can be found at https://pgpedia.info/d/database-system-identifier.html

[...] the caller should be able to GET the instance to see the status

To read an instance:

curl -X GET "http://localhost:3000/admin/database/[UUID]"

What should the different statuses be? A proposal:

  • 202 Accepted: Electric has received the POST request to serve the DB but is still connecting to it/setting it up.
  • 200 OK: Electric is serving this PG DB.
  • 404 Not found: Electric does not know this DB ID.

Not required for this issue but eventually we'll want to expose admin shapes for monitoring various parts of the internal state. That way you can build monitoring/admin tools on top of standard Electric primitives and see live updates on what's happening.

This implies we will have our own Electric Postgres instance that keeps track of the different DBs Electric is managing such that we can define a shape over that.

  • con: have to setup and run an additional PG DB for this. Can’t piggyback this on a user’s DB because that DB may be deleted by an admin.

kevin-dp avatar Oct 02 '24 07:10 kevin-dp

We'll need to add the db_id to the start of logs. This would also be a good time to add a JSON log option for easier parsing.

Another option is to stream logs to a different "log sink" for each database and thus keep their logs separate at the cloud provider level.

Great idea. We could have a complete Electric log that contains the logs of all of them (prefixed with DB ID) and allow admins to configure the DB with a certain log sink on creation such that they get a dedicated log for that DB, e.g. streamed to S3.

kevin-dp avatar Oct 02 '24 07:10 kevin-dp

Every PG DB has a unique ID that we’re already reading to detect when Electric is restarted with a different PG DB. We could use that ID instead of generating a UUID per URL?

I think the downside of this is that then a postgres instance could only have one electric instance. There's a number of reasons you might to have multiple electric instances / postgres e.g. testing something locally against a prod db or load balancing shape requests across multiple electric instances. UUID is more flexible.

Status codes look good 👍

This implies we will have our own Electric Postgres instance that keeps track of the different DBs Electric is managing such that we can define a shape over that.

This is just a throwaway idea so not part of the issue but my thinking is just that we write changes to internal structures to a shape log. So we don't need an actual postgres instance backing these — we just emulate it for internal data.

KyleAMathews avatar Oct 02 '24 16:10 KyleAMathews

Great idea. We could have a complete Electric log that contains the logs of all of them (prefixed with DB ID) and allow admins to configure the DB with a certain log sink on creation such that they get a dedicated log for that DB, e.g. streamed to S3.

Splitting and streaming logs to various sinks is generally a separate concern. So another tool reads the JSON logs and does whatever with them. With Electric Cloud we'll definitely want to expose people's logs to them in a dashboard and let them stream them elsewhere. So we'd capture all the logs and then split them into different streams. We don't need to change anything though for this PR other than add the database_id in the log output.

KyleAMathews avatar Oct 02 '24 16:10 KyleAMathews

@KyleAMathews Not sure why using PG's DB identifier would limit us to a single Electric instance per PG? That seems like a limitation of the implementation that can be avoided. What i don't like about UUIDs is that several Electric instances for the same PG DB will use different UUIDs in each instance, for the same DB. Or even, a single Electric instance could end up with several tenants for the same DB if the admin adds it several times (e.g. through different URLs pointing to the same DB) and we're not checking the PG DB identifiers.

kevin-dp avatar Oct 07 '24 06:10 kevin-dp

Not sure why using PG's DB identifier would limit us to a single Electric instance per PG?

This is mainly for the Cloud case — at the edge, all we have is the database id so we need to know which Electric instance to route it to.

a single Electric instance could end up with several tenants for the same DB if the admin adds it several times

Yeah — we should only allow a db to be added once to a specific Electric instance.

KyleAMathews avatar Oct 07 '24 13:10 KyleAMathews