pulumi-cloud icon indicating copy to clipboard operation
pulumi-cloud copied to clipboard

Add a SQL database abstraction

Open joeduffy opened this issue 7 years ago • 5 comments

We currently support NoSQL database tables by way of the cloud.Table resource. We've already gotten customer requests for SQL tables, however, which is entirely reasonable, and I predict will be a common request, especially for Enterprise developers. It seems we could pick a specific database like MySQL that we know can be run in all clouds and on-premises, and then just bless that.

Let's call this cloud.Database:

let db = new cloud.Database({
    customers: { ... },
    orders: { ... },
});
something.subscribe(() => {
    for (let row of await db.query("SELECT * FROM customers WHERE p")) {
        process(row);
    }
});

Clearly there are many hard problems locked away behind the notion of customers...orders....

Perhaps the biggest challenge is what to do with schema and schema updates. We already have this challenge, tracked by https://github.com/pulumi/pulumi-cloud/issues/43, although arguably with SQL it gets much more complex. The NoSQL tables we offer do have some "schema" in the form of indices, but it's far less rigorous a notion, and developers tend to write code to be resilient in the face of schema deltas with NoSQL, whereas with SQL stuff just starts to break.

We have discussed the idea of "sequenced deployment actions" in the past, which is similar to how Google manages such problems during deployments internally. In conjunction with a first class way of managing schema in a way that Pulumi can "see" -- so it can do diffs and apply deltas -- this could be reasonable. But there definitely be dragons there and this deserves much more think-time.

joeduffy avatar Sep 22 '17 05:09 joeduffy

@mmdriley points out this interesting discussion over in the Terraform repo:

https://github.com/hashicorp/terraform/issues/10740

Definitely lots of good conversation in there. Interestingly, I've pondered the schema-as-hashed-resources ideas before too.

Sadly, I do think we are going to have to venture into this territory, ironically for precisely the same reason the Terraform team voted against doing so:

"Many applications are able to populate any tables, views, and stored procedures they need, but most expect a database to already have been provisioned by a separate process. Terraform is therefore attempting to accommodate the "infrastructure" side of this divide, while leaving the "application" side up to other software to solve."

Pulumi specifically blurs this line and tells the developer, "don't worry, we take care of stuff for you." I do wonder if there's a 90/10 solution here that works for most cases.

joeduffy avatar Sep 26 '17 15:09 joeduffy

Sadly, I do think we are going to have to venture into this territory, ironically for precisely the same reason the Terraform team voted against doing so:

Completely agreed - we'll need to do this.

In terms of abstraction - there's (at least) two options:

  1. Just MySQL - we take care of provisioning, but the end result is you get MySQL running on some port somewhere. You can then use any MySQL client library (e.g. https://github.com/mysqljs/mysql) to actually talk to the DB. We've abstracted the "cloud" bits, but not the "database" bits. We may offer provisioning level things like backups, etc. This is like Aurora or Google Cloud SQL.
  2. Our own DB abstraction - we wrap all interactions with the DB - abstracting away all details of the DB. You can't use a standard MySQL client library - you have to use our wrapped API. We would still have to document the SQL dialect we support for queries though - presumably MySQL or PostresSQL. One important benefit of this though is that we could do access control separately from just having a port at an address somewhere - that is, avoid the need for exposing a VPC-like concept). I'm not aware of any good prior art for this kind of solution. Google Cloud Spanner and CosmosDB have their own abstraction - but it's also a wildly different data store - so it has very different reasons for this.

At a high level, it feels like this is a place where we won't naturally want to wrap the abstraction too much, since (a) we can put MySQL everywhere, so we don't need to abstract it away (b) direct access to MySQL via existing client libraries lets us leverage dozens of man years of investment we would need to reinvent (c) we won't abstract the actual SQL syntax, so there will be a strong association with MySQL anyway via supported queries.

The only thing that gives me pause on that is the networking story. It's a very similar challenge that we have for containers - and I'll start a separate thread to explore how we approach our network sandbox for Pulumi Cloud applications. But assuming that we'll have to solve that somehow - I'm inclined to head in the direction of a SQL offering that exposes MySQL directly as in (1) above. Notably, that would lead to a different result than the sort of example in https://github.com/pulumi/pulumi-cloud/issues/63#issue-259709526.

Nailing this down first will I think help scope how we would think about topics like schema migration, backup/restore, etc.

lukehoban avatar Sep 26 '17 17:09 lukehoban

I completely agree that (1) is a more promising approach. The hardest part here is the DDL aspects, which I think we need to have an opinion on since it's so tied up with provisioning and deployments.

joeduffy avatar Sep 27 '17 01:09 joeduffy

I agree we should do option 1. Option 2 seems very painful and difficult to do correctly.

SQL is a case where you might want to mix pulumi/"cloud-vendor" and pulumi/cloud. For instance, on Azure you can run SQL Sever on a VM or as a managed service. We should make sure it's easy to use the vendor specific implementations.

lindydonna avatar Oct 31 '17 02:10 lindydonna

I would love to use Pulumi to manage Schema diffing / migrations

If the Pulumi user could define SQL schema in nodejs in a declarative way it would be hugely valuable

bionicles avatar Jan 27 '20 20:01 bionicles