trails icon indicating copy to clipboard operation
trails copied to clipboard

multi-tenancy database support

Open dennisdupont opened this issue 9 years ago • 7 comments

One of the reasons we did not move forward with sails was requirement for supporting multi-tenancy. Since that means different things to different developers, what I mean is using a different database connections for each client, which of course is based on the user logged into that session. Since I see lots of responses to this about sharding and solutions to just put them in the same database, I will also say that we have this architecture for valid reasons and differing solutions for differing reasons are not what I am looking for.

In .NET I have implemented a connection pooling scheme that pools based only on target server and credentials. As long as the credentials or server does not change, a transaction simply uses the correct database prior to executing. While this is no applicable to all database types, it works well enough for most traditional relational servers. Something similar here would rekindle our interest in another Rails-on-node project.

dennisdupont avatar Dec 18 '15 20:12 dennisdupont

While this is no applicable to all database types, it works well enough for most traditional relational servers

One of the reasons this was so hard in Sails is because of the tight coupling between Sails and Waterline. It was really difficult to insert logic between them to govern a multi-tenant situation. This was fairly widely-discussed in the Sails projects:

  • https://github.com/balderdashy/sails/issues/1857
  • https://github.com/balderdashy/sails/pull/2553
  • https://github.com/balderdashy/waterline/pull/787
  • https://github.com/balderdashy/waterline/issues/286

@Glavin001 was the closest to having this working, and I think he partially succeeded, but we weren't able to merge his changes in because Sails/Waterline were not sufficiently modular. We'd have to add all this stuff to the core and maintain it forever alongside the other primary features.

In Trails, it is much easier to build Trailpacks that can alter any part of the framework. Trails relies on no specific web server or orm layer -- everything is configurable and swappable. The main challenge I think will be developing an API that everyone is comfortable with :) I'm happy to discuss that here.

tjwebb avatar Dec 19 '15 00:12 tjwebb

So theoretically Trails itself is the ORM? I don't see much of a point of an ORM if you can swap the modules easier. The only benefit would be if you need to switch databases in the middle of developing an app (which probably isn't a good idea unless you have well-done tests).

Would an ORM Trailpack implement an ORM API into Trails that makes functions available on its own or implement into Trails like Waterline does right now with Sails (you don't touch it directly very often)? If the latter, then I really don't see the purpose of an ORM, as Trails really is the ORM itself.

DaAwesomeP avatar Dec 24 '15 02:12 DaAwesomeP

Yuppppppy.... +1

Waiting for this feature. Any milestone or deadline created ? I am going to start a new multi-tenancy app, But i am waiting for this support. I also tried to help in sails, but it was not possible without changing lots of code in waterline so they are not interested right now.

Nishchit14 avatar Apr 05 '16 05:04 Nishchit14

@tjwebb I just take a look at what @Glavin001 does and it's a great work ! Something similar can be use for Trails. Here is my thought :

We can create a TrailsModel that use GraphQL definition to harmonize models definition (like we talk for https://github.com/trailsjs/trails/issues/189 and https://github.com/trailsjs/trails/issues/94). This model have this methods :

  • Create
  • Find
  • FindOne
  • FindOrCreate
  • Update
  • Destroy
  • Count

Each methods return a promise. And like Glavin does we can have something like :

 Model
        .tenant("myTenantId")
        .find({})
        .include('childField') // or populate
        .then( data => {
          console.log(data)
        })

Under the hood each methods will call the default FootprintService with all params to use the chosen ORM, and each trailpack-datastore can implement the tenant API to support it or not.

We already need standardized models for Trailpack (to avoid them to be compatible with one ORM only) so why not make the work once and fix all at once too :). With this Trailpack have unified models definition that compatible with any ORM and also use tenancy ! And if users don't want use tenancy under there project they always can use the native model definition of the ORM they choose (like it work actually).

Hope I'm clear :/ if not let me know :)

Any feedbacks ?

jaumard avatar May 21 '16 11:05 jaumard

Model.tenant("myTenantId")

Sure, this is reasonable. Keep in mind, though, that routing the request once you have a tenantId is less than half of the problem. What if you need to route the request to a tenant based on more complex criteria? What if part of the graph straddles two tenants, and you need to set up permissioning for this? For example, these are some scenarios that I've seen in the past that a solution would need to support, or at least be aware of:

  1. Each "Company", which has many users, is a Tenant. Users are allowed to perform operations within their company, but can also share documents/etc with Users in others Companies. When a request is made, in order to route it to the correct tenant, some non-trivial logic is involved
    • figure out which Company/Tenant the User belongs to
    • figure out whether multiple Tenants may need to be queried for shared entities
    • figure out in each Tenant whether the request is allowed
  2. There are a number of "Regions", each of which is represented by a geographical boundary, and a number of Points in each Region. In order to query for a point, I first need to figure out which Region (Tenant) the Point belongs to, so the essence of the tenantId here is actually that of a 2-dimensional Convex Hull -- similar to the previous example, each Region might define its own permissioning rules for what kinds of requests are allowed.

All of this also needs to be reconciled with the infrastructure/network topology of the system on which it's deployed. Tenants might need to be stored in separate databases, or tables, to satisfy performance requirements.

Thanks @dennisdupont, @jaumard, and @Nishchit14 for facilitating this discussion. Maybe @Glavin001 will pitch in as well :) Looking forward to seeing everyone's ideas.

tjwebb avatar May 21 '16 15:05 tjwebb

@tjwebb Good points, this definitely can be more complex than it seems at first glance. In fact we are working hard to support a legacy system that has similar complexities (although IMO it was over-engineered).

The original design defines a User as belonging to a Domain (terrible choice of naming, but generally persons from the same corporate entity). Programs are the various instances of the same application installation (program in this case as in "Program Management", not computer programs). A User could be granted access to one or more Programs as long as their Domain had access to the same. This is analogous to your concept of users sharing docs. However, the User must select the program in order to "enter" that shared area. Thus our unit of tenancy is the Program and we have the currently selected Program ID in the session that drives the database selection.

But under all of this we still had to consult one or two "system databases" from time to time. This was where all of these User/Domain/Program relationships and associated access rights were stored (how else would you get the list of Programs a user could select?). Same with personalization, menus, web page metadata, etc. I think we would want these system databases to have their own models and not really be affected by the tenancy. This leads to needing to accept multiple simultaneous connections to different databases/servers.

To simplify your point on determining the target, I think the tenant reconciliation logic must be separated from the Model layer. You could make this a programmer's problem (i.e. determine the tenant ID and pass it to the model on each call); you could have some callback or hook approach; or you could have some context-switching design that changes the underlying database for a set of grouped models whenever the user moves from one tenancy to another (this would be my preference). This would add a new collection object type for grouping models that share an underlying connection. I like the context-switching because I think it follows the use case (a person is literally performing this switch).

As I said in the original post, much of this is due to our legacy ties to relational databases. If I had a clean whiteboard would a document store or graph database have the same issues? Maybe a simple sharding solution would solve all of this, but then we really wouldn't be discussing an ORM then :)

dennisdupont avatar May 21 '16 18:05 dennisdupont

@daawesomep, it makes the functions available. Trails is not an orm.

clayrisser avatar Jun 28 '17 03:06 clayrisser