trafficstars

DRILL-7871: StoragePluginStore instances for different users

Description

This feature allows to have the separate plugin set per user. Profiles are restricted per user too.

Documentation

To enable the feature: drill.exec.security.user.auth.separate_workspace

See doc fr more info

Testing

Unit tests. Manual testing of Drill WebUI and JDBC connection via SqlLine. Details in doc

Jun 04 '21 14:06 vdiravka

@vdiravka, looks like the test you have added is failing in CI, could you please take a look?

Could you please share some functional-level doc with the description use cases and behavior, so it will be easier to do the code review?

Jun 23 '21 13:06 vvysotskyi

This one is difficult. We are reviewing code under time pressure when the design itself raises many questions. Let's start with the design. First, it is too short, it leaves many issues unspecified. So, I have to infer them from reviewing the code. As noted before, the storage plugin registry is complex; so having to reverse-engineer a design at the code level makes the process slower than it might otherwise be.

This PR adds two separate features: a new options level and changes to the plugin registry. Since these are independent concepts, they should appear as separate PRs: one for the options, another for plugins. That will reduce the cognitive load on us poor reviewers.

So, the design of options:

But it appears different users have the common Storage Plugins and System Options.

There is no possibility for Drillbit configuring individually for every user and persistent between sessions (Note: SessionOptions are removed after closing session)

The above is by design. System options are just that: options for the system. They are meant to be set by the administrator to control the Drill cluster as a whole. Drill also has Session options: these are per user session. If we want per-user option settings, then we need a new level: User options.

Options form a hierarchy. At present that hierarchy, from most to least specific, is:

Per-query options or session options
System options
Options configured in the Drill config system.

It sounds like you want to add a new layer: user options:

Per-query options or
- Session option
- Per-user option
System options...

Suppose we have per-user system options: I set my queue length to 10 items, you set your queue length to 100 items. But, the queue is a global resource: which is used? This is just one example of how intractable "per-user system options" is as a concept. So, we need actual user options.

One obvious choice is simply to persist the session options. I login and change foo to 10. Drill simply persists that change so that next time I log in, foo is 10. Easy, but is that the right answer? Drill has a serious flaw that the behavior of any query depends on options: JSON modes, all-text mode and so on. That was done as a quick and dirty short-term solution to ship the product, but is still with us and still causing issues. If I set an option to "all text mode", and run my query, I get a result. If I share that query with you, but you don't set the all-text option, the query will produce different results (or, probably, fail.)

So, to "fix" options, we need to push those options into the query (with hints, say), or we need options associated with a query, as having per-user options won't solve the "options associated with a query" problem.

So, we probably do not want to just persist session options as they are used to control specific queries. Instead, we'd want to add a new system: SET USER OPTION ... so that the hierarchy is as above.

The design is silent on many important questions:

What is the new hierarchy?
How does the (not an) administrator set truly-global options if all users can set their own?
How does the user set, review and clear the user options?
Suppose a user is logged in twice. In session 1 they change foo to 10. Do we expect session 2 (on a different Drillbit) to see that change? This is a very difficult concurrency question. Of course they do. But, when? In the middle of query planning? At the start of the next SQL statement? After a time period? Only at the next login? If concurrently, how will one Drillbit track changes made to the user options by another Drillbit? How can the use a "force resync" to keep options synchronized?
Can the admin see the per-user options? If so, how? SHOW OPTIONS FOR USER 'bob'? Remember, the admin is tasked with policing the system, they must be able to see all options.
How do we solve the current ambiguity of true system options (the query queuing options mentioned above) vs. user/session options? At present, the solution is ad-hoc and we just trust that most options are not changed. If a naive user can change any option, we've got issues. Do we need a permissions table to say which users can change which options? (And, categorize true system options as read-only at the user and session levels.)
You will need USER version of all of the option commands.
You will need complete tests that demonstrate that the hierarchy works as expected.

The good news is that, in terms of implementation, Drill's option system is already hierarchical, so adding a new layer should be very easy.

All of this is complex. It should be done as its own PR along with a detailed design.

Jul 11 '21 22:07 paul-rogers

Let's now move onto the primary topic of this PR: plugins. Again, this is a complex topic, especially if we think about how the solution would work at scale. I'm afraid that, since the design is so sparse and leaves out many details, that we're perhaps going down the wrong path. Let's take each topic in turn so we can see what's what.

First, what is the structure we want? To answer this we need to separate the code (Java libraries, which I'll call a "connector", following Presto) and configs (the JSON things that users see, what we normally mean when we say "plugin".)

Drill will only ever work with a fixed set of connectors: those available on the class path at runtime. AFAIK, there has never been interest in adding connectors at runtime (no "dynamic connectors") and, for security reasons, I'm not sure that doing so would even be a good idea.

Connectors are presently used by any number of configs. And, every query creates an instance of the objects in the connector (the readers, the planner objects, etc.) So, the structure we have is:

Config (a JSON object in the persistent store)
Connector (a library)
- Plugin (an instance of the plugin class from a connector, along with its config.)
  - Readers, planner objects, created per-query from a plugin

The plugin registry binds connectors to configs to produce plugins (where by "plugin" I mean an instance of the plugin class initialized with a config.) This is some mighty complex code! It has to deal with a bunch of distributed system and concurrency-related issues. I'll omit the details for now.

So, what does this mean? For one, it means we only need one copy of the connector. The "new" plugin registry is designed to handle this idea. My "class path plugin" and your "class path plugin" use the same plugin library. I'll assume that the answer above, that each user has their own copy of the code, represents a misunderstanding because of Drill's horribly ambiguous names in this area.

Next, the plugin registry holds two kinds of "plugins". First are those configured in the UI and that can be resolved in queries. Familiar things like cp, dfs and so on. Second are ad-hoc plugins: those created on-the-fly based on table properties in a query, or plugin properties passed along in a query definition. This is done so that queries work even if, right after planning completes, someone deletes or changes the stored config: the next query sees the new config, the executing query uses the config stored in the query. (Else, disaster would result when some Drillbits see one config, others another.)

There is a system plugin that offers the system tables. This one is meant to be shared by all users. It has no config options, but if it did, they would be set at the system level by the admin. It makes no sense for each user to have their own copy. (It might make sense to disallow certain system tables for some users, but that is a different question.)

Now we have the "regular" configured plugins: a JSON config and an associated plugin instance. The goal here is to define a new level so that individual users have these items.

First, I'll question if the requirements are correct. Do we really want plugins associated with each user? If you and I both use tale "foo", do I have to ask you to send me your JSON so I can create a new copy? What if you change something? What if we have 100s of users all with copies? Making copies is a "demo only" feature, it does not scale in production. Remember: reuse by copy and paste is for amateurs who will throw away their solution, not for professionals who want to minimize total costs.

Let's think of a use case. A firm has five departments with 10 people per department. Some people in one department can see some data in another. For example, the VP of marketing is allowed to see Sales and Dev data. The CFO can see everything. Oh, and the employees change regularly.

This is not a new idea. How do other tools handle this? Oracle? SQLServer? They operate at the level of schemas (databases) and tables. So, think of a plugin (config in storage, config + connector at runtime) as a shared, named object.

So, what we want are groups of plugins. "Marketing", "Sales", "Ops". Within each are the configs for that set of tables. Users can be given permissions on the whole group, or on individual plugins. Administrators or DBAs (those who understand production systems) set up the configs. Users use those to which they are given permission. Or, if we want to be cheap, we have a huge global name space of plugins, and require all names to be globally unique, but you can only see those to which you have been given permission. (How will this work with 100 plugins and 1000 users? That results in 10,000 distinct permissions to maintain.)

Users might want to create their own, individual configs not meant to be shared. Fine. Have a "user" group that holds configs for just that user. If I want to promote my new "foo" config, I ask an Admin to add it to the "Dev" group so you can see it.

If the above is roughly in the ball park, we'd want to work out the details in a detailed design. There are likely a dozen "gotchas" that the above omitted. Who removes configs for users no longer with the organization? What connectors is a user allowed to create configs for? How can the admin see all configs to police users or track down odd behavior "some user did something silly that puts huge load on an external system"?

Think of security: who decides who gets permission on what? Is it done manually? With what UI? How does that work in an organization of 10 people? 100? 1000? Since we are dealing with permissions based on roles (role-based security), can/should we tie onto the corporate LDAP or similar system? What is the API for that?

Configs should hold security keys, but those should not be in JSON, stored in the persistent store. (Drill has always done this wrong for systems other than MFS; another reason Drill fails in production.) How are configs connected with a "vault" to get keys, passwords, tickets and the like at runtime? On S3, the use of temporary tokens is the right way to do permissions, no one gives away S3 keys any more. So, how does a plugin obtain and refresh its S3 token?

These are hard topics. Thanks for taking on this big challenge! Suggestion: pick a small subset to start with, but ensure that subset allows us to add the pieces needed when the solution moves into production.

Jul 12 '21 00:07 paul-rogers

Hi @paul-rogers,

I started design for system options from adding the new additional layer. But this approach is much more complex and we should cover all the cases you described. The design for the feature proposed by me is other: we just don't use global system options at all (in case separate_workspace is enabled). Every user has it's own system options. It gives a great functionality, which drill required for a long time, I think. We used session options for this purpose, but it can't be serialized. I also abandon the idea to Store Session options for the same reason, it has other concept and couples with a Query.

About plugins the same thing, I tried to keep things as simple as it is possible (it corresponds to your last sentence).

I agree we can factor out system plugins and use that shared plugins for all users. It is minor improvement, it allows to keep less things in memory. This is not a blocker for this PR. I will create a ticket and proceed working on it after finishing this PR.
Also I agree with it is nice to have Group/Organization level of plugins and System Options. I noted about it in Design document. But it is much more complex improvement, requires introducing a lot of new rules and grants. I thought to consider working on it after finishing on this PR. Because now separate plugins and options can provide all things needed for users, but setting up is not so comfortable as it can be with Group/Organization level of these instances. And there are more similar instances are placed in memory now (separate instances for Groups/Organizations will improve this too)
About concurrency and distributed systems issues, they are all the same as it is now. Since event now one user can run query and other can edit the plugin config. Now the picture is even better, since two user can't sue common plugins registry or system options. Whey their own ones. They don't impact each other.

About dividing the PR I also thought about it initially, because leverage of System Options is other than plugins, so a lot of things for options is other. But options and plugins are combined with a common idea - to have the separate workspace for users: plugin configs, system options and to see their own query profiles only. And the approach for implementation is very similar - to replace the global instances with their own user-session ones. So I can here to divide the PR in two parts or just in two commits for easier review. But I think we can also go with one PR and commit.

So let me know do you like the above approach for this feature with a two (or more) further improvements or possibly you want to bring mail discussion on this topic with other devs?

Jul 14 '21 00:07 vdiravka

Hi @vdiravka,

Thanks for the note, you make good points. Let's see where we can compromise. Let's start with options.

It makes no sense for users to have their own system options (my favorite example is the length of the global, shared query queue.) So, SYSTEM options can't be per-user. This means that the SYSTEM level has to remain as it is. We can cut corners: still allow users to change system options, we just ignore those values. (For example, IIRC, the query queue, the value is taken from the SYSTEM options; we ignore SESSION values.) That is, we just trust that user ignorance will prevent them from breaking things; they won't know about the system options.

If we have any form of user-property persistence, then we have to worry about race conditions. I don't see how we can ignore that issue unless we are strict and we say that user options are read once: when the user starts a session. If I change all-text-mode in session 1, it will not affect session 2. This seems reasonable for the "toy" use cases where Drill is now used.

That is, when a session starts, we set the session options to the saved per-user values from the persistent store. A nice result is that, at runtime, there is no difference between user and session options. This fact should save some work.

The user has to be able to set the user options. We don't need a new option level. Simply add a new statement: ALTER USER SET <name>=<value>. This would do two things:

Write the given key/value pair into the persistent store for the user.
Change the session option in the current session to that new value.

Again, we let the user change anything they want; we rely on the fact that users will only care about one or two options, and only when someone suggests a change.

The above is simple, but limited. The limits are no worse than today. It seems we could extend the feature in the future without breaking anything. (For example, broadcast user option changes to all sessions. Apply security to system options. Etc.)

Then, we simply ignore the issue that the options users care about are associated with queries (all the Parquet, JSON, and other options that affect reader behavior.) We simply decide to be surprised when the user encounters those issues. User: "Gee, I set the all-text-mode option and my queries run file. But, when Bob runs them, they fail...". Our answer: "have Bob change the same option" This may break Bob's own queries, so we have Bob change the option back. This is a horrible user experience, but is the same as what Drill offers today, so we're not breaking anything.

Will this work?

Jul 16 '21 03:07 paul-rogers

Next, let's discuss plugins. Let's start with three goals:

System plugins work for everyone (we can add security later).
By default, plugins are available to everyone. That is, the Drill out-of-the-box plugins are available so that, say, cp works and people can try examples from the documentation.
There is an optional way to create per-user plugin (configs).

What is the simplest way to do this? Maybe by changing the plugin resolution rules. When the planner looks for a plugin, it tries two names: $$<user>$$<plugin> and <plugin>. We use the resulting name in the query plan. So, if I have (unwisely) created my own cp plugin, when I ask for cp, that's what I get: $$paul$$cp. If I do not have my own dfs, then when I ask for it, I get the (global) dfs. (The $$<user>$$ is something I just made up; it can be anything as long as the marker is not likely to ever appear in a real plugin name. Maybe ..<user>.. or \<user>\ would be better since they are not legal SQL.)

When a user stores a plugin, and the per-user config feature is enabled, then the UI appends the user prefix. The UI filters out all plugins owned by other users. When the admin works with plugins, there is no prefix. This allows the admin to see, and change, both cp and $$paul$$cp. If I leave the organization, the admin can remove all the $$paul$$... plugins.

I suspect you'll find that, with this solution, no change is needed in the plugin registry. Plans will resolve to the correct reader configs. The reader configs in the query plan will just work (as they would in your design.)

How does this help sharing? Simply ask the admin to change the name of the plugin from $$paul$$whizbang to just whizbang. Now, everyone can see it and my queries work without change. Super simple. We have only two security levels (everyone or single user), but that is far better than reuse by copy/paste. The design could be extended to the role-based one without breaking anything.

The next step is store the per-user plugins in separate persistent stores. This is the "cleaner" design, but now you have to create the means for an admin to see, change, and remove the user-level plugins. You also have to extend the plugin registry, which is, as we noted, a complex beast. Any change requires extensive unit tests.

By the way, when I revised the plugin registry a while back, I fixed dozens of bugs. Most were simply because I added tests and found all kinds of broken stuff. Concurrency was broken, upgrades were broken, expiring of old plugins was broken. and so on. I still have scars. So, I'm trying to save you from similar pain.

Will something like this work and avoid the hard work we discussed earlier?

Jul 16 '21 03:07 paul-rogers

One more thought: it is not clear from the PR or design document why you need these changes, so I'm having to guess. I may be guessing wrong. Can you spell out the use case?

For example, I could imagine that you need the persistent user options because you want to use the REST API, and that API does not have a persistent session, so options don't persist. As it turns out, someone kindly recently added the ability to pass along options on each query request. This is, in fact, how Impala works to avoid the need for session state. So, you can achieve the same result by keeping options on the client.

Or, you can achieve the same result by adding a session concept to the REST API. Log in and get a session token back. Send that token on subsequent requests to identify the user and to key to a cached session object. The session ends after a timeout or on an explicit logout call.

If the per-user plugins are to allow multi-tenant operation, then the problem is considerably more complex as much of Drill is not designed for that. (You'd want to secure query profiles, police resource usage, etc.) If it is to allow different users in the same organization to define their own plugins, then you have the sharing issue we discussed.

In short, if you can explain your goals, we can perhaps help you find a workable solution that achieves the goals. Without knowing the goals, then I'm free to make up my own understanding of what you want...

Jul 16 '21 05:07 paul-rogers

Motivation:

The goal is to allow multi-tenant operations. So it doesn't matter what API user uses: REST, JDBC ot ODBC. This is true that Drill initially wasn't designed for that, but we ca add it.

Options:

The user has to be able to set the user options. We don't need a new option level. Simply add a new statement: ALTER USER > SET =. This would do two things: Write the given key/value pair into the persistent store for the user. Change the session option in the current session to that new value.

This is also questionable, because it breaks SessionOptionManager extends InMemoryOptionManager:

This is an OptionManager that holds options in memory rather than in a persistent store. Options stored in 
SessionOptionManager, QueryOptionManager, and FragmentOptionManager are held in memory 
(see options) whereas the SystemOptionManager stores options in a persistent store.

Most likely all easy ways here are some sort of hacks. So let's do that in correct manner: adding the new UserOptionManager as a layer between SystemOptionManager and SessionOptionManager.

The next step is store the per-user plugins in separate persistent stores. This is the "cleaner" design, but now you have to create the means for an admin to see, change, and remove the user-level plugins. You also have to extend the plugin registry, > which is, as we noted, a complex beast. Any change requires extensive unit tests.

Different PS can be chosen per Drill doc. This is independent from Plugins. We have HBasePersistentStore, InMemoryStore, LocalPersistentStore, MongoPersistentStore, NoWriteLocalStore, ZookeeperPersistentStore implementations for PS. If you mean different instances of the samePS, that is not needed. PSRegistry is just a code, which allows to access actual PS. We can manage access via the common PSRegistry. Or we can reconsider it in future.

Plugins:

About plugins it makes sense to add Global plugins and to use it in case User plugin is absent. It's like a plugins fallback. And system plugins can be used only in scope of Global plugins. I don't think user name in schema is a nice and elegant solution. I think plugins fallback is simpler and more convenient approach.

Conclusion:

So I am going to add this functionality to current implementation and separate out user options part of this PR into the new one.

Jul 19 '21 21:07 vdiravka

@vdiravka, thanks for the explanation. Let's be careful not to confuse existing implementation with new requirements. Yes, there is a comment that says that the system options are persistent and session options are not. Be careful to read this correctly. At the time the comment was written, system options were the only persistent options. System options, by definition, have to be persistent. The comment did not forbid session options from being persistent; it simply observes that, at that time, for the reasons we discussed, session options were not persistent. Nor does the comment say that, if you add persistent user-level options, that they have to be system options because only system options are persistent: you are free to add persistence to session options.

Thanks splitting this PR into two: one for options, another for plugins. Else, discussion becomes overly complex.

Before we worry about code details, we have to get the semantics right. After writing the comment on plugins, below, I realize that we are probably using the same words for entirely different ideas. Your goal is to add multi-tenant support to Drill. You uses the term "user options". I now realize you probably mean "tenant options."

If we are doing per-user options (which is how I first read your description), then the requirements are one thing. If we want per-tenant options, the requirements are different. Let's list both scenarios: per-user options and per-tenant options. Please let us know which you are trying to build (or if you are tying to build something else entirely.)

Per-User Persistent Options

Suppose the goal is to let users (individual named humans) persist options. We assume that system-wide settings are set as system options. We are simply avoiding the need for the user to configure their own personal options on each login.

Users will want to persist only some of their session options. We discussed that this may not work as one might think: we have to think through the issues. The point was that concurrency of user options will be a mess if they work like system options: if changes to user/session options are written to persistent storage immediately. This will screw up queries as I persist "all-text-mode" in one session, which breaks a query in another. Point is, user options must have semantics different from system options.

We need a concurrency guarantee. One choice is "options are propagated to all active sessions with unknown latency, just like system options." Another, more dependable, is "session options are initialized on session startup, after which they do not reflect changes made in other sessions." Whatever it is, the definition has to be stable and explainable.
Some options apply to queries: we've been using "all-text-mode" as an example. Such options will be set frequently. Since these options are per-query, we cannot set the option across all user sessions immediately, as that will break queries. Instead, the user has to tell us when they want to persist that setting. Otherwise the setting is per-session (that is, per-query.)

Is it a bad thing that the behavior of queries depends on session options? Certainly. That was a poor design choice. But, it is how Drill works, and we have to design new features to reflect this choice.

Per-Tenant Options

Now, if the goal is to have "per-tenant" options, we have a much different problem. Each tenant is an organization which believes it has its own Drill cluster. That the cluster is shared is invisible to the tenant. The tenant reads in the docs that they can change system options, so they decide to do so. Those changes must persist only for that one tenant and all that tenant's users.

I now see that this is probably what you are trying to implement. As a result, the semantics are closer to system options. For now, let's assume that you do not, in fact, need per-user (per-named human) persistent options.

Let's list some possible requirements:

A bootstrap option enables multi-tenant support. The option is off by default, meaning that existing Drill behavior is unchanged by default.
A tenant is an organization with (a single user? multiple users?)
Each tenant has a tenant admin user who may set its own (subset of) system options. This subset is called tenant preferences. Such options persist, like session options, but apply only to sessions started for that tenant. Values set by one tenant cannot be seen by any other tenant.
Both server options and tenant preferences are system options: changes take effect immediately (after some latency) in all active sessions. Both are changed with the ALTER SESSION and related commands.
A new class of user, "tenant admin", is given permission to change tenant preferences. The "tenant admin" cannot change server options. When a tenant admin issues an ALTER SESSION command, that command alters the tenant preferences.
The full set of system options include options outside of the tenant preferences set. These are called server options and may only be set by a server admin user who belongs to the organization that administers the physical Drill cluster.
In multi-tenant mode, the Drill "admin" user becomes the server admin who can modify server options. The server admin can also view and change tenant preferences.When the server admin issues an ALTER SESSION command, that command alters the server options.
When not run in multi-tenant mode, server options and tenant preferences are identical, they are just system options. In multi-tenant mode, the concepts are distinct: each tenant has their own persistent set of tenant preferences separate from server options.
Normal users can change neither server options nor tenant preferences; such user can only change session options using the ALTER SESSION command.
The server options, tenant preferences, and session options form a hierarchy, as in present Drill. The server admin can alter any option which "flows down" to all other levels. The tenant admin can alter tenant preference which "flow down" to sessions.

Code wise, you will be adding a new system options layer. You should mark options as "server", "tenant" or "user" so the above requirements can be enforced.

Jul 23 '21 20:07 paul-rogers

@vdiravka, on the plugins side, I think the confusion over "user" and "tenant" has us talking about different designs. Again, let's discuss per-user and per-tenant designs. Tell me which you are trying to build.

Clarifying the desired tenant model may help with the options discussion. I've been equating "user" with "person". If your tenant model equates "user" with "tenant", you are actually trying to define per-tenant defaults. Perhaps update the docs to clearly say so. Let's use the term "tenant" for a business entity and "user" for a human, perhaps within that tenant.

Per-User Plugin Configs

The goal here is to allow users (humans) to create their own private plugin configs. Since the users work in a common organization, users may wish to share configs without copy/paste reuse.

Requirements for this case:

Provide a bootstrap option to enable the new semantics. The default is off, meaning that Drill behavior will not change by default.
Retain global plugin configurations. (They allow multiple users to share the same configuration without copy/paste.)
Add per-user plugin configurations.
Users can add and change only their own per-user configurations. (No user can "share" a config with another user: either the config must be copied to a global config by the admin, or the second user must make their own copy.)
If a user configuration happens to have the same name as a global configuration, the per-user configuration takes precedence. (If the admin make a global version of a now-shared config, the user must delete his own copy in order to use the now-shared version.)
No non-admin user can see or change the configuration for another user. (That is, if two people work in the same group, and want to share a config, they must do so via copy/paste, or by asking the admin to make the plugin global to all users.)
No non-admin user can change a global configuration.
The admin user can see, add, modify or delete any plugin configuration, both global and per-user. (Without this, the system is not maintainable as the admin won't be able to manage per-user configurations.)

Per-Tenant Plugin Configs

The other way to read your comments is that you want plugin configs per tenant (business entity). In this case, as we said earlier, Tenant A should not know that there is a Tenant B on the same cluster. There is likely no shared storage or configs between tenants. The only exception would be system-level configs offered by the organization that owns the Drill cluster, such as the system tables. Maybe some demo data.

Requirements in this case:

Provide a bootstrap option to enable multi-tenant support. The default is off, meaning that Drill behavior will not change by default.
A set of global plugin configuration is provided by, and can only be changed by, the server admin (as defined in the earlier note.) If no global plugins are to be provided, then delete all global plugin configs.
The system table is not controlled by a plugin config. If tenants are to be prohibited from using system tables, provide an option to disable system tables. (Or, a per a later note, ensure that system tables do not leak tenant information.)
Tenants see only their own configs. To the tenant, the per-tenant configs are the plugin configs.
UI and API operations that operate on configs, when run in multi-tenant mode, implicitly apply to the set only for that one tenant. That is, there is no new UI or API; the existing operations simply change meaning to apply to only the active tenant.
Every user (except the server admin) is associated with a tenant. If the user has permission to modify plugins, then all changes apply to only that one tenant. (Possible refinement: only the tenant admin, as defined earlier, can change tenant plugin configs.)
A tenant may (may not?) have multiple users. All tenant users see the same set of tenant plugins. There are no "per-user" plugins, only per-tenant plugins.
Tenants can add and change only their own per-tenant configurations. No tenant can "share" a config with another tenant.
If a tenant configuration happens to have the same name as a global configuration, the per-tenant configuration takes precedence.
No tenant can change a global configuration.
The server admin user can see, add, modify or delete any plugin configuration, both global and per-tenant. (Without this, the system is not maintainable as the admin won't be able to manage per-tenant configurations.)

Jul 23 '21 20:07 paul-rogers

@vdiravka, let's think about your larger goal. If the goal is a true multi-tenant model, in which tenants are distinct business entities (rather than different departments within a single entity), then we must also ensure each tenant has access to only their own query profiles. Will there be another PR for this? Is there a roll-up Jira ticket for all the issues involved in true multi-tenant operation?

Also, the Drill UI assumes a single organization. If the UI is to be exposed to multi-tenant users (so they can monitor queries, see query profiles, etc.), then the UI must change. Tenants should not be able to see details, or change the state of, Drillbits. Session options should reflect tenant values. Probably other changes. Will there be a design or PR for this?

Further, each tenant must have guarantees on resources. That is, Tenant A should not be able to run a huge query that denies resources sold to Tenant B. This is a very hard problem. If you don't solve the hard problem, the options and plugins are somewhat moot. The option and plugin features are handy, but do not, by themselves, give you multi-tenant support in Drill.

System tables could "leak" information between tenants. Should system tables be disabled? What changes would be needed to ensure that no per-tenant information "leaks" to another tenant?

Jul 23 '21 21:07 paul-rogers

@vdiravka, I had an opportunity to get a bit more background info for this project. I am going out of my way to try to facilitate this PR; normally we'd require that the PR author provide this information so that us reviews can simply review the code, and not have to reverse engineer requirements and design.

Sounds like the requirements are for a very specific light-weight multi-tenant model: one that allows tenants to set options, create storage plugin configs, and run queries, but not access any other part of Drill. Tenants are to be trusted to not make mistakes. Specifically:

A tenant has a set of "system" options (maybe call them tenant options) available when that tenant creates a Drill session.
A tenant can define a set of storage plugin configs which are visible to only that tenant. Perhaps call these tenant plugin configs.
A tenant can run queries that use the tenant options and tenant plugin configs.

This use case is limited compared to the normal multi-tenant requirements. The following appear to be restrictions for this project:

A tenant does not have access to the Drill Web Console or the Drill REST API and thus does not have access to query profiles.
A tenant does not have access to Zookeeper or the Drill native API. Queries sent by the tenant must go through an intermediate software layer provided by the service provider.
A tenant does not have access to Drill logs to diagnose failed queries.

The above restrictions say that the feature is not useful for open source Drill users who use the Drill-provided UI and APIs. This makes the feature of very limited appeal to the Drill community. So, one of our challenges is to design the feature in a way that users of the "out-of-the-box" Drill can benefit.

Additional restrictions for this one use case:

A tenant cannot start, stop or restart Drillbits, nor can they change startup properties.
A tenant cannot upload a UDF nor can a tenant provide custom connectors (storage plugin classes). (Note that DRILL-7916 is working at cross-purposes to this PR.)
Tenants are trusted to not change system-wide performance-related options (queueing, resource allocation, etc.) The resulting behavior, if those options are changed, is undefined and must be dealt with by the service provider if they occur.
No provision for the Drill admin to view or modify tenant options or plugin configs. If such behavior is desired, a service provider must write tools that work with Drill's persistent storage.
Tenants are trusted to not consume excess resources, so no resource isolation between tenants. Tenant A might try to sort a trillion rows, which might deny resources to other tenants.
Tenants cannot (?) create views or a metadata store.
Parquet metadata caching is either unsupported (?) or must be written to the tenant's S3 bucket; Drill provides no storage for the metadata.

The above limit the solution, but leave the door open to eventually providing more general multi-tenant support.

A final question is the relation between tenant and user. This PR assumes that they are identical: that "fred" is either a normal Drill user in "normal mode", or a tenant in "tenant mode." That is, each tenant has a single Drill user (which works in this use case because of the intermediate software layer.) This explains why this PR is labeled as "instances for different users", the the discussion has revealed the goal to be "instances for different tenants."

Since the "tenant = user" model applies to only this one use case, it again is not a general enough feature to add to the Apache Drill code. Instead, Apache Drill must provide a generally useful solution.

Prior notes have explained how a "per-user" model should work (it requires sharing between users). Specifically a, "config per user" solution must recognize that users work on a team, allow sharing, and permit admin abilities. Similarly, an "options per user" solution must persist only a subset of options (those which are neither system nor per-query options), and must solve the synchronization issue.

Notes have also explained that the customary definition of "tenant" is an organization with multiple users. A true multi-tenant solution must allow multiple users per tenant, and provide a path toward full tenant isolation later.

The challenge here is to find a design that balances the very specific, ad-hoc, unusual needs of this one use case, with something that can evolve to become of general use to the Drill community.

Jul 26 '21 21:07 paul-rogers

@vdiravka, here I will suggest the broad outline of a design which meets the very limited goals of this PR while also providing a path forward for a generally useful set of Apache Drill features.

We introduce one new concept, the tenant. Key concepts:

A tenant is a name space for options and plugin configs. The tenant level is optional; by default, no tenants exist.
Each user has an optional associated tenant ID. A user with no tenant ID works at the system level. (Example: the Drill admin, or a user who sets up demo configs shared by all users.) A user with an associated tenant ID gives the tenant name space to use. There may be 0, 1 or many users associated with a tenant.
A persistent tenant options name space sits between the system options and the session options in the option hierarchy. Users "inherit" options from the system layer, unless overridden at the tenant layer, unless overridden at the session layer.
Queries and plugin config-related API calls resolve plugin configs within the tenant plugin config name space. On lookups, if the config is not found in the tenant name space, repeat the search in the global name space. (Allows system-wide configs for such things as demo data.)

Now we can decide how much of the above to implement in this PR, leaving the rest to future PRs. Given your goals, it would seem the minimum is:

Provide an API or other means to define tenants and to associate tenants with users.
Provide documentation for how to inspect or remove per-tenant name spaces (e.g. if the tenant cancels service or is moved to another cluster.)
Tenant membership is enforced for the native and REST query APIs.
Tenant users can see the Drill Console, but features that are not "tenant-ready" should be disabled. That is, a tenant user might only be able to run a query, and see their options, but not see query profiles, etc.

SQL semantics:

The ALTER SYSTEM command affects the tenant option store given by the user's tenant ID, if any. System (e.g. server-wide) options are set using the same command issued by a user with no tenant ID.
System tables are not available to users associated with a tenant ID. (Later, this can be modified to filter system table contents to include only information for that tenant.)

An possible code design might be:

Add a tenant registry which can be as simple as the known, valid tenant IDs. This "registry" can simply be an API, with a default implementation that does nothing.
Add a field to the user to hold the tenant ID. When resolving a user name, obtain the tenant ID along with the admin/normal-user access level.
Unfortunately, Drill does not have a User object, we assume the only information for a user is the user name. Define a new User object which holds the user ID and tenant ID. (Ideally, it would also hold the security level, etc.)
We know that the system must determine if the user is an admin. Extend that mechanism to also provide the tenant ID. Modify that code to provide a new method which returns the User object. A storage plugin manager client can either use an existing User object, or obtain a new one by calling to this new User factory method.
Introduce the tenant option manager which acts like the system option manager, and persists to a location that includes the tenant ID in the name. Persistence is to a location which includes the tenant ID in the name.
When a session starts, if the user has a tenant ID, add the tenant option manager to the session option manager stack. This one change should ensure that the option solution "just works" everywhere else.
Modify the storage plugin manager to require a User object for every config CRUD method. The User says which tenant store to read from or write to.
The storage plugin registry currently uses keys to locate configs. This key is extended with the tenant ID. Modify config resolution to use the tenant ID. If the config is not found in the tenant store, or the tenant ID is not set for the user, repeat the search with the default "system" config store. When modifying configs, write to the persistent store given by the tenant ID.
Modify all clients of the storage plugin manager to pass along the User object, creating one if needed.
System tables do not have an associated config. For system tables (only), enforce the rule above: if a plugin lookup resolves to the system plugin, and the user has an associated tenant ID, act as if the plugin is not found.

This proposed design accomplishes two goals. First, it is limited in scope to only the limited needs of the PR author. Second, it is constructed in a way that others can extend the solution toward a full multi-tenant solution in the future without breaking backward compatibility.

Note that this design is a multi-tenant solution, not a per-user solution. I would suggest changing the title of the PR to match the intent to (partially) solve the multi-tenant problem.

The good news is that this is more-or-less in line with some of the code in this PR.

@vdiravka, please review these suggestions, and the need to offer a solution that is of general interest to the Drill community. Feel free to use the above, or to modify them, to propose a revised set of requirements and designs. Once we're on the same page at the design level, attention can return to review of the code itself.

Jul 26 '21 22:07 paul-rogers

Converting to draft while there is still discussion about how we approach the design.

Feb 23 '22 12:02 jnturton

drill
drill copied to clipboard