
Key Spaces RFC

Open gregwebs opened this issue 5 years ago • 21 comments

gregwebs avatar Jan 31 '20 01:01 gregwebs

Key spaces sound like a solution to the problem tikv/tikv#3922. And it may also be useful if we want to implement multi-tenancy in the cloud.

BusyJay avatar Feb 05 '20 12:02 BusyJay

Key spaces sound like a solution to the problem tikv/tikv#3922.

A key space would be designated as transactional or non-transactional when it is created?

gregwebs avatar Feb 05 '20 15:02 gregwebs

Key spaces sound like a solution to the problem tikv/tikv#3922. And it may also be useful if we want to implement multi-tenancy in the cloud.

I don't think it is a good solution to issue tikv/tikv#3922. Supporting multi-tenancy in the cloud in TiKV is a big topic, and there are many things to consider, but this RFC is a good beginning.

zhangjinpeng87 avatar Feb 06 '20 00:02 zhangjinpeng87

Key spaces sound like a solution to the problem tikv/tikv#3922. And it may also be useful if we want to implement multi-tenancy in the cloud.

I don't think it is a good solution to issue tikv/tikv#3922. Supporting multi-tenancy in the cloud in TiKV is a big topic, and there are many things to consider, but this RFC is a good beginning.

I'm very glad to discuss key spaces under the assumption that we (TiKV/TiDB) are going to support multi-tenancy in the cloud. I know we can use key spaces to do many other things, but multi-tenancy in the cloud is the most attractive one for me (I may be wrong). If this assumption is true, the most important thing I think is the resource isolation, especially for users that don't have much data but whose QPS is not negligible. We should limit these users' CPU/IO/network usage individually, and the memory cache should also have a quota.

zhangjinpeng87 avatar Feb 06 '20 01:02 zhangjinpeng87

the most important thing I think is the resource isolation

Yes. Key spaces registered for applications provide a crucial piece of meta information that can be used to help with resource isolation. Identifying the application is required for this. We could solve it in other ways, but here again I think Key Spaces provide the best possible building block. But I think this topic requires an entirely separate RFC; I don't think we will be able to discuss it productively here.

gregwebs avatar Feb 06 '20 01:02 gregwebs

the most important thing I think is the resource isolation

Yes. Key spaces registered for applications provide a crucial piece of meta information that can be used to help with resource isolation. Identifying the application is required for this. We could solve it in other ways, but here again I think Key Spaces provide the best possible building block. But I think this topic requires an entirely separate RFC; I don't think we will be able to discuss it productively here.

Got it. @sunxiaoguang If TiKV/PD supports key spaces, I would say it almost supports the table concept in TiKV. I think this is helpful for your TableStore based on TiKV; would you mind taking a look at this RFC? Thank you in advance.

zhangjinpeng87 avatar Feb 06 '20 02:02 zhangjinpeng87

I added an auth proposal. This builds on top of key spaces and hopefully helps to motivate them a bit more, particularly in demonstrating why they can't be implemented client-side.

@zhangjinpeng1987 and others asked about this.

gregwebs avatar Feb 06 '20 06:02 gregwebs

I just came across this and found that someone has finally proposed keyspaces. Please map each table / index of TiDB into a different keyspace, and allow different keyspaces to use different settings (like the number of replicas, storage options, etc.); it would make a lot of things easier.

huachaohuang avatar Mar 01 '20 16:03 huachaohuang

@huachaohuang we already have some ability to have different settings for different tables. So I am not sure if keyspaces are needed for that. Currently I am suggesting that TiDB still manage its table spaces in a single key space (I would call these client-side namespaces) and that key spaces should define the boundaries between applications. But maybe if namespaces were additionally registered in TiKV (still as a separate concept from key spaces) it could help formalize the table concept in TiKV.

gregwebs avatar Mar 02 '20 18:03 gregwebs

@gregwebs Mapping a table into a keyspace has some benefits. First of all, we can remove the t{table_id}_ prefix for record keys and t{table_id}_i{index_id}_ prefix for index keys. Once we remove these prefixes from keys, we can move tables between clusters. For example, you can merge two small clusters into one without worrying about table id conflicts, because you can simply allocate another keyspace. Otherwise, you need to rewrite all the data just because the table id is changed.
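The table-id conflict described above can be sketched as follows. This is a minimal illustration, not TiDB's actual byte-level key format: the `_r{row_id}` suffix, the helper names, and the `ks1/` and `ks2/` prefixes are all assumptions made for the example.

```python
# All names and encodings here are hypothetical and simplified for
# illustration; they are not TiDB's actual byte-level key format.

def record_key_with_table_prefix(table_id: int, row_id: int) -> bytes:
    """Current style: the table id is part of every record key."""
    return f"t{table_id}_r{row_id}".encode()


def record_key_in_keyspace(row_id: int) -> bytes:
    """Keyspace-per-table style: the key itself carries no table id."""
    return f"r{row_id}".encode()


def physical_key(keyspace_prefix: bytes, logical_key: bytes) -> bytes:
    """Prepend whatever prefix the destination cluster allocated."""
    return keyspace_prefix + logical_key


# Two clusters that both happen to use table id 7 produce identical keys,
# so merging them forces one table's data to be rewritten under a new id.
cluster_a_key = record_key_with_table_prefix(7, 1)
cluster_b_key = record_key_with_table_prefix(7, 1)
keys_collide = cluster_a_key == cluster_b_key

# With one keyspace per table, the logical keys stay untouched and the
# merged cluster only has to hand out a fresh keyspace for the new table.
merged_a = physical_key(b"ks1/", record_key_in_keyspace(1))
merged_b = physical_key(b"ks2/", record_key_in_keyspace(1))
```

In this sketch only the keyspace-to-prefix mapping changes on a merge; the logical keys written by the application are the same in both clusters.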

huachaohuang avatar Mar 03 '20 09:03 huachaohuang

I see, you want prefix abstraction. The keyspace abstraction in this proposal guarantees no conflicts when importing data by altering the prefix during import, but cannot guarantee no conflicts when live merging. With key spaces in this proposal there is still a concrete prefix that is chosen at insertion time and used in TiKV storage, which is still a single KV monolith. This prefix is still stored in RocksDB.

What you are looking for seems to be a break-up of the TiKV monolith. I can imagine this: data regions have an identity in addition to a prefix, and the identity is mapped to a key space and can be altered. However, this is a fundamental change to how TiKV operates that is beyond the scope of this proposal.

To achieve live-merging with this proposal the best approach is probably to more explicitly manage a global key space. An organization could maintain a global key space registry such that different clusters will have key spaces with non-conflicting prefixes (seed each cluster with a different prefix).
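One way to picture such a global registry is the sketch below. It assumes a made-up scheme in which each cluster is seeded with a fixed-width 2-byte value and each key space gets a 2-byte counter appended; the class, method names, and widths are hypothetical and are not part of PD, TiKV, or the RFC.

```python
class GlobalKeyspaceRegistry:
    """Hypothetical organization-wide registry; not part of PD or TiKV."""

    def __init__(self):
        self._seeds = {}      # cluster name -> 2-byte seed
        self._counters = {}   # cluster name -> next key space number
        self._prefixes = {}   # (cluster name, key space name) -> prefix

    def register_cluster(self, cluster):
        """Seed a cluster with a value no other cluster has."""
        if cluster not in self._seeds:
            # to_bytes raises OverflowError past 65535 clusters.
            self._seeds[cluster] = len(self._seeds).to_bytes(2, "big")
            self._counters[cluster] = 0
        return self._seeds[cluster]

    def register_keyspace(self, cluster, name):
        """Allocate seed + counter; fixed widths keep prefixes from nesting."""
        seed = self.register_cluster(cluster)
        key = (cluster, name)
        if key not in self._prefixes:
            number = self._counters[cluster]
            self._counters[cluster] = number + 1
            self._prefixes[key] = seed + number.to_bytes(2, "big")
        return self._prefixes[key]


registry = GlobalKeyspaceRegistry()
orders_east = registry.register_keyspace("east", "orders")
orders_west = registry.register_keyspace("west", "orders")
```

Because every prefix has the same length and starts with a cluster-unique seed, no prefix from one cluster can equal or contain a prefix from another, which is the property a live merge would rely on.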

gregwebs avatar Mar 03 '20 18:03 gregwebs

@gregwebs

I have another keyspace design for multi-tenant at https://github.com/ngaut/unistore/pull/393

The key difference is the cluster id is used as tenant ID, and the prefix for tenant id is fixed.

coocood avatar Sep 02 '20 06:09 coocood

I think the cluster id could potentially be useful to associate with the default namespace. However, if we attempt to use the cluster id for this, we will change the meaning of the cluster id and greatly complicate it, so we are better off not attempting to reuse the concept here.

But overall the proposals seem to be the same.

gregwebs avatar Sep 02 '20 14:09 gregwebs

I am considering a design where TiKV does not have any knowledge of keyspaces. Keyspace is an organizational concept that exists only in PD where clients can register their keyspace prefixes. The TiKV client is responsible for ensuring the key prefix is used.

When auth is implemented then TiKV will enforce access restrictions that are present in the signed token. These access restrictions would include restricting the client to the usage of a particular keyspace prefix.
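A rough sketch of that split of responsibilities: the client prepends its prefix, and the server only checks the prefix carried in a signed token. The token format, the HMAC signing, the shared `SECRET`, and every name below are assumptions for illustration; the thread does not specify how tokens are built or verified.

```python
import hashlib
import hmac

SECRET = b"demo-signing-key"  # hypothetical; stands in for real key management


def issue_token(prefix: bytes) -> tuple:
    """Sign the keyspace prefix a client is allowed to use."""
    signature = hmac.new(SECRET, prefix, hashlib.sha256).digest()
    return prefix, signature


def is_allowed(token: tuple, key: bytes) -> bool:
    """Server-side check: the token is genuine and the key is inside its prefix."""
    prefix, signature = token
    expected = hmac.new(SECRET, prefix, hashlib.sha256).digest()
    return hmac.compare_digest(signature, expected) and key.startswith(prefix)


class PrefixedClient:
    """Client wrapper that prepends its registered prefix to every key."""

    def __init__(self, token: tuple, store: dict):
        self._token = token
        self._prefix = token[0]
        self._store = store  # a plain dict standing in for TiKV

    def _full_key(self, key: bytes) -> bytes:
        full_key = self._prefix + key
        if not is_allowed(self._token, full_key):
            raise PermissionError("key is outside the token's keyspace prefix")
        return full_key

    def put(self, key: bytes, value: bytes) -> None:
        self._store[self._full_key(key)] = value

    def get(self, key: bytes):
        return self._store.get(self._full_key(key))


store = {}
client = PrefixedClient(issue_token(b"app1/"), store)
client.put(b"user:1", b"alice")
```

The storage side never needs to know what a keyspace is in this sketch; it only compares the requested key against the prefix in a token it can verify.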

gregwebs avatar Sep 02 '20 16:09 gregwebs

This RFC has now been changed so that TiKV has no knowledge of key spaces. I created a new RFC for TiKV to assist with sending prefixes that can be used for key spaces: https://github.com/tikv/rfcs/pull/56

gregwebs avatar Sep 03 '20 23:09 gregwebs

The RFC has been updated with changes to the key prefix specification and for clarity. Clients can now specify the prefix they want to register: this helps with backwards compatibility.
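As an illustration of client-chosen prefixes, the sketch below registers a requested prefix only if it does not overlap one that is already taken. The overlap rule, the class, and the exception are assumptions for the example; the RFC's actual conflict rules are not spelled out in this thread.

```python
class PrefixConflict(Exception):
    """Raised when a requested prefix overlaps an already registered one."""


class KeyspaceRegistry:
    """Hypothetical stand-in for the registry that PD would hold."""

    def __init__(self):
        self._prefixes = {}  # key space name -> registered prefix

    def register(self, name: str, prefix: bytes) -> bytes:
        if not prefix:
            raise ValueError("an empty prefix would overlap every key space")
        existing = self._prefixes.get(name)
        if existing is not None:
            if existing == prefix:
                return prefix  # repeating the same registration is harmless
            raise PrefixConflict(f"{name!r} is already registered")
        for other_name, other in self._prefixes.items():
            # Two prefixes overlap when either one starts with the other.
            if prefix.startswith(other) or other.startswith(prefix):
                raise PrefixConflict(f"{prefix!r} overlaps {other_name!r}")
        self._prefixes[name] = prefix
        return prefix


registry = KeyspaceRegistry()
registry.register("legacy-app", b"legacy_")  # keeps the prefix it already uses
registry.register("new-app", b"new_")
```

Letting an existing application register the prefix it already writes under means its stored keys do not have to be rewritten, which is one reading of the backwards-compatibility benefit mentioned above.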

gregwebs avatar Sep 24 '20 18:09 gregwebs

I added more backwards compatibility details.

gregwebs avatar Nov 26 '20 16:11 gregwebs

@gregwebs Any idea on how this could work with merge operators / column families? https://docs.rs/rocksdb/0.15.0/rocksdb/merge_operator/index.html

subu-cliqz avatar Dec 06 '20 17:12 subu-cliqz

@gregwebs Any idea on how this could work with merge operators / column families? https://docs.rs/rocksdb/0.15.0/rocksdb/merge_operator/index.html

Do you have any specific concerns?

gregwebs avatar Dec 07 '20 01:12 gregwebs

@gregwebs Any idea on how this could work with merge operators / column families? https://docs.rs/rocksdb/0.15.0/rocksdb/merge_operator/index.html

Do you have any specific concerns?

Yes. From a brief reading of the RFC, it looks like the idea is to keep all of the keyspace maintenance logic in PD.

My concern is that the merge operator would have to be defined for each column family. At first glance, this looks hard to do without making the application / column family relationship more explicit in TiKV.

I realize that the merge functionality is not supported in TiKV as of now, but I plan on working on that.

Related question: The new column families could be configured individually for compression, block size, etc. Would this be made possible through PD as well?

Thanks for answering my questions :)

subu-cliqz avatar Dec 07 '20 03:12 subu-cliqz

@subu-cliqz I am glad you are helping TiKV. This proposal does not add new column families so I don't think it would change your work. One piece of feedback was to equate a column family to a keyspace but I do not want to have that in the RFC.

gregwebs avatar Dec 07 '20 04:12 gregwebs