Tracking issue for Cache-service
The cache service will be a cluster of data-block cache servers sitting between databend-query and the underlying storage service, providing fast data-block retrieval.
- [x] https://github.com/datafuselabs/databend/pull/6799
- [x] Architecture design: https://github.com/datafuselabs/opencache
- [ ] Freeze architecture design when the RFC task is closed.
Server-side tasks
The following are still tracking-issue-level tasks:
- [ ] Impl cache replacement policy: modified LIRS
- [ ] Impl `Chunkstore`. See arch.
- [ ] Impl `Objectstore`. See arch.
- [ ] Impl `Accessstore`. See arch.
- [ ] Impl `Manifeststore`. See arch.
- [ ] Impl service protocol: http2 PUT+GET (see the sketch after this list).
- [ ] Impl configuration and server initialization.
- [ ] Impl server restart procedure.
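For the protocol task above, here is a rough client-side sketch of what an http2 PUT+GET exchange could look like. It assumes a `reqwest`-style client and a made-up `/v1/block/{key}` path; neither is part of the design yet.

```rust
// Sketch only: the path and status-code handling are assumptions for
// illustration, not the OpenCache wire format.
use reqwest::{Client, StatusCode};

async fn put_block(client: &Client, endpoint: &str, key: &str, data: Vec<u8>) -> reqwest::Result<()> {
    client
        .put(format!("{endpoint}/v1/block/{key}"))
        .body(data)
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}

/// Returns None on a cache miss so the caller can fall back to the underlying storage.
async fn get_block(client: &Client, endpoint: &str, key: &str) -> reqwest::Result<Option<Vec<u8>>> {
    let resp = client.get(format!("{endpoint}/v1/block/{key}")).send().await?;
    if resp.status() == StatusCode::NOT_FOUND {
        return Ok(None);
    }
    Ok(Some(resp.error_for_status()?.bytes().await?.to_vec()))
}
```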
Databend side tasks could be found at: https://github.com/datafuselabs/databend/issues/6803
Besides accelerating block retrieval from s3, the cache service may also be a good place to store the spilled data from a join statement's shuffle, which can reduce the OOMs caused by skewed joins.
> Besides accelerating block retrieval from s3, the cache service may also be a good place to store the spilled data from a join statement's shuffle, which can reduce the OOMs caused by skewed joins.
AFAIK, the latest decision about the temp data generated by a join is to store it in a persistent store such as s3. Since the cache server cannot provide a durability guarantee, the temp data has a chance of being evicted while a query execution still needs it.
To support such a requirement, the cache server has to let an application explicitly disable auto eviction for some data. And the application has to purge these caches explicitly when the join job is done.
Maybe objects having a durability configuration can be a future feature?
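To make that contract concrete, here is a hypothetical client flow. The `put_pinned` and `purge_prefix` calls are invented for illustration; no such API exists in OpenCache today.

```rust
use anyhow::Result;

/// Hypothetical interface: nothing like this exists in OpenCache yet.
trait PinnedCache {
    /// Store `data` and exclude it from auto eviction until purged.
    fn put_pinned(&self, key: &str, data: &[u8]) -> Result<()>;
    /// Drop every pinned entry under `prefix`, returning its space to the eviction pool.
    fn purge_prefix(&self, prefix: &str) -> Result<()>;
}

/// Spill join partitions as pinned entries, then purge them once the join is done.
fn run_join_with_spill(cache: &dyn PinnedCache, query_id: &str, parts: &[Vec<u8>]) -> Result<()> {
    for (i, part) in parts.iter().enumerate() {
        cache.put_pinned(&format!("{query_id}/spill/{i}"), part)?;
    }
    // ... run the join, reading spilled partitions back as needed ...
    // The application itself must purge; otherwise the pinned space is never reclaimed.
    cache.purge_prefix(&format!("{query_id}/spill/"))
}
```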
I think `Client side tasks` should be split into two parts:
- OpenDAL's `opencache` service implements `Accessor` based on the `opencache` API.
- Databend's `Temporary Operator` defines the cluster metadata format.
databend-meta supports opencache cluster metadata.
Should databend have special knowledge of caching services? Maybe they should be handled inside the opendal client?
> Besides accelerating block retrieval from s3, the cache service may also be a good place to store the spilled data from a join statement's shuffle, which can reduce the OOMs caused by skewed joins.
Based on the current design, this feature will be resolved by Temporary Services. The query can store temporary data:

OpenCache can be implemented as a temporary service too ~
> I think `Client side tasks` should be split into two parts:
>
> - OpenDAL's `opencache` service implements `Accessor` based on the `opencache` API.
> - Databend's `Temporary Operator` defines the cluster metadata format.
Do you mean that `Accessor` only connects to one `opencache` server and lets `Operator` do the load balancing job, such as with a consistent hash?
It would be nice if I understood you correctly.
> databend-meta supports opencache cluster metadata.
>
> Should databend have special knowledge of caching services? Maybe they should be handled inside the opendal client?

I agree :D
> Do you mean that `Accessor` only connects to one `opencache` server and lets `Operator` do the load balancing job, such as with a consistent hash?
OpenDAL's opencache client can handle the load balancing job and accept some options:
```toml
[cache]
type = "opencache"

[cache.opencache]
endpoints = ["192.168.0.2", "192.168.0.3"]
hash_methods = "ConsistentHash"
```
OpenDAL's `Operator` is just an `Arc` wrapper for `Accessor`; we should wrap the logic inside.
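To make the intent concrete, here is a minimal sketch of such client-side endpoint selection with a plain hash ring and virtual nodes. It illustrates the general technique, not the actual OpenDAL implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Minimal consistent-hash ring for picking an opencache endpoint per key.
/// A real client would add replication, health checks, and failover.
struct Ring {
    points: BTreeMap<u64, String>, // hash point on the ring -> endpoint
}

impl Ring {
    fn new(endpoints: &[&str], virtual_nodes: usize) -> Self {
        let mut points = BTreeMap::new();
        for ep in endpoints {
            // Virtual nodes spread each endpoint around the ring for balance.
            for i in 0..virtual_nodes {
                points.insert(hash(&format!("{ep}#{i}")), ep.to_string());
            }
        }
        Ring { points }
    }

    /// The first endpoint clockwise from the key's hash point serves the key.
    fn endpoint(&self, key: &str) -> &str {
        let h = hash(key);
        self.points
            .range(h..)
            .next()
            .or_else(|| self.points.iter().next()) // wrap around the ring
            .map(|(_, ep)| ep.as_str())
            .expect("ring has no endpoints")
    }
}

fn hash(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Endpoints taken from the example config above.
    let ring = Ring::new(&["192.168.0.2", "192.168.0.3"], 128);
    println!("bucket/block/0001 -> {}", ring.endpoint("bucket/block/0001"));
}
```

With consistent hashing, adding or removing an endpoint only remaps the keys that hashed to that endpoint's ring segments, instead of reshuffling every key.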
Adding a namespace-like configuration may make operations on a multi-tenant deployment easier:
- cache keys can be isolated in different namespaces
- we can collect some info or restrict usage at the namespace level, like qps, memory limit, rate limit, etc.
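As an illustration of the qps restriction meant here, a minimal token-bucket limiter that a namespace could carry might look like the sketch below; none of this exists in OpenCache.

```rust
use std::time::Instant;

/// Minimal token-bucket limiter, sketching per-namespace qps accounting.
struct QpsLimiter {
    capacity: f64, // burst size, in requests
    tokens: f64,   // currently available tokens
    rate: f64,     // refill rate per second, i.e. the qps limit
    last: Instant, // time of the last refill
}

impl QpsLimiter {
    fn new(qps: f64) -> Self {
        QpsLimiter { capacity: qps, tokens: qps, rate: qps, last: Instant::now() }
    }

    /// Admit one request if the namespace is still under its qps limit.
    fn allow(&mut self) -> bool {
        let now = Instant::now();
        // Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = (self.tokens + self.rate * now.duration_since(self.last).as_secs_f64())
            .min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```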
This tracking issue combines two things:
- Cache support in Databend
- Distributed Cache services: OpenCache
I split them into https://github.com/datafuselabs/databend/issues/6803
> Adding a namespace-like configuration may make operations on a multi-tenant deployment easier:
I created a feature request for opencache: https://github.com/datafuselabs/opencache/issues/3
> Adding a namespace-like configuration may make operations on a multi-tenant deployment easier:
>
> - cache keys can be isolated in different namespaces
> - we can collect some info or restrict usage at the namespace level, like qps, memory limit, rate limit, etc.
I'd prefer not to introduce namespaces on the server side:
- Splitting the cache server disk space into several sub-spaces reduces resource efficiency: the reserved space in one sub-space cannot be used by another, heavily loaded sub-space.
- Every time a node is added or removed, or a tenant (sub-space) is added or removed, the quota of every sub-space needs to be adjusted.
Maybe it's better done on the client side.
> Splitting the cache server disk space into several sub-spaces reduces resource efficiency.

Oh, we do not really need to split the storage physically; it can be just a built-in key prefix in the internal storage, like the bucket in boltdb: https://github.com/boltdb/bolt#using-buckets
Operational functionality like quotas does not need to be built in the 1st phase, but a design with namespaces built in would make life easier when multi-tenant workloads get into place.
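A minimal sketch of that bucket-style idea, assuming a namespace is only an internal key prefix over one shared physical store (illustrative only; OpenCache has no namespaces yet):

```rust
use std::collections::HashMap;

/// One physical store shared by all tenants; a namespace is only a key
/// prefix, like a boltdb bucket.
struct Store {
    data: HashMap<String, Vec<u8>>, // stand-in for the real cache storage
}

impl Store {
    fn put(&mut self, ns: &str, key: &str, value: Vec<u8>) {
        // "tenant_a/block/1" and "tenant_b/block/1" never collide,
        // but all tenants still share the same space and eviction pressure.
        self.data.insert(format!("{ns}/{key}"), value);
    }

    fn get(&self, ns: &str, key: &str) -> Option<&[u8]> {
        self.data.get(&format!("{ns}/{key}")).map(|v| v.as_slice())
    }
}
```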
> Splitting the cache server disk space into several sub-spaces reduces resource efficiency.
>
> Oh, we do not really need to split the storage physically; it can be just a built-in key prefix in the internal storage, like the bucket in boltdb: https://github.com/boltdb/bolt#using-buckets

It's different from a db such as bolt: the access load pattern affects the internal data layout. A tenant who reads a lot from the cache server will aggressively evict every piece of other users' data.
It's not difficult to have a namespace inside the cache server, but it cannot be used for throttling without physical isolation.
Is our cache service planned to have a proxy layer like twemproxy?
If it is, I guess we can do something in the proxy layer to take some control; it'd be easy to rate-limit or restrict memory size there.
If not, it's hard to maintain the namespace concept on the server side IMHO; there'd be no central "server side", just many peers in a distributed manner q.q
IMHO a proxy layer does have its complexities (especially in HA) and need not be top-priority work.
Another way we may consider is a sidecar proxy to coordinate multi-tenant usage. It works more like a client: it encapsulates the complex parts like distributed coordination & billing logic, offering just a simple GET/SET rpc interface.
Maybe like this: https://github.com/facebook/mcrouter . I remember it's deployed on each host machine to forward mc requests to a big mc pool, and it has some additional features like failover, local cache, stats, etc.
@zhihanz @hantmac what's your opinion about this? q.q
> Is our cache service planned to have a proxy layer like twemproxy?

Hmm... no such plan yet. For now, the proxy job is done by opendal.
I agree that namespaces, proxy, quota, and audit are all useful features. But considering our cache service only has markdown docs now, I prefer to start as simple as possible.
Let's implement the cache service and databend's cache mechanism first. After that, we can have a more profound and solid code base for improvement.
What do you think? @flaneur2020 @drmingdrmer
Not used so far. Feel free to re-open it when needed.