chproxy
chproxy copied to clipboard
Amount of keys in redis cache is constantly growing
chproxy v1.16.3
There are redis cache configured as following:
- name: cache-name
mode: redis
redis:
addresses:
- hostname:6379
expire: 300s
There are following keys appearing:
799) "26243a2fa2435c33269f75da4e359f3e-transaction"
800) "62fcbcea7329cd2615aa4b5a792b0420-transaction"
801) "063cdf4e211ee0441f15323503515d57-transaction"
802) "802fa878e67d0516bcd41ddf3521e826-transaction"
803) "be3ca83940942e976e55f9c04f8ea44c-transaction"
804) "c8c7144ba9a34b53a6c6069facc3b3d5-transaction"
805) "1b604171cf60b2320b477e440e39cfcc-transaction"
806) "4406a4e641f31e7b91b3e24e59d9c6ef-transaction"
807) "ab841321dadb6921814bad6113b9b0bb-transaction"
808) "9205c903c51f8fcb1a9e9e9fa0a08959-transaction"
...
redis> TTL 9205c903c51f8fcb1a9e9e9fa0a08959-transaction
(integer) -1
redis> get 9205c903c51f8fcb1a9e9e9fa0a08959-transaction
"2"
redis> DEBUG OBJECT 9205c903c51f8fcb1a9e9e9fa0a08959-transaction
Value at:0x7fb41d22cd20 refcount:1 encoding:int serializedlength:2 lru:15865814 lru_seconds_idle:6
Somehow those keys has no TTL set, and it's amount is constantly growing, even if the actual cached result has been removed by the TTL

Despite the small size of those keys, with a large amount of transactions those keys may consume all the memory of the redis instance, and then pushed out by the LRU, when memory will be required. In this case, consumed memory will stay ~constant near the configured maximum for the instance.
Is it expected behavior?
It looks suspicious. It might come from the given change done 1 month ago that was done in order to be backward compatible with older version of redis servers https://github.com/ContentSquare/chproxy/pull/184 Could you do the same test on v1.16.0 (that doesn't contain this change) and give more details about the redis you're using (clustered, which version)?
@mga-chka
Could you do the same test on v1.16.0 (that doesn't contain this change)
Tested, results seem to be the same
# ./chproxy --config chproxy-conf.yaml
INFO: 2022/08/10 00:36:52 main.go:46: chproxy ver. 1.16.0, rev. c4aece8f482043ca7662087db864990711f1d9b0, built at 2022-06-24T11:46:12Z
INFO: 2022/08/10 00:36:52 main.go:47: Loading config: chproxy-conf.yaml
INFO: 2022/08/10 00:36:52 main.go:280: Loaded config:
server:
http:
listen_addr: :88
...
clusters:
- name: local
scheme: http
nodes:
- 127.0.0.1:8123
users:
- name: default
password: XXX
...
users:
- name: test
password: XXX
to_cluster: local
to_user: default
...
cache: test-redis
log_debug: true
hack_me_please: true
caches:
- mode: redis
name: test-redis
expire: 5m
redis:
addresses:
- redis-cache-host:6379
INFO: 2022/08/10 00:36:52 main.go:55: Loading config "chproxy-conf.yaml": successful
INFO: 2022/08/10 00:36:52 main.go:160: Serving http on ":88"
After making some test queries picture is following:
redis-cache-host:6379> keys *
1) "2aa5102467ea68cb127356062a039731-transaction"
2) "039296291c7039831ebb50bb51f2789e-transaction"
3) "ceb4cf07c3edc2e325200c494f971a77-transaction"
4) "d0b49760f8ea57fda09c10850b379bd3-transaction"
5) "ece9ae938180cd203dc92ff2d0035a60-transaction"
redis-cache-host:6379> TTL ece9ae938180cd203dc92ff2d0035a60-transaction
(integer) -1
redis-cache-host:6379> DEBUG OBJECT ece9ae938180cd203dc92ff2d0035a60-transaction
Value at:0x7f7727c03158 refcount:1 encoding:int serializedlength:2 lru:15949883 lru_seconds_idle:1039
redis-cache-host:6379> INFO
# Server
redis_version:6.0.16
redis_git_sha1:63c47a09
redis_git_dirty:1
redis_build_id:f20f71e10920f6e0
redis_mode:standalone
os:Linux 4.19.0-8-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:7.5.0
...
# Cluster
cluster_enabled:0
There is something weird. I just checked a few records fromour redis in production and the TTL from CHProxy is working well
[adresse of redis]:[port YYY]> TTL 127efbf543a03465d854a60b4fb557ed-transaction (integer) 15 [adresse of redis]:[port YYY]> TTL 127efbf543a03465d854a60b4fb557ed-transaction (integer) 9 [adresse of redis]:[port YYY]> TTL 123c1c91fe87cab63b22d083893106ae-transaction (integer) 12 [adresse of redis]:[port YYY]> TTL 1244629318897d9b7521ae88e79839e9-transaction (integer) 17
The main difference is that we're using redis version:5.0.6 (and chproxy 1.16.2). We will try to do some tests on redis 6 when we have time to see if the bug is linked to the version of redis. But if you have the possibility, could you try with a redis 5 ?
After looking at the code, the issue might come from your configuration of chproxy.
chproxy stores 2 kinds of objects in Redis:
- the transactions that are used to handle concurrent transactions: it prevents chproxy to run twice the same SQL queries on CH when they are not cached yet and used at the same time. They are very small objects whose keys are suffixed by "-transaction". It's the objects whose TTL doesn't work in your case. The TTL is automatically set to the value 2 * max_execution_time (2 minutes per default) or grace_time (which is supposed to be deprecated and you shouldn't use it). If your graceTime is negative or if your maxExecutionTime is set at 0 you could have some TTL issues.
- the cached queries (potentially large objects) whose TTL is handled by the "expire" conf. You don't seem to have issues on this one given the result of your "keys *" in redis
So did you modified the max_execution_time or grace_time in your configuration file? If yes, what are they?
@mga-chka
it prevents chproxy to run twice the same SQL queries on CH when they are not cached yet and used at the same time.
Okay, you're taking about thundering herd, explained in the doc Logic here is clear.
So did you modified the max_execution_time or grace_time in your configuration file? If yes, what are they?
I will show you the exact config i used to reproduce this problem:
# cat chproxy-conf.yaml
---
log_debug: true
hack_me_please: true
caches:
- name: test-redis
mode: redis
redis:
addresses:
- redis-cache-host:6379
expire: 300s
users:
- name: testuser
password: password
to_cluster: local
to_user: default
cache: test-redis
clusters:
- name: local
scheme: http
nodes:
- 127.0.0.1:8123
users:
- name: default
heartbeat_interval: 5s
server:
http:
listen_addr: :88
...
Reported config when chproxy is starting, some defaults has been applied:
# ./chproxy --config chproxy-conf.yaml
INFO: 2022/08/10 04:07:20 main.go:46: chproxy ver. 1.16.0, rev. c4aece8f482043ca7662087db864990711f1d9b0, built at 2022-06-24T11:46:12Z
INFO: 2022/08/10 04:07:20 main.go:47: Loading config: chproxy-conf.yaml
INFO: 2022/08/10 04:07:20 main.go:280: Loaded config:
server:
http:
listen_addr: :88
forceautocerthandler: false
read_timeout: 1m
write_timeout: 90s
idle_timeout: 10m
clusters:
- name: local
scheme: http
nodes:
- 127.0.0.1:8123
users:
- name: default
password: XXX
heartbeat_interval: 5s
heartbeat:
interval: 5s
timeout: 3s
request: /?query=SELECT%201
response: |
1
users:
- name: testuser
password: XXX
to_cluster: local
to_user: default
max_execution_time: 30s
cache: test-redis
log_debug: true
hack_me_please: true
caches:
- mode: redis
name: test-redis
expire: 5m
redis:
addresses:
- redis-cache-host:6379
INFO: 2022/08/10 04:07:20 main.go:55: Loading config "chproxy-conf.yaml": successful
INFO: 2022/08/10 04:07:20 main.go:160: Serving http on ":88"
...
after connecting with clickhouse-cli and getting the promt i have following keys in the redis:
redis-cache-host:6379> keys *
1) "d0b49760f8ea57fda09c10850b379bd3-transaction"
2) "039296291c7039831ebb50bb51f2789e"
3) "d0b49760f8ea57fda09c10850b379bd3"
4) "039296291c7039831ebb50bb51f2789e-transaction"
redis-cache-host:6379> TTL d0b49760f8ea57fda09c10850b379bd3-transaction
(integer) -1
redis-cache-host:6379> TTL 039296291c7039831ebb50bb51f2789e-transaction
(integer) -1
redis-cache-host:6379>
as you can see, max_execution_time is set to default value 30s, but infite TTL is still set for redis object. grace_time is not used since it's deprecated.
Actually, i was going to make a separated issue about default value for the max_execution_time setting.
There are 2 ways to control the maximum amount of time for the specific user in chproxy:
-
by setting
max_execution_timein user config section of chproxy: https://github.com/ContentSquare/chproxy/tree/master/config#user_config By the way, doc is not correct since default value is not0now. -
In the params group, by the
max_execution_timetunable for the ClickHouse itself (not chproxy).
As you can see, there are 2 ways to configure the same parameter with the same variable (same name), and both of them may be applied at the same time! This is very confusing, query execution time limit should be configured only in one place. Before we were using v1.13.2, and unset max_execution_time value was just disabling the limit on the Chproxy side, allowing to set the limit on the ClickHouse side, and it was very usable because we was able to use param groups to set this limit.
With a recent versions unset max_execution_time means 30s, which is quite weird because ClickHouse it's OLAP DB, where queries may take hours to finish. And it's necesseary to set explicitly for every user now, which is very annoying.
And it's not possible simply disable the max_execution_time on the chproxy side:
- unset
max_execution_timevariable or0leads to default30s, which is LOW - negative value is not valid
- positive values require matching chproxy- and clickhouse-side timeouts which is annoying
For now i just set 86400s as max_execution_time on the chproxy side, which is guaranteed above all the clickhouse-side limits, to make sure chproxy-side limit will never come into play. Ugly hack :(
Thanks, we'll try to reproduce the issue on our side and fix it. Since the cached query works on your side (since they disappear from Redis cf the listing of keys in your first post where all the keys end will a "-transaction") , it can't be linked to a redis version issue or something related to your infrastructure.
Regarding the max_execution_time, you're not the first one complaining about this new default value, this is why the default value is now at 2 minutes in the last version of chproxy. We made this change because of the way the concurrent query handling worked but I think we will find another way so that the max_execution_time can be back to unlimited.
FYI, if you set 86400s as max_execution_time, the "-transactions" will stay 172800 sec/48 hours in redis (once we find the current issue with the TTL).
FYI I was able to reproduce the bug. There is an ongoing PR regarding the way we fetch the max_execution_time in the config because it's a bit broken and will not work for many users of chproxy. https://github.com/ContentSquare/chproxy/pull/190. It should be merged soon. Until the next release of chproxy, you can bypass the pb by setting the grace_time in the conf for example here is a working conf similar to the one you tried
log_debug: true hack_me_please: true caches:
- name: test-redis mode: redis redis: addresses: - 127.0.0.1:6379 expire: 300s grace_time: 300s users:
- name: user password: pwd to_cluster: local to_user: default cache: test-redis max_concurrent_queries: 2 max_execution_time: 30s clusters:
- name: local
scheme: http
nodes:
- 127.0.0.1:8123 users:
- name: default heartbeat_interval: 5s server: http: listen_addr: ":9090" allowed_networks: ["127.0.0.0/24"]
@mga-chka
We made this change because of the way the concurrent query handling worked but I think we will find another way so that the max_execution_time can be back to unlimited.
Thanks, that would be nice. Actually, i was pretty much happy with per-cache grace_time setting, it was easier to configure and logic behind it was clear.
FYI, if you set 86400s as max_execution_time, the "-transactions" will stay 172800 sec/48 hours in redis (once we find the current issue with the TTL).
That's true, but i don't see so far how it will affect the behavior. For now, TTL is even set to -1 (unlimited) for the the "-transactions", but caching logic still working fine as i tested. Maybe i'm just missing something?
Until the next release of chproxy, you can bypass the pb by setting the grace_time in the conf for example here is a working conf similar to the one you tried
Actually, it seem working fine so far with high max_execution_time value as i mentioned above, except the fact that amount of keys are growing, which is not really noticeable in the real-world applications. Am i missing any other flaw of this approach?
no your not missing something. Putting the grace_time would just avoid the garbage in Redis from the "-transaction" items. If your redis is configured to automatically remove objects and you don't mind the pollution from those small objects then you can keep it as it is.
Frankly say, it's not a big issue so far. I hope https://github.com/ContentSquare/chproxy/pull/190 will be merged prior my 1GiB redis instance will be completely hogged by 1-2 bytes records :) But i'll keep your suggestion in mind, thank you!
since the https://github.com/ContentSquare/chproxy/pull/190 was merged and a new version was released, I'm closing this issue. If you still have issues, feel free to re-open this PR.
Thanks guys, i've just tried v1.16.4 and TTL for *-transaction keys set correctly in redis: 2xmax_execution_time