                        High memory consumption
Hi,
I encountered some very strange behavior with Knot Resolver. For some reason this config causes the kresd process to bloat linearly (~10 MB / hour) and eat hundreds of megabytes of memory even without any load:
cache.size = 100 * MB
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')
cache.max_ttl(300)
But when I set max_ttl before opening the cache file, the problem disappears and the memory footprint stays at ~17 MB:
cache.size = 100 * MB
cache.max_ttl(300)
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')
Here is the Dockerfile I used:
FROM debian:11-slim
RUN apt update
RUN apt install -y wget
RUN wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
RUN dpkg -i knot-resolver-release.deb
RUN apt update
RUN apt install -y knot-resolver
COPY config/knot-resolver/kresd.conf /etc/knot-resolver/kresd.conf
ENTRYPOINT ["kresd"]
CMD ["-c", "/etc/knot-resolver/kresd.conf", "-n"]
I would be grateful for any ideas and debug suggestions.
UPD: Apparently, the lower the max_ttl, the quicker RAM is consumed. Calling cache.clear() does nothing. Running kres-cache-gc does nothing.
cache.open() resets the TTL limits.
@vcunat could you please elaborate on how it may cause constant memory growth? A 5-minute TTL seems harmless to me.
No, the growth itself does sound like a bug. Reducing TTL will make the resolver do more work, etc., but otherwise it's probably just a coincidence that it triggers that bug/growth.
I just wanted to point out that swapping the lines is basically the same as not changing the TTL limit.
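For example, a quick way to see this directly in the config (a sketch; it assumes cache.max_ttl() called without an argument returns the currently effective limit):
cache.size = 100 * MB
cache.max_ttl(300)
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')
print(cache.max_ttl())  -- prints the default again, because open() reset the limit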
Thanks for pointing that out. It was not obvious to me.
I see two plausible options:
- the allocator (jemalloc in this case) still does not like the resulting allocation patterns and ends up with a very sparse heap (lots of RAM taken from the OS but only a small percentage of it actually allocated by kresd): https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1353#note_265895
- a genuine leak (unreachable memory), but we haven't heard of any significant one so far (in terms of consumed RAM). It will probably be easiest to recognize by setting the variable MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true and possibly later inspecting details according to the docs; see the sketch below.
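A sketch of that second check (it assumes kresd is linked against a jemalloc build with profiling support, i.e. --enable-prof, and the binary path is illustrative):
# enable jemalloc leak profiling and run kresd as usual
export MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true
kresd -c /etc/knot-resolver/kresd.conf -n
# on clean exit jemalloc writes a jeprof.<pid>.*.heap dump; inspect it with jeprof
jeprof --show_bytes /usr/sbin/kresd jeprof.*.heap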
I'll definitely investigate your suggestions. Thanks for sharing. 🙇‍♂️
@vcunat But I am still puzzled that such a simple setting as max_ttl causes this problem and that it was not noticed before... Can you advise what else I can check to rule out a simple error in my configuration? As I mentioned in the UPD section, I tried clearing the cache with cache.clear() and running kres-cache-gc, with no effect on the memory footprint.
Cache size is unrelated; that's always exactly a 100 MiB file, mapped into memory (according to your config).
I mean, the cache file will be part of the RAM usage that you see, but it has that hard upper limit.
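If it helps to separate the two, here is a rough check (assuming a Linux host and a single kresd process): RssFile covers the memory-mapped LMDB cache file, while RssAnon is the malloc'd heap, so steady growth in RssAnon points away from the cache itself.
pid=$(pidof kresd)
# split of resident memory into anonymous heap vs. file-backed mappings
grep -E '^(VmRSS|RssAnon|RssFile|RssShmem):' "/proc/$pid/status"
# per-mapping view; data.mdb is the LMDB cache file from the config above
pmap -x "$pid" | grep -E 'data\.mdb|total'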