                        High memory consumption
Hi,
I encountered some very strange behavior with Knot Resolver. For some reason this config causes the kresd process to bloat linearly (~10 MB / hour) and eat hundreds of megabytes of memory even without any load:
cache.size = 100 * MB
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')
cache.max_ttl(300)
But when I set max_ttl before opening the cache file, the problem disappears and the memory footprint stays at ~17 MB:
cache.size = 100 * MB
cache.max_ttl(300)
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')
Here is the Dockerfile I used:
FROM debian:11-slim
RUN apt update
RUN apt install -y wget
RUN wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
RUN dpkg -i knot-resolver-release.deb
RUN apt update
RUN apt install -y knot-resolver
COPY config/knot-resolver/kresd.conf /etc/knot-resolver/kresd.conf
ENTRYPOINT ["kresd"]
CMD ["-c", "/etc/knot-resolver/kresd.conf", "-n"]
I would be grateful for any ideas and debug suggestions.
UPD: Apparently, the lower the max_ttl, the quicker RAM is consumed. Calling cache.clear() does nothing. Running kres-cache-gc does nothing.
cache.open() resets the TTL limits.
@vcunat could you please elaborate on how it may cause constant memory growth? A 5-minute TTL seems harmless to me.
No, the growth itself does sound like a bug. Reducing TTL will make the resolver do more work, etc., but otherwise it's probably just a coincidence that it triggers that bug/growth.
I just wanted to point out that swapping the lines is basically the same as not changing the TTL limit.
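For example, a quick way to see this directly in the config (a sketch; it assumes cache.max_ttl() called without an argument returns the currently effective limit):
cache.size = 100 * MB
cache.max_ttl(300)
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')
print(cache.max_ttl())  -- prints the default again, because open() reset the limit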
Thanks for pointing that out. It was not obvious to me.
I see two plausible options:
- the allocator (jemalloc in this case) still does not like the resulting allocation patterns and ends up with a very sparse heap (lots of RAM taken from the OS but only a small percentage of it actually allocated by kresd): https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1353#note_265895
- a genuine leak (unreachable memory), but we haven't heard of any significant one so far (in terms of consumed RAM). It will probably be easiest to recognize by setting the variable MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true and possibly later inspecting details according to the docs; see the sketch below.
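A sketch of that second check (it assumes kresd is linked against a jemalloc build with profiling support, i.e. --enable-prof, and the binary path is illustrative):
# enable jemalloc leak profiling and run kresd as usual
export MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true
kresd -c /etc/knot-resolver/kresd.conf -n
# on clean exit jemalloc writes a jeprof.<pid>.*.heap dump; inspect it with jeprof
jeprof --show_bytes /usr/sbin/kresd jeprof.*.heap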
I'll definitely investigate your suggestions. Thanks for sharing. 🙇‍♂️
@vcunat But I am still puzzled that such a simple setting as max_ttl causes this problem and that it was not noticed before... Can you advise what else I can check to rule out a simple error in my configuration? As I mentioned in the UPD section, I tried clearing the cache with cache.clear() and running kres-cache-gc, with no effect on the memory footprint.
Cache size is unrelated; that's always exactly a 100 MiB file, mapped into memory (according to your config).
I mean, the cache file will be part of the RAM usage that you see, but it has that hard upper limit.
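If it helps to separate the two, here is a rough check (assuming a Linux host and a single kresd process): RssFile covers the memory-mapped LMDB cache file, while RssAnon is the malloc'd heap, so steady growth in RssAnon points away from the cache itself.
pid=$(pidof kresd)
# split of resident memory into anonymous heap vs. file-backed mappings
grep -E '^(VmRSS|RssAnon|RssFile|RssShmem):' "/proc/$pid/status"
# per-mapping view; data.mdb is the LMDB cache file from the config above
pmap -x "$pid" | grep -E 'data\.mdb|total'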