
Reduce memory usage of rate limiting

Open dullbananas opened this issue 2 years ago • 6 comments

RateLimitStorage grows whenever a new IP address makes a request, so it's extremely important to make it use as little memory as possible.

The biggest problem is that old buckets are not deleted, so the memory consumption just keeps on growing. This could cause the server to run out of memory. To fix this, I added a weekly task that causes an IP address's buckets to be deleted after 1 to 2 weeks of inactivity (or more if any rate limit interval is longer than 1 week).
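A minimal sketch of such a cleanup pass follows. It assumes each bucket records when it was last used. The names Bucket, WEEK and remove_inactive, and the Vec-based per-IP storage, are hypothetical and not the PR's code. If the pass drops anything idle for at least a week and runs once a week, an idle IP's entry lasts between 1 and 2 weeks, which is where that range comes from.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;
use std::time::{Duration, Instant};

/// Hypothetical bucket; only the timestamp needed for cleanup is shown.
struct Bucket {
    last_checked: Instant,
}

/// One week, the minimum idle time before an IP's buckets are dropped.
const WEEK: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Keeps an IP only if at least one of its buckets was used within
/// max(1 week, longest rate-limit interval). Running this once a week means an
/// idle IP disappears 1 to 2 weeks after its last request (when no interval
/// is longer than a week).
fn remove_inactive(
    buckets: &mut HashMap<Ipv6Addr, Vec<Bucket>>,
    now: Instant,
    longest_interval: Duration,
) {
    let max_idle = WEEK.max(longest_interval);
    buckets.retain(|_ip, per_type| {
        per_type
            .iter()
            .any(|b| now.saturating_duration_since(b.last_checked) < max_idle)
    });
}
```

HashMap::retain removes the entries in place, so the pass needs no second allocation.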

And here's each step in my process of optimizing the buckets field of RateLimitStorage:

HashMap<RateLimitType, HashMap<IpAddr, RateLimitBucket>> (original)

  • IpAddr is a wrapper around String. RateLimitBucket values for all RateLimitTypes are initialized when an IP address is added, so one IpAddr is always stored for each of the six RateLimitTypes. On a 64-bit machine, these use a total of at least 186 bytes per IP address: each String stores 24 bytes inline, plus at least as many heap bytes as there are characters in "0.0.0.0" (7), and 6 × (24 + 7) = 186.
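The 186-byte figure can be checked with std::mem::size_of. The sketch below assumes a 64-bit target. It counts only the String keys and ignores hash-table overhead and allocator rounding, so it gives a lower bound. The constant and function names are illustrative.

```rust
use std::mem::size_of;

/// Number of RateLimitType variants (six, as stated in the PR description).
const RATE_LIMIT_TYPES: usize = 6;

/// "0.0.0.0" is the shortest dotted-decimal IPv4 address: 7 bytes of heap data.
const MIN_ADDR_TEXT_LEN: usize = "0.0.0.0".len();

/// Lower bound on bytes spent on address keys for one IP in the original
/// HashMap<RateLimitType, HashMap<IpAddr, RateLimitBucket>> layout, where
/// IpAddr wraps a String and one copy is stored per RateLimitType.
fn original_key_bytes_per_ip() -> usize {
    RATE_LIMIT_TYPES * (size_of::<String>() + MIN_ADDR_TEXT_LEN)
}
```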

HashMap<IpAddr, HashMap<RateLimitType, RateLimitBucket>>

  • Now, for each IP address, only one IpAddr is stored instead of six.

HashMap<IpAddr, EnumMap<RateLimitType, RateLimitBucket>>

  • EnumMap only stores an inline fixed-size array: [RateLimitBucket; 6]. It doesn't store a length, pointers, or even keys. This is as compact as it can be.
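To show why an EnumMap costs nothing beyond its array, here is a dependency-free stand-in: a bare [Bucket; 6] indexed by the enum's discriminant. The enum-map crate generates this kind of mapping for you through a derive. The variant names, the single-field Bucket and the BucketsPerType wrapper are all assumptions made for illustration.

```rust
use std::mem::size_of;
use std::ops::{Index, IndexMut};

/// Six rate-limit categories. Variant names are assumed for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RateLimitType {
    Message,
    Register,
    Post,
    Image,
    Comment,
    Search,
}

/// Placeholder bucket with a hypothetical field.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct Bucket {
    tokens: f32,
}

/// Hand-rolled stand-in for EnumMap<RateLimitType, Bucket>: a bare array
/// indexed by the enum discriminant, with no length, pointer or stored keys.
#[derive(Clone, Copy, Default, Debug)]
struct BucketsPerType([Bucket; 6]);

impl Index<RateLimitType> for BucketsPerType {
    type Output = Bucket;
    fn index(&self, t: RateLimitType) -> &Bucket {
        // Fieldless enum variants are numbered 0..=5 in declaration order.
        &self.0[t as usize]
    }
}

impl IndexMut<RateLimitType> for BucketsPerType {
    fn index_mut(&mut self, t: RateLimitType) -> &mut Bucket {
        &mut self.0[t as usize]
    }
}
```

Because the key is implied by the array position, size_of::<BucketsPerType>() is exactly six times the size of one bucket.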

HashMap<Ipv6Addr, EnumMap<RateLimitType, RateLimitBucket>>

  • Ipv6Addr is only 16 bytes, which is less memory than even an empty String (24 bytes inline on a 64-bit machine).
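The thread doesn't say how IPv4 clients fit into an Ipv6Addr key. One plausible approach, assumed here, is to store them as IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) using the standard library's Ipv4Addr::to_ipv6_mapped. The function name to_key is hypothetical.

```rust
use std::mem::size_of;
use std::net::{IpAddr, Ipv6Addr};

/// Normalizes any client address to one 16-byte key type.
/// Assumption: IPv4 addresses are stored as IPv4-mapped IPv6 (::ffff:a.b.c.d).
fn to_key(ip: IpAddr) -> Ipv6Addr {
    match ip {
        IpAddr::V4(v4) => v4.to_ipv6_mapped(),
        IpAddr::V6(v6) => v6,
    }
}
```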

I also made RateLimitBucket 3 times smaller (from 24 to 8 bytes), which saves 6 × 16 = 96 bytes per IP address.
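The PR's actual field choices aren't shown in this thread. The sketch below is one hypothetical way to fit a bucket into 8 bytes: a u32 timestamp in whole seconds since a fixed start instant, plus an f32 token count instead of an f64. The names CompactBucket, secs_since and savings_per_ip are illustrative.

```rust
use std::mem::size_of;
use std::time::Instant;

/// Hypothetical compact bucket: 8 bytes in total.
/// `last_checked` is whole seconds since a fixed process start time
/// (u32 seconds covers about 136 years), and `tokens` uses f32.
#[derive(Clone, Copy, Debug, PartialEq)]
struct CompactBucket {
    last_checked: u32,
    tokens: f32,
}

/// Converts an Instant into whole seconds since `start`, saturating at u32::MAX
/// (and at 0 if `t` is earlier than `start`).
fn secs_since(start: Instant, t: Instant) -> u32 {
    let secs = t.saturating_duration_since(start).as_secs();
    secs.min(u32::MAX as u64) as u32
}

/// Bytes saved per IP when each of `types` buckets shrinks from `old` to `new` bytes.
fn savings_per_ip(types: usize, old: usize, new: usize) -> usize {
    types * (old - new)
}
```

Storing whole seconds trades sub-second precision for size. Token refill then works in one-second steps, which is probably acceptable for limits measured in minutes or hours.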

dullbananas, Jun 14 '23 19:06

I also made get_ip work correctly with IPv6 addresses. The previous implementation would only return the first segment of an IPv6 address (before the first colon).
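The old get_ip code isn't quoted here, so naive_host below is only a guess at the kind of port stripping that produces this bug. Cutting at the first ':' turns 192.0.2.7:8080 into 192.0.2.7, but it also reduces 2001:db8::1 to 2001. parse_ip is a hypothetical replacement that lets the standard library parsers handle both address families, with or without a port.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical buggy version: strips a port by cutting at the first ':'.
/// For IPv6 this keeps only the first 16-bit group of the address.
fn naive_host(s: &str) -> &str {
    s.split(':').next().unwrap_or(s)
}

/// Accepts "ip", "ipv4:port" or "[ipv6]:port" and returns the address without the port.
fn parse_ip(s: &str) -> Option<IpAddr> {
    s.parse::<SocketAddr>()
        .map(|sock| sock.ip())
        .or_else(|_| s.parse::<IpAddr>())
        .ok()
}
```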

Now ready to merge

dullbananas, Jun 15 '23 05:06

This is only kinda related, but I wanted to mention that rate limiting by full IPv6 address is pretty useless (except for the most honest of users), because you get assigned a full /64 (2^64 addresses) or even a full /56 (2^(128-56) = 2^72 addresses) and can switch between them without any effort (with privacy extensions, the address even changes automatically every hour or so).

So for IPv6 rate limiting to be mostly equivalent to IPv4 rate limiting, you need to store the /64 subnet rather than the full IP, and for it to be really effective you need a cascading rate limit, e.g. allowing each /64 1x some limit, each /56 5x that limit, and each /48 10x that limit.
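Keying on the /64 means zeroing the low 64 bits of the address. The minimal sketch below uses only the standard library. The function name subnet is hypothetical. It converts through u128 so that any prefix length can be masked, and the sample addresses come from the 2001:db8::/32 documentation range.

```rust
use std::net::Ipv6Addr;

/// Zeroes every bit after the first `prefix_len` bits, returning the network
/// address of the /prefix_len subnet that contains `addr`.
fn subnet(addr: Ipv6Addr, prefix_len: u32) -> Ipv6Addr {
    let bits = u128::from(addr);
    let mask = if prefix_len == 0 {
        0 // shifting a u128 by 128 would overflow, so /0 is special-cased
    } else {
        u128::MAX << (128 - prefix_len.min(128))
    };
    Ipv6Addr::from(bits & mask)
}
```

Two hosts on the same /64 produce the same key, so rotating through privacy addresses no longer yields fresh buckets.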

phiresky, Jun 16 '23 16:06

Oh noooo. I should probably try to prevent that IPv6 rate limiting problem in this pull request. If it gets merged without a fix, the result could be disastrous, because it removes the accidental /16 subnet rate limiting caused by the bug in get_ip: since only the first 16-bit segment of an IPv6 address was kept, every address in the same /16 shared one set of buckets.

dullbananas, Jun 17 '23 05:06

It now limits each /64 with 1x capacity, each /56 with 4x capacity, and each /48 with 16x capacity.
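Below is a minimal sketch of such a cascading check. It is not the PR's actual code. It uses simple per-prefix token counters and leaves out refill over time. The names CascadingLimiter, check, base_capacity and LEVELS are hypothetical. A request passes only if its /64, /56 and /48 all still have a token, and only then is one token taken from each level.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// (prefix length, capacity multiplier) pairs: /64 1x, /56 4x, /48 16x.
const LEVELS: [(u32, u32); 3] = [(64, 1), (56, 4), (48, 16)];

/// Network address of the /prefix_len containing `addr`
/// (prefix_len is always 48..=64 here, so the shift cannot overflow).
fn subnet(addr: Ipv6Addr, prefix_len: u32) -> Ipv6Addr {
    Ipv6Addr::from(u128::from(addr) & (u128::MAX << (128 - prefix_len)))
}

/// Hypothetical cascading limiter: remaining tokens per (prefix length, network).
/// Refill over time is omitted to keep the sketch short.
struct CascadingLimiter {
    base_capacity: u32,
    remaining: HashMap<(u32, Ipv6Addr), u32>,
}

impl CascadingLimiter {
    fn new(base_capacity: u32) -> Self {
        Self { base_capacity, remaining: HashMap::new() }
    }

    /// Checks every level before consuming anything, so a rejected request
    /// does not drain the levels that still had room.
    fn check(&mut self, addr: Ipv6Addr) -> bool {
        let keys: Vec<((u32, Ipv6Addr), u32)> = LEVELS
            .iter()
            .map(|&(len, mult)| ((len, subnet(addr, len)), self.base_capacity * mult))
            .collect();
        let allowed = keys
            .iter()
            .all(|(key, cap)| self.remaining.get(key).copied().unwrap_or(*cap) > 0);
        if allowed {
            for (key, cap) in keys {
                *self.remaining.entry(key).or_insert(cap) -= 1;
            }
        }
        allowed
    }
}
```

With a base capacity of 1, a single /64 gets one request, but four distinct /64s inside one /56 together get four before the /56 level starts rejecting.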

dullbananas, Jun 18 '23 19:06

Needs a cargo +nightly fmt

dessalines, Jun 19 '23 14:06

I fixed the formatting, but CI still shows the same formatting error.

dullbananas, Jun 19 '23 14:06

Good job, thank you!

Nutomic, Jun 21 '23 08:06