
add pool customization options

Open Yneth opened this issue 2 years ago • 11 comments

It might not be the best way to add such customisation, as it clashes with other fields (total-connections, target-utilization, max-queue-size, stats-callback), but it should work.

The rationale of this change is to add the possibility to pick your own pool implementation. In my case the Dirigiste pool does not work due to a memory leak: [profiler screenshot]

Yneth avatar Jun 13 '22 06:06 Yneth

Thanks for submitting this. I think I would prefer to fix Dirigiste, if possible. PRs can be submitted at https://github.com/clj-commons/dirigiste

At first glance, why isn't this being garbage-collected? Are you collecting stats yourself somehow? 35k seems like an awful lot of stats objects.

KingMob avatar Jun 15 '22 08:06 KingMob

@KingMob for some reason the stats are not disposed. My fix was to implement a Pool without stats at all. Unfortunately I do not have much time, but I'll try to find out what the reason could be.

Yneth avatar Jun 15 '22 08:06 Yneth

@Yneth Can you share a minimal test case reproducing it?

KingMob avatar Jun 15 '22 09:06 KingMob

Are you sure the 35'769 long arrays all belong to Dirigiste? I'm not asking you to check all of them, obviously, but can you randomly pinpoint some of them to make sure that's really the case?

arnaudgeiser avatar Jun 20 '22 20:06 arnaudgeiser

~~I'm asking this because a reservoir can contain at most 2048 elements (the entry you showed), so:~~

~~35769×16408÷1024 = 573 MiB~~

~~and the value is twice as big.~~

arnaudgeiser avatar Jun 20 '22 20:06 arnaudgeiser

@Yneth Even if you don't have a test case, did you save the profiling session? It would be nice to take a look at it.

KingMob avatar Jun 21 '22 07:06 KingMob

@KingMob no, but I am sure that it causes problems. My fix was to create a custom pool implementation without the metrics, i.e. those that use Reservoir.

It is a bit hard to create a minimal test case, as I need to do some investigation beforehand and I am short on time.

My setup looks like this: 5 pools, one per proxy, plus 1 additional pool without any proxy config, running HTTP requests all day. It takes around 6 hours before the problems arise.

Also, I see an uncaught error warning from Manifold even though I am using m/catch everywhere; just mentioning it in case it could also be a root cause.

Yneth avatar Jun 21 '22 07:06 Yneth

@KingMob I guess giving the possibility to specify the whole pool would be nice, but it would require making the create-connection function public. Not sure if it is OK to expose this, so I decided to go with the simplest draft solution.

Yneth avatar Jun 21 '22 08:06 Yneth

@KingMob I guess giving the possibility to specify the whole pool would be nice, but it would require making the create-connection function public. Not sure if it is OK to expose this, so I decided to go with the simplest draft solution.

I'm generally OK with making it public, but leaving it out of the generated documentation via ^:no-doc if needed. Most people won't need to replace the pool.
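
For reference, a minimal sketch of what that could look like, assuming the existing private function is simply re-tagged with ^:no-doc (the name matches the one mentioned above, but the signature and body here are illustrative, not Aleph's actual code):

(defn ^:no-doc create-connection
  "Builds a raw client connection. Public so a custom pool implementation can
  reuse it, but excluded from the generated documentation via ^:no-doc."
  [host options]
  ;; ... the existing connection-building logic would stay unchanged ...
  )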

Also, if you can generate a memory profile, we'd still really appreciate it. It's harder to debug this without one.

KingMob avatar Jun 21 '22 08:06 KingMob

@Yneth Any updates? After doing some work with Dirigiste recently, I want your PR. 😄

KingMob avatar Jul 08 '22 06:07 KingMob

will send an update later today

Yneth avatar Jul 08 '22 07:07 Yneth

Any update from your side?

What we actually need from you, to determine whether the issue is related to io.aleph.dirigiste.Stats$**, is the number of those instances at the point when your memory starts growing.

I wasn't able to reproduce on my side but the expected pattern is this:

[two profiler screenshots showing the expected pattern]

I have three io.aleph.dirigiste.Executor instances (4 reservoirs each) plus one io.aleph.dirigiste.Pool (4 reservoirs), i.e. 16 reservoirs in total. Those are actually capped at 32'892 bytes each (4096 * 8 bytes = 32'768, plus a few other references).

  • How many of those UniformDoubleReservoir instances do you have?
  • How many different io.aleph.dirigiste.Executor instances do they reference?
  • How many different io.aleph.dirigiste.Pool instances do they reference?

From the screenshot you provided, we can only determine that a lot of longs have been allocated on the heap, not what the real issue is.
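
For reference, the arithmetic behind those capped figures, as a quick REPL sketch:

;; 3 Executors + 1 Pool, 4 reservoirs each, 4096 longs of 8 bytes per reservoir.
(def reservoir-bytes (* 4096 8))                 ;; => 32768 bytes of raw samples per reservoir
(def total-reservoirs (+ (* 3 4) (* 1 4)))       ;; => 16
(/ (* total-reservoirs reservoir-bytes) 1024.0)  ;; => 512.0 KiB in a healthy setup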

arnaudgeiser avatar Aug 15 '22 15:08 arnaudgeiser

Ok.. ok... I'm able to create a pattern where it can be problematic. With the following code, I can create tons of Reservoirs:

[profiler screenshot showing the allocated Reservoirs]

(require '[aleph.http :as http])
(require '[manifold.deferred :as d])
(require '[clj-commons.byte-streams :as bs])

(comment
  ;; Local server; every *.localhost subdomain resolves to it.
  (http/start-server (constantly {:body "hello!"}) {:port 8081})

  ;; Fire `nb` requests, each against a unique random subdomain, so every
  ;; request creates a fresh entry (and its Reservoirs) in the connection pool.
  (defn get [nb]
    (apply d/zip (mapv (fn [_] (d/chain (http/get (format "http://%s.localhost:8081/" (java.util.UUID/randomUUID)))
                                        :body
                                        bs/to-string)) (range nb))))

  ;; Keep hammering the pool: 10 requests to start, then batches of 10000.
  (d/loop [d (get 10)]
    (d/chain' d
              (fn [_]
                (prn "Done!")
                (Thread/sleep 300)
                (d/recur (get 10000))))))

Each different domain creates a new Reservoir for me.

[profiler screenshot]

They are supposed to be cleaned up by the following code

// clear out any unused queues
_lock.lock();
for (Map.Entry<K,Stats> entry : _stats.entrySet()) {
    K key = entry.getKey();
    if (entry.getValue().getUtilization(1) == 0
        && _queues.get(key).objects.get() == 0) {
        _queues.remove(key).shutdown();

        // clean up stats so they don't remain in memory forever
        _queueLatencies.remove(key);
        _taskLatencies.remove(key);
        _queueLengths.remove(key);
        _utilizations.remove(key);
        _taskArrivalRates.remove(key);
        _taskCompletionRates.remove(key);
        _taskRejectionRates.remove(key);
    }
}
_lock.unlock();

https://github.com/clj-commons/dirigiste/blob/master/src/io/aleph/dirigiste/Pool.java#L331-L332

It does its job when the conditions are met. I was putting the system under stress here, but the problem might be somewhere along those lines of code, where some objects are not removed from the queues.

https://github.com/clj-commons/aleph/blob/master/src/aleph/http.clj#L321-L375

I would be interested to know whether you see the same pattern, @Yneth.

Just a hunch: not a bug in Dirigiste, but most likely in the Aleph client.

arnaudgeiser avatar Aug 15 '22 22:08 arnaudgeiser

And as @Yneth mentioned:

Also, I see an uncaught error warning from Manifold even though I am using m/catch everywhere; just mentioning it in case it could also be a root cause.

@KingMob : I would be interested to know your opinion about my assumptions.

arnaudgeiser avatar Aug 15 '22 22:08 arnaudgeiser

@arnaudgeiser Your test case, which crawls a lot of domains, reminds me a lot of:

  1. https://github.com/clj-commons/aleph/issues/394#issuecomment-404400781 in #394
  2. https://github.com/clj-commons/aleph/issues/461#issuecomment-453145969 in #461

As for the objects not being cleaned up:

  1. Dumb question: Manually running the GC didn't clean any of that up, I assume?
  2. If you look here, the connection pool is keyed off the host. This makes sense, since the connection is immediately usable (returning the conn for different hosts would enable us to avoid object creation, but would still require changing fields, and reconnecting), but the consequence is no pooling between hosts. So I'm not surprised to see usage going up. OTOH, in your example, it looks like 10 hosts generated 1.3GB, which seems high. Hmmm.
  3. The code you linked to in aleph.http/request will 99% of the time release, not dispose. Most normally-ended HTTP/1.1 connections are "Connection: keep-alive", so they won't meet the criteria for disposal.
  4. The check for getUtilization(1) is checking the 100th percentile of the utilization stats. This means that all utilizations in the stats time window must be 0. The reservoir size is 4096 items, and if the sample period is the default 10ms, then we should expect it to take ~41s of no usage for the reservoir to become all zeros (see the quick check below). If looking at the memory usage too early is the problem, then lowering the sample period may help. It might be worth checking for bugs here, though; I know there have been some utilization calc issues in the past. (clj-commons/dirigiste#23)
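
A quick check of the ~41s figure from point 4, assuming the default 10ms sample period:

;; 4096 reservoir slots, one sample every 10 ms; all of them must read 0 before
;; getUtilization(1) can return 0.
(* 4096 0.010) ;; => 40.96 seconds of complete idleness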

KingMob avatar Aug 16 '22 07:08 KingMob

OK, I misread a couple things.

Your code is recurring with another 10000 domains. In my tests, this works out to ~770k per domain. Still highish, but there are 5 major maps of domain->reservoir, and each reservoir has an array of 4096 longs/doubles, so we have 10010 x 5 x 4096 x 8 bytes = ~1.6 GB (see the arithmetic below). 770k per domain is still high, though; I know Dirigiste is tunable, we could probably make it consume less.
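
Spelled out at the REPL (same numbers as above):

;; 10010 domains x 5 maps x 4096 slots x 8 bytes per long/double sample.
(* 10010 5 4096 8) ;; => 1640038400 bytes, i.e. roughly 1.6 GB of raw reservoir arrays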

I also took a closer look at the utilizations, which is where it gets interesting. With one conn per domain, it's continually computing the utilization level as 2 (if I'm computing Double/longBitsToDouble correctly). In the code, it seems that queueLength is always 1, available is always 0, and objects is always 0. (Not sure what objects represents... I think it's the number of free objects.) Anyway, that means double utilization = 1.0 - ((available - queueLength) / Math.max(1.0, objects)); is always 2.
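
Plugging in those observed values (a quick REPL check, assuming queueLength = 1, available = 0 and objects = 0 as described above):

;; utilization = 1.0 - ((available - queueLength) / max(1.0, objects))
(- 1.0 (/ (- 0 1) (Math/max 1.0 0.0))) ;; => 2.0, matching the constant value seen in the reservoir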

It feels like there's a bug here. Regardless of whether the conn is released or disposed, you'd think an unused connection shouldn't count towards utilization. I'm not sure if it's an edge artifact of having exactly 1 item per domain, or if this has always been a problem, and that's just where we see it first, when testing zillions of domains.

KingMob avatar Aug 16 '22 09:08 KingMob

Just forced it to make 3 conns per domain, and all I got was endless utilization values of 4. I need to set this aside for now.

Even if it's a leak, it mostly affects people talking to many different domains, so I'm not sure how high priority it is.

KingMob avatar Aug 16 '22 10:08 KingMob

Do not get me wrong, everything is fine with my test case. I'm only in trouble because I'm overwhelming the pool, but eventually, everything will be garbage collected if I stop doing requests. I just wanted to demonstrate a situation where a lot of memory could be allocated on the heap.

This pattern, which is not something we should expect from a production system, is a situation where we could use a lot of memory.

But again, without a heap dump, we are stuck making hypotheses which might be wrong.

arnaudgeiser avatar Aug 16 '22 18:08 arnaudgeiser

Just forced it to make 3 conns per domain, and all I got was endless utilization values of 4. I need to set this aside for now.

Ah yes, interesting lead here.

arnaudgeiser avatar Aug 16 '22 18:08 arnaudgeiser

I will spend some time writing unit tests for Dirigiste because some parts of the code really seem to be off.

A few examples:

public void sample(double n) {
    int cnt = _count.incrementAndGet();
    if (cnt <= RESERVOIR_SIZE) {
        _values.set(cnt-1, Double.doubleToLongBits(n));
    } else {
        int idx = ThreadLocalRandom.current().nextInt(cnt);
        if (idx < RESERVOIR_SIZE) {
            _values.set(idx, Double.doubleToLongBits(n));
        }
    }
}
  1. We can overflow here:
(def atomic-integer (java.util.concurrent.atomic.AtomicInteger. Integer/MAX_VALUE))
(.incrementAndGet atomic-integer) ;; -2147483648
(.incrementAndGet atomic-integer) ;; -2147483647
  2. Then, it will just blow up the Reservoir (the samples live in an AtomicLongArray of doubleToLongBits values, as the snippet above shows):
(def atomic-long-array (java.util.concurrent.atomic.AtomicLongArray. 4096))
(.set atomic-long-array -2147483648 (Double/doubleToLongBits 0.5)) ;; ==> ArrayIndexOutOfBoundsException
  3. Once the reservoir is filled, you have no guarantee the last idx (position 4095) will be set to the current utilization, as we randomly get a slot among the 4096 positions.

https://github.com/clj-commons/dirigiste/blob/master/src/io/aleph/dirigiste/Stats.java#L61-L71

The last idx (position 4095) is important here because we rely only on this one to determine if the queue is no longer in use.

if (t == 1.0) {
  return (double) vals[cnt-1];
}

https://github.com/clj-commons/dirigiste/blob/master/src/io/aleph/dirigiste/Stats.java#L166-L168

arnaudgeiser avatar Aug 16 '22 22:08 arnaudgeiser

Yeah, there's some weird stuff going on, with few tests, and a lot of that is why I'm not a fan of Dirigiste.

Based on a sampling period of 10ms, we should expect overflow and crash after ~8 months left untouched. However, the stats are meant to be replaced periodically. For both Executor and Pool, in startControlLoop, the stats reservoirs are replaced once each control loop by updateStats(). For Pool.updateStats, the .toMap() calls are actually destructive(!). It remove()s the keys as it builds the map. So theoretically, overflow isn't an issue, but I find Dirigiste's logic indirect enough that I wouldn't bet on that.
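
Spelling out the ~8 months estimate, assuming one sample every 10ms until the AtomicInteger counter overflows:

;; Seconds until Integer/MAX_VALUE samples have been taken, converted to days.
(/ (* Integer/MAX_VALUE 0.010) 60 60 24) ;; => ~248.5 days, roughly 8 months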

Once the reservoir is filled, you have no guarantee the last idx (position 4095) will be set to the current utilization, as we randomly get a slot among the 4096 positions.

Actually, it's weirder than that. It picks a random number from 0 to cnt-1, and only if that's < 4096 does it store the sample at all. This has the effect of updating the reservoir less and less frequently as time goes on, until it's reset. The longer the control period, the less likely that later samples will have an effect. I'm not sure what the rationale is for this over a ring buffer, and the commit messages are no help.
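
A tiny sketch of that effect: once the reservoir is full, the probability that the n-th sample is stored at all keeps shrinking.

;; Per the sample() snippet above: a random index in [0, cnt) must land below 4096.
(defn store-probability [cnt]
  (min 1.0 (/ 4096.0 cnt)))

(store-probability 4096)   ;; => 1.0   (reservoir still filling)
(store-probability 40960)  ;; => 0.1
(store-probability 409600) ;; => 0.01  (late samples almost never make it in)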

That being said, using the value in the last position when looking at the percentiles is fine. Inside the reservoir's toArray() fns, you'll see it actually returns them sorted (!), so looking at the last element will always be the maximal utilization. Dirigiste isn't considering the most recent utilization value, it's considering the highest utilization in the control period.
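
A small illustration of that point (not Dirigiste code, just the idea):

;; toArray() returns the samples sorted, so the "100th percentile" lookup is just
;; the maximum utilization observed during the control period.
(last (sort [0.0 0.0 2.0 0.0])) ;; => 2.0 -- one busy sample keeps getUtilization(1) non-zero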

The problem is that the utilization reservoir contains the same value everywhere, so it doesn't matter what percentile we look at. (The behavior above where later samples in the control period are less likely to be recorded is also a problem, but if the stats are properly reset, it should still work once we have a fallow period.)

So either (a) the utilization is never being calculated correctly in our scenario (all scenarios?), or (b) it's calculated correctly, but the stats aren't being reset, and thus by the time utilization drops to zero, the odds of it being recorded are low, and the odds of all non-zero samples being displaced are vanishingly low. I think it's more likely to be (a); if (b) were true, I should be able to find other values somewhere, but I can't find any.

It's really annoying that .toArray and .toMap aren't simple transformation fns. Sorry you lost time on that.

KingMob avatar Aug 17 '22 06:08 KingMob

Thanks for the explanations, I missed so many things here.

For both Executor and Pool, in startControlLoop, the stats reservoirs are replaced once each control loop by updateStats().

Woow... at least that explains the "high" allocation rate I was seeing.

For Pool.updateStats, the .toMap() calls are actually destructive(!). It remove()s the keys as it builds the map.

Right

Dirigiste isn't considering the most recent utilization value, it's considering the highest utilization in the control period.

Understood, I missed the fact that toArray sorts the values. :face_with_spiral_eyes:

The problem is that the utilization reservoir contains the same value everywhere, so it doesn't matter what percentile we look at. (The behavior above where later samples in the control period are less likely to be recorded is also a problem, but if the stats are properly reset, it should still work once we have a fallow period.)

~~This is currently bugging me. There is a correlation with Pool.updateStats, but according to what you explained, it's like most of the time the 4096 positions are totally useless. I need to clear my head here.~~

Nevermind, got it.

So either (a) the utilization is never being calculated correctly in our scenario (all scenarios?), or (b) it's calculated correctly, but the stats aren't being reset, and thus by the time utilization drops to zero, the odds of it being recorded are low, and the odds of all non-zero samples being displaced are vanishingly low. I think it's more likely to be (a); if (b) were true, I should be able to find other values somewhere, but I can't find any.

Let's figure this out then!

arnaudgeiser avatar Aug 17 '22 07:08 arnaudgeiser

Keep in mind, this is still low priority to my way of thinking. The Dirigiste pooling problem only applies to outgoing client connections, where talking to zillions of different servers is unlikely. The server uses Netty to pool incoming connections/channels, so it should be unaffected by Dirigiste's pool stats issue. The most likely scenario is that only a few outside servers are talked to and a few MBs are wasted, but that's it.

If you're burning to fix it, I won't stop you tho 😄

KingMob avatar Aug 17 '22 07:08 KingMob

Yes, I agree. But some people are currently having issues with io.aleph.dirigiste.Pool even though it's unlikely they are talking to a lot of different servers.

By adding tests, I expect to see patterns where it could go wrong without having to communicate with thousands of different domains.

Going back to this PR, I'm still not sold on it, but I can see reasons why people wouldn't want to use Dirigiste in the current situation. Judging from the discussion we just had, it's quite a heavy machine under the hood. Let's provide an escape hatch then!

I know you want to sunset Dirigiste, but I would prefer stabilizing it. The code base is not that huge and the rationale behind it is still good IMO.

If we have no news from @Yneth in the following days, I will take over that PR.

arnaudgeiser avatar Aug 17 '22 09:08 arnaudgeiser

I agree that other memory leaks of Dirigiste should be fixed. I'm just afraid that the pool leak is a red herring for their other issues.

And while I'd like to sunset it, that's more of a "next version" goal. Replacing it in current Aleph makes me think of these lines from the movie State and Main:

... you come by next Sunday, we'll give you a second chance.

Sister, the only second chance I know is the chance to make the same mistake twice.

What kind of escape hatch are you thinking of, if you don't want to use this PR?

KingMob avatar Aug 18 '22 07:08 KingMob

I agree that other memory leaks of Dirigiste should be fixed. I'm just afraid that the pool leak is a red herring for their other issues.

I'm also afraid of this!

What kind of escape hatch are you thinking of, if you don't want to use this PR?

We'll use this PR. But I am resigned, not convinced!

arnaudgeiser avatar Aug 18 '22 07:08 arnaudgeiser

@Yneth Are you still interested in this?

KingMob avatar Aug 18 '22 09:08 KingMob

Gentlemen, excuse me for the late response.

Regarding the PR, I do not like it. Almost everything I've been thinking of is not very user friendly. Regarding Dirigiste, I personally think it would be nice to fix it in the first place.

My use case involves querying up to 1000 hosts that change on a daily basis. Think of a service like https://check-host.net/ that tracks the status of services every X minutes.

I can't promise, but I will try to provide a heap dump later today.

Yneth avatar Aug 18 '22 09:08 Yneth

The pool factory method depends on local variables of the API method, as well as some private functions. Of course there should be a way to override it, but in a user-friendly way.

My temporary solution was to copy-paste some parts and roll my own method that creates my own Dirigiste pool without stats. It works fine, and the PR itself is not high priority for me.

Yneth avatar Aug 18 '22 09:08 Yneth

@Yneth Don't restrict yourself on where or how to fix this. We're open to the right solution, and if you've followed the conversation so far, the most likely correct place is where Dirigiste computes pool utilization and/or cleans up old pool objects.

Also, based on your initial screenshot, which shows the leak is from the Stats reservoirs, I'm pretty sure Arnaud and I are looking at the same leak, so I don't think we need a dump anymore. (Of course, if that's not your leak, please let us know.)

KingMob avatar Aug 18 '22 09:08 KingMob