SyliusResourceBundle icon indicating copy to clipboard operation
SyliusResourceBundle copied to clipboard

Add more reliable resource leak detector for tests

Open DerGuteMoritz opened this issue 1 year ago • 6 comments

See docstrings in aleph.resource-leak-detector for details.

This patch also includes the following changes:

  • Rename the :leak-level-paranoid Leiningen profile to :leak-detection since it now also sets everything up to use the new custom detector.
  • Introduce new test selector :leak for tests which intentionally leak (not run by default since they require the :leak-detection profile to be active, too).
  • Introduce intentionally leaking aleph.http-test/test-leak-in-raw-stream-handler as an example and to document a case where users are responsible for releasing buffers.
  • Eliminate all (netty/leak-detector-level! :paranoid) calls from tests. Use new leak detection API instead (see below).
  • Place (aleph.resource-leak-detector/instrument-tests!) at the end of all relevant test namespaces to detect leaks (only effective when running with :leak-detection profile).

Ideally we'd activate it in the CI test job but we first need to fix the leak in aleph.http-test/test-idle-timeout for that (see #707).

Based on the research results from #615.

DerGuteMoritz avatar Feb 09 '24 13:02 DerGuteMoritz

Hmm aleph.http-timeout-test/test-shutdown-timeout-1 fails consistently in CI for me while it works fine locally. Looks like there is a race condition there where the client has not yet established a connection before the server shuts down after the 50ms wait? I can't think of a way to make this more reliable, though... @pyr Since you are the author, do you have an idea?

FWIW I also pushed a branch with only a bogus change to trigger the job and this one also fails so it's not related to any change in this here patch.

DerGuteMoritz avatar Feb 09 '24 13:02 DerGuteMoritz

@DerGuteMoritz Nothing jumps at me, works locally here as well

pyr avatar Feb 09 '24 14:02 pyr

Do you want to reevaluate after merging #709 ?

KingMob avatar Feb 11 '24 12:02 KingMob

Well, that fixed it but now the DNS failure tests started timing out, as well (also on master). See https://github.com/clj-commons/aleph/pull/710 for an improvement.

DerGuteMoritz avatar Feb 11 '24 15:02 DerGuteMoritz

OK all green now :sweat_drops: Please give it a try!

DerGuteMoritz avatar Feb 12 '24 15:02 DerGuteMoritz

I'm not sure it will fly long-term (the strategy around calling the GC - since it's just a hint. And manually running the finalizations).

@arnaudgeiser Now that you mention it: running finalizers should actually not be needed, I only threw runFinalization in there for good measure. Netty does use a handful of finalizers but the leak detector uses weak references and a reference queue instead. I think leaving it in there for now doesn't hurt for now but having to remove it shouldn't be a problem either - if it ever gets removed completely, Netty will have to adjust as well, anyway. And by then Netty 5 migh thave rolled around which AFAIK gets rid of reference counting anyway :shrug:

But it improves the current situation and it's just for testing purpose. Let's see what will be the future.

Right, improving the current testing situation is the main goal here :+1:

Thanks for the approach!

You're welcome and thanks for your input on the original ticket :pray:

https://openjdk.org/jeps/421

Ah, good to know, wasn't aware of this!

DerGuteMoritz avatar Feb 15 '24 15:02 DerGuteMoritz