micrometer icon indicating copy to clipboard operation
micrometer copied to clipboard

Race condition in registry searching and MeterNotFoundException creation may result in a failure that looks as if it should have succeeded

Open wilkinsona opened this issue 4 years ago • 8 comments

I think there's a race condition between meter registration and the evaluation of a search and creation of a MeterNotFoundException. If a thread registers a meter that meets the search's criteria after the search has been performed but before the MeterNotFoundException is created, the exception will report a failure that looks as if it should have succeeded:

io.micrometer.core.instrument.search.MeterNotFoundException: Unable to find a meter that matches all the requirements at once. Here's what was found: |  
-- | --
  | OK: A meter with name 'http.server.requests' was found. |  
  | OK: Meters with type timer were found.

wilkinsona avatar Oct 26 '20 10:10 wilkinsona

I am not sure if it's related but it's the closest symptom to what got. In my case, if you start a flow programatically and later on retrieve some metric, if the search request happens before the metric binding initialization, it always returns MetricNotFound

nightswimmings avatar Jul 06 '21 20:07 nightswimmings

@nightswimmings if you can provide a minimal sample project that we can run, we can take a look.

shakuzen avatar Jul 16 '21 07:07 shakuzen

@shakuzen @wilkinsona May I ask what became of this race condition issue? I seem to run into the same problem with the SimpleMeterRegistry in my unit tests. The output looks suspicious:

[error]  io.micrometer.core.instrument.search.MeterNotFoundException: Unable to find a meter that matches all the requirements at once. Here's what was found:
[error]     OK: A meter with name 'transactions' was found.
[error]     OK: A meter with type counter was found.
[error]     OK: A meter with name 'transactions' has a tag 'type' with value 'tagged'.
[error]     OK: A meter with name 'transactions' has a tag 'result' with value 'success'. (MeterNotFoundException.java:228)

This happens when I try to retrieve the metric from the registry. Sleeping for 50ms before retrieving the metric "fixes" the problem.

poohsen avatar Oct 28 '21 11:10 poohsen

The confusing output from MeterNotFoundException hasn't been fixed, but even if it were, you would still have the failure to retrieve the meter in your test; just with a MeterNotFoundException that agrees with the failure. If you are doing something that records the metric and asynchronously checking the metric, using something like Awaitility to check the metric gets recorded eventually is the correct approach. Or block on the thing that will record the metric finishing before trying to check.

shakuzen avatar Oct 29 '21 04:10 shakuzen

@nightswimmings any chance for a reproducer?

marcingrzejszczak avatar Dec 22 '23 11:12 marcingrzejszczak

Wow, somehow I overlooked this. But trying to figure out what microservice suffered the problem and how did I fixed, I believe I overcome it by disabling MetricsAutconfiguration. I can tell if I remove the exclusion in spring-boot 3.2, it builds fine, so I guess it is fixed (if I am right on the analysis, because we got like 20 microservices with similar arch)

nightswimmings avatar Dec 22 '23 23:12 nightswimmings

Ok, let's close this issue for now then. If it ever goes back we'll reopen it.

marcingrzejszczak avatar Jan 02 '24 09:01 marcingrzejszczak

He's a reproducer:

@RepeatedTest(50)
void test() throws InterruptedException {
	String meterName = "name";
	SimpleMeterRegistry meterRegistry = new SimpleMeterRegistry();
	AtomicReference<Exception> failure = new AtomicReference<>();
	Thread read = new Thread(() -> {
		try {
			meterRegistry.get(meterName).timer();
		}
		catch (Exception ex) {
			failure.set(ex);
		}
	});
	Thread write = new Thread(() -> {
		meterRegistry.timer("name");
	});
	write.start();
	read.start();
	write.join();
	read.join();
	if (failure.get() != null) {
		assertThat(failure.get().getMessage()).doesNotContain("OK");
	}
}

Roughly 20% of the 50 iterations fail for me. The failures look like this:

java.lang.AssertionError: 
Expecting actual:
  "Unable to find a meter that matches all the requirements at once. Here's what was found:
   OK: A meter with name 'name' was found.
   OK: A meter with type timer was found."
not to contain:
  "OK"

	at com.example.MicrometerThreadSafety.test(MicrometerThreadSafety.java:52)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
	at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
	at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:711)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
	at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

wilkinsona avatar Jan 02 '24 14:01 wilkinsona