micrometer
micrometer copied to clipboard
Race condition in registry searching and MeterNotFoundException creation may result in a failure that looks as if it should have succeeded
I think there's a race condition between meter registration and the evaluation of a search and creation of a MeterNotFoundException
. If a thread registers a meter that meets the search's criteria after the search has been performed but before the MeterNotFoundException
is created, the exception will report a failure that looks as if it should have succeeded:
io.micrometer.core.instrument.search.MeterNotFoundException: Unable to find a meter that matches all the requirements at once. Here's what was found: |
-- | --
| OK: A meter with name 'http.server.requests' was found. |
| OK: Meters with type timer were found.
I am not sure if it's related but it's the closest symptom to what got. In my case, if you start a flow programatically and later on retrieve some metric, if the search request happens before the metric binding initialization, it always returns MetricNotFound
@nightswimmings if you can provide a minimal sample project that we can run, we can take a look.
@shakuzen @wilkinsona May I ask what became of this race condition issue? I seem to run into the same problem with the SimpleMeterRegistry in my unit tests. The output looks suspicious:
[error] io.micrometer.core.instrument.search.MeterNotFoundException: Unable to find a meter that matches all the requirements at once. Here's what was found:
[error] OK: A meter with name 'transactions' was found.
[error] OK: A meter with type counter was found.
[error] OK: A meter with name 'transactions' has a tag 'type' with value 'tagged'.
[error] OK: A meter with name 'transactions' has a tag 'result' with value 'success'. (MeterNotFoundException.java:228)
This happens when I try to retrieve the metric from the registry. Sleeping for 50ms before retrieving the metric "fixes" the problem.
The confusing output from MeterNotFoundException hasn't been fixed, but even if it were, you would still have the failure to retrieve the meter in your test; just with a MeterNotFoundException that agrees with the failure. If you are doing something that records the metric and asynchronously checking the metric, using something like Awaitility to check the metric gets recorded eventually is the correct approach. Or block on the thing that will record the metric finishing before trying to check.
@nightswimmings any chance for a reproducer?
Wow, somehow I overlooked this. But trying to figure out what microservice suffered the problem and how did I fixed, I believe I overcome it by disabling MetricsAutconfiguration. I can tell if I remove the exclusion in spring-boot 3.2, it builds fine, so I guess it is fixed (if I am right on the analysis, because we got like 20 microservices with similar arch)
Ok, let's close this issue for now then. If it ever goes back we'll reopen it.
He's a reproducer:
@RepeatedTest(50)
void test() throws InterruptedException {
String meterName = "name";
SimpleMeterRegistry meterRegistry = new SimpleMeterRegistry();
AtomicReference<Exception> failure = new AtomicReference<>();
Thread read = new Thread(() -> {
try {
meterRegistry.get(meterName).timer();
}
catch (Exception ex) {
failure.set(ex);
}
});
Thread write = new Thread(() -> {
meterRegistry.timer("name");
});
write.start();
read.start();
write.join();
read.join();
if (failure.get() != null) {
assertThat(failure.get().getMessage()).doesNotContain("OK");
}
}
Roughly 20% of the 50 iterations fail for me. The failures look like this:
java.lang.AssertionError:
Expecting actual:
"Unable to find a meter that matches all the requirements at once. Here's what was found:
OK: A meter with name 'name' was found.
OK: A meter with type timer was found."
not to contain:
"OK"
at com.example.MicrometerThreadSafety.test(MicrometerThreadSafety.java:52)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:711)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)