spring-cloud-sleuth
Reactor fallback operators are not linked with the original request
Describe the bug
Consider a reactive operator in a pipeline that triggers a different code path based on a condition. This could be, for instance, a timeout or the handling of an exception.
https://github.com/snicoll/demo-tanzu-observability/blob/2c063d459529ba31acfbeafa831cb193b589e307/dashboard/src/main/java/io/spring/sample/dashboard/stats/StatsService.java#L45-L50
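For readers who do not want to open the link, the kind of pipeline being described can be sketched roughly as follows. This is not the code from the sample: the class name, method names and return values are hypothetical stand-ins, and only the 500ms timeout and the two operators (timeout with a fallback, and onErrorResume) come from this report. It assumes reactor-core is on the classpath.

```java
import java.time.Duration;
import reactor.core.publisher.Mono;

public class FallbackSketch {

    // Hypothetical stand-in for the "costly" lookup used as the fallback code path.
    static Mono<String> costlyLookup(String ip) {
        return Mono.just("costly:" + ip);
    }

    // Switches to the fallback when the primary call takes longer than 500ms
    // (timeout with a fallback publisher) or when it fails (onErrorResume).
    static Mono<String> lookup(Mono<String> primary, String ip) {
        return primary
                .timeout(Duration.ofMillis(500), costlyLookup(ip))
                .onErrorResume(ex -> costlyLookup(ip));
    }

    public static void main(String[] args) {
        String ip = "10.0.0.1";

        // Fast primary call: the fallback is never subscribed.
        Mono<String> fast = Mono.delay(Duration.ofMillis(10)).map(tick -> "free:" + ip);
        System.out.println(lookup(fast, ip).block());

        // Slow primary call: the timeout fires and the fallback runs instead.
        Mono<String> slow = Mono.delay(Duration.ofSeconds(2)).map(tick -> "free:" + ip);
        System.out.println(lookup(slow, ip).block());

        // Failing primary call: onErrorResume switches to the fallback.
        Mono<String> failing = Mono.error(new IllegalStateException("simulated failure"));
        System.out.println(lookup(failing, ip).block());
    }
}
```

In the second and third cases the value comes from the fallback publisher, which is the code path whose span this report expects to see attached to the original request.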
When the fallback code runs and succeeds, I expect two things:
- The span is correlated to the original request, so that the fallback code is displayed alongside the original request
- If the call succeeds, the span is not in error
Neither of these is true with 2020.0.3.
Sample
The sample is https://github.com/snicoll/demo-tanzu-observability
To reproduce the problem, start the two apps. Then enable some fake latency for the service using http :8081/actuator/latency ratio=0.8. Finally, visit http://localhost:8080.
If you look at the traces, you'll see a bunch of requests to reverse-lookup/free/{ip}. Some of them might succeed, some of them are in error. They are then followed, right after the timeout kicks in (500ms), by fallback operations on reverse-lookup/costly/{ip}. These are fallbacks, yet they look like isolated requests.
To reproduce the onErrorResume case, you can disable the latency with http :8081/actuator/latency ratio=0 and then invoke http://localhost:8080 at a high rate (like 10 times in a row).
I wonder whether this is linked to https://github.com/spring-cloud/spring-cloud-sleuth/issues/1121
That looks quite similar indeed. The difference here is that the fallback is another code path rather than a retry of the exact same operation.
Spring Cloud Sleuth is feature complete and out of OSS support.