reactor-core
                                
                                 reactor-core copied to clipboard
                                
                                    reactor-core copied to clipboard
                            
                            
                            
                        Uncontrolled Recursion causing StackOverflowError
StackOverflowError. Uncontrolled recursion in reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves()
Error Stack: [java.lang.Iterable.forEach():74 reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves():433 reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1():433 java.lang.Iterable.forEach():75 reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves():433 reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1():433 [...]
Expected Behavior
Stop the recursion after a certain depth and bubble the original error.
Actual Behavior
Stack overflow.
Thank you for the report, can you provide a reproducer to help better understand the circumstances when this happens?
It's not clear how we repro this yet. Saw this error in the logs.
Please reopen if you manage to create a reproducer.
Hi everyone, the same error has been observed today. Unluckly, we are not able to reproduce it.
Stacktrace:
2023-02-23 15:18:38.369 ERROR 1 --- [undedElastic-12] Schedulers () : Scheduler worker in group main failed with an uncaught exception
java.lang.StackOverflowError: null
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.lambda$findPathToLeaves$1(FluxOnAssembly.java:433)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at reactor.core.publisher.FluxOnAssembly$OnAssemblyException.findPathToLeaves(FluxOnAssembly.java:433)
Environment:
- Java version:
openjdk 17.0.6 2023-01-17
OpenJDK Runtime Environment GraalVM CE 22.3.1 (build 17.0.6+10-jvmci-22.3-b13)
OpenJDK 64-Bit Server VM GraalVM CE 22.3.1 (build 17.0.6+10-jvmci-22.3-b13)
- Reactor Core 3.4.26
- OS Linux 5.4.209-116.367.amzn2.x86_64 amd64
- Others:
reactor-netty-1.0.27.jar
spring-boot-2.7.8
spring-webflux-5.3.25.jar
We also see the same error in our framework that uses reactive core 3.4.34. Can we get more insight on this?
If you are able to identify the pipeline that causes this, please share. The stackoverflow might be just a side effect of an incorrectly crafted pipeline, e.g. assemblying a reactive chain using a loop and stacking operators instead of using operators that perform looping. Without a reproducer there's no point in investigating. The code could potentially avoid recursion but that would turn into using a queue-based looping approach, which can cause other issues such as heap pollution, so it won't necessarily resolve the original problem.
@chemicL A little more debugging found that in reactor core. The FluxOnAssembly#findPathToLeaves is a recursive function and is invoked by getMessage(). We see these exceptions thrown only on error i.e. whenever i believe getMessage() is invoked. We were not able to recreate it.
@chemicL I was able to reproduce the issue .
Seems that each time we attempt to print an error form mono.error , reactor core tries to build stacktrace string , by recursively walking up mono chain . in case mono chain is very long , it would end up with stackoverflow error.
the following code snipped can quickly reproduce the error
Mono<Object> mono = Mono.just(new Object());
long i=0;
while(i<10000) {
    mono = mono.flatMap(o -> Mono.just(new Object()));
    i++;
}
return mono.flatMap(o -> Mono.error(new Exception()));
(NOTE - exact iterations count might which trigger stackoverflow error depend on jvm stack size )
adding a configurable limit to the recursion level should provide a quick protection against such issue .
I ran the above and I get a java.lang.StackOverflowError (SOE) with repeated:
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
or
at reactor.core.publisher.MonoFlatMap$FlatMapMain.cancel(MonoFlatMap.java:199)
Depending on the loop size chosen (5000, 10_000).
It's not coming from FluxOnAssembly#findPathToLeaves in my case.
In order to get that I used -Dreactor.trace.operatorStacktrace=true and reworked the example:
	public static void main(String[] args) {
		Mono<Object> mono = Mono.error(new RuntimeException("oops"));
		long i = 0;
		int loops = 2850;
		while (i < loops) {
			mono = mono.flatMap(o -> Mono.just(new Object()));
			i++;
		}
		mono.flatMap(o -> Mono.error(new Exception()))
			.doOnError(e -> System.out.println(e.getMessage()))
			.subscribe();
	}
For loops = 2850 sometimes there's no SOE and sometimes it triggers so it seems to be around the threshold.
As I said before, it looks like the FluxOnAssembly#findPathToLeaves SOE is just manifesting itself as a result of something else that is broken and it's a pipeline that is too long. Assembling a pipeline using a loop like the above might not be the best choice. Perhaps it is justified in certain scenarios, but the one presented is not a justification for making adjustments in my view.
adding a configurable limit to the recursion level should provide a quick protection against such issue
Where would this be configurable? I can't imagine anyone actually using that configuration - if they had to, they'd already know what they are dealing with. java.lang.StackOverflowError fires exactly at the right limit -> so should we aim at just one method evaluation before it gets triggered? I think it's unfeasible.
Assuming there is a valid, reasonable reactive pipeline which causes this to be triggered, what would be the expected behaviour?
I reworked the recurrent findPathToLeaves method to use a loop:
	void findPathToLeaves(ObservedAtInformationNode node, List<List<ObservedAtInformationNode>> rootPaths) {
			Queue<ObservedAtInformationNode> items = new LinkedList<>();
			items.add(node);
			do {
				if (node.children.isEmpty()) {
					List<ObservedAtInformationNode> pathForLeaf = new LinkedList<>();
					ObservedAtInformationNode traversed = node;
					while (traversed != null && traversed != root) {
						pathForLeaf.add(0, traversed);
						traversed = traversed.parent;
					}
					rootPaths.add(pathForLeaf);
				}
				items.addAll(node.children);
			} while ((node = items.poll()) != null);
		}
Now the threshold for SOE is around 4190 and comes from the regular reactive operations (request).
Even though it looks like a solution I am not convinced that it makes sense to change the implementation. For this case it resolves the problem but might introduces a cost of allocating a queue backed by heap memory instead of using the stack which could outperform potential optimizations that we could introduce. Although, it might be completely the opposite :)
As this is of low priority, I'd consider introducing this change if a JMH benchmark was provided. Is anyone willing to validate whether this implementation is acceptable for regular pipelines?
@chemicL Just for clarification, what exactly are you referring to long pipeline? So something like this would qualify as a long pipeline? E.g.
Flux.just(someString)
       .map(s->doSomething1())
      .map(s->doSomething2())
      .map(s->doSomething3())
      .map(s->doSomething4())
      .map(s->doSomething5())
  and doSomething1() is a `Mono.just("string").map(s->callSomething()).map(s->callSomething())`
@chemicL thanks for looking into it .
the current state is in case mono chain is too long , the system behaves unexpectedly , as I assume that stackoverflow error would not be a desired behaviour in such case.
As I said before, it looks like the FluxOnAssembly#findPathToLeaves SOE is just manifesting itself as a result of something else that is broken and it's a pipeline that is too long. Assembling a pipeline using a loop like the above might not be the best choice. Perhaps it is justified in certain scenarios, but the one presented is not a justification for making adjustments in my view.
I assume that in real world scenario we can reach such cases due to regular long call graph , common to enterprise apps :
libA -> calls lib B -> calls lib C -> and so on .. 
e.g. in java spring itis  not uncommon to have pretty long call graphs.
Where would this be configurable? I can't imagine anyone actually using that configuration - if they had to, they'd already know what they are dealing with. java.lang.StackOverflowError fires exactly at the right limit -> so should we aim at just one method evaluation before it gets triggered? I think it's unfeasible.
I am guessing some reasonable default value with an optional escape hatch for a user to change it via optional config if someone really knows what he's doing .
most users probably have no need for a 10K lines long stack trace.
Even though it looks like a solution I am not convinced that it makes sense to change the implementation. For this case it resolves the problem but might introduces a cost of allocating a queue backed by heap memory instead of using the stack which could outperform potential optimizations that we could introduce. Although, it might be completely the opposite :)
what would be expected behaviour in such cases. I assume that currently there are no limits enforced on depth of mono chains , so in theory during stack trace creation , we could deal with any length of data .
there should probably be some kind of graceful degradation instead of current unhandled exception . e.g. retuning truncated data in case of getMesssage .
again , thank you for looking into it !
@rohitdumbre86 I assume you mean .flatMap(s -> doSomething1()). That doesn't like like a "long pipeline" - it's 6 operators in the main chain and 3 in the internal one of flatMap. Assuming all the map operators were to be flatMap we have 6*3 = 18 operators, which is reasonable and won't call this issue to surface.
@alex-suchman and @rohitdumbre86
I mean artificially attaching operators in a loop-like construct can be an issue if the number of operators in a chain grows beyond 1000 (depending on the available resources of course it will be a different number, so it's just an approximation).
I'm closing the issue as I don't feel it's justified to make any adjustments. Were you to provide a non-contrived reproducer where the logic is a real-world scenario that works perfectly fine on a particular machine but FluxOnAssembly#findPathToLeaves causes OOM, I think we could consider a contribution to address this issue. However, it is not a priority for the team right now and we won't address it at this time. The contribution should also incorporate a JMH benchmark with results and memory characteristics highlighting the chosen approach is not worse performance-wise from the existing solution.