rdf4j icon indicating copy to clipboard operation
rdf4j copied to clipboard

Timeout while evaluating FedX-Query

Open keiligch opened this issue 2 years ago • 12 comments

Current Behavior

When using a FedX-Repository: Specific query and graph (see repo linked in "Steps To Repoduce") and ((FedXTupleQuery) query).evaluate().stream().collect(Collectors.toList()); (or ((FedXTupleQuery) query).evaluate().hasNext();) throws org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout caused by java.util.concurrent.TimeoutException

Expected Behavior

providing a correct query result with a List<BindingSet>

Steps To Reproduce

  1. clone FedX-Query-Test-Repo

  2. run Main.main()

  3. get Exception at line 101

  4. (delete lines 8-48 in db.ttl, then the bug will not occur)

Version

4.3.6

Are you interested in contributing a solution yourself?

None

Anything else?

	at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
	at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
	at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
	at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
	at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
	at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
	at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
	at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
	Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
		... 13 more
		Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
			... 13 more
			Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
				... 13 more
				Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
					at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
					at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
					... 3 more
					Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
						at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
						at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
						at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
						at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
						at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
						at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
						at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
						at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
						at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
						at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
						... 3 more
						Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
							... 13 more
							Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
								at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
								at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
								... 3 more
								Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
									at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
									at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
									at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
									at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
									at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
									at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
									at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
									at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
									at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
									at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
									... 3 more
									Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
										... 13 more
										Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
											... 13 more
											Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
												... 13 more
												Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
													... 13 more
													Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
														... 13 more
														Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
															at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
															at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
															... 3 more
															Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
																at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
																at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
																at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
																at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
																at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
																at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
																at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
																at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
																at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
																at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
																... 3 more
															Caused by: java.util.concurrent.TimeoutException
																at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
																at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
																at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
																at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
																... 11 more
														Caused by: java.util.concurrent.TimeoutException
															at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
															at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
															at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
															at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
															... 3 more
													Caused by: java.util.concurrent.TimeoutException
														at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
														at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
														at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
														at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
														... 11 more
												Caused by: java.util.concurrent.TimeoutException
													at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
													at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
													at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
													at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
													... 11 more
											Caused by: java.util.concurrent.TimeoutException
												at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
												at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
												at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
												at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
												... 11 more
										Caused by: java.util.concurrent.TimeoutException
											at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
											at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
											at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
											at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
											... 11 more
									Caused by: java.util.concurrent.TimeoutException
										at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
										at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
										at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
										at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
										... 11 more
								Caused by: java.util.concurrent.TimeoutException
									at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
									at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
									at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
									at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
									... 11 more
							Caused by: java.util.concurrent.TimeoutException
								at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
								at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
								at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
								at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
								... 3 more
						Caused by: java.util.concurrent.TimeoutException
							at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
							at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
							at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
							at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
							... 11 more
					Caused by: java.util.concurrent.TimeoutException
						at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
						at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
						at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
						at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
						... 11 more
				Caused by: java.util.concurrent.TimeoutException
					at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
					at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
					at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
					at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
					... 3 more
			Caused by: java.util.concurrent.TimeoutException
				at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
				at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
				at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
				at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
				... 11 more
		Caused by: java.util.concurrent.TimeoutException
			at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
			at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
			at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
			at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
			... 11 more
	Caused by: java.util.concurrent.TimeoutException
		at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
		at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
		at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
		at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
		... 11 more
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
	at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
	at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
	at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
	... 11 more

Process finished with exit code 1

keiligch avatar Sep 28 '23 11:09 keiligch

Did you try adjusting the timeout?

https://rdf4j.org/documentation/programming/federation/

hmottestad avatar Sep 28 '23 13:09 hmottestad

Yes, but 120 seconds timeout seems more than sufficient for the small graph and relatively simple query to evaluate. The other queries were evaluated in fractions of a second.

keiligch avatar Sep 28 '23 13:09 keiligch

Does it still timeout when setting it to 0?

hmottestad avatar Sep 28 '23 20:09 hmottestad

Yes, it throws after about 20s the TimeoutException and the QueryInterruptedException

keiligch avatar Sep 29 '23 06:09 keiligch

Does ist also happen with NativeStore or LmdbStore?

kenwenzel avatar Sep 29 '23 07:09 kenwenzel

Yes, with both. No difference

keiligch avatar Sep 29 '23 07:09 keiligch

Try setting the timeout directly on the Query object. Like this query.setMaxExecutionTime(0);.

Can you also update your example code to show how you are setting the timeout on the FedX repo?

hmottestad avatar Sep 29 '23 12:09 hmottestad

I had the timeout setting as you wrote on the Query object and did not change it globally. Also the example code repo is updated

keiligch avatar Sep 29 '23 13:09 keiligch

were you able to reproduce with my example code?

keiligch avatar Sep 29 '23 13:09 keiligch

I haven't had a chance to test it out yet. Might be a little while before I have time.

hmottestad avatar Sep 29 '23 22:09 hmottestad

I've created a test to show that the issue only happens against remote endpoints. If EndpointClassification.Local is used then the queries are working as expected.

I also added some "debug" output to ControlledWorkerUnion but I'm not sure if this helps to fix the problem.

@aschwarte10 do you have some time to look into this issue? I think it is critical as it leads to deadlocks when used with remote endpoints.

kenwenzel avatar Oct 25 '23 12:10 kenwenzel

Thanks for reporting the issue. It looks like you have stumbled indeed into a deadlock scenario (which fortunately at least recovers through timeout exceptions)

Unfortunately I only have very limited time available to work on RDF4J.

However, I tried to do a bit of an assessment, and have some findings to nail it down (but do not yet have an understanding of the cause). I have also pushed a PR to reproduce it with just 40 triples, so ideally this allows to debug it easier.

Some insights to share:

The issue occurs in a join with an NUnion node, where the statements of the union themselves are relevant at multiple sources.

An example FedX query plan looks like this:

QueryRoot
   Projection
      ProjectionElemList
         ProjectionElem "person"
         ProjectionElem "label"
      NJoin
         ExclusiveStatement
            Var (name=person)
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_e1df31e0_uri, value=http://xmlns.com/foaf/0.1/Person, anonymous)
            StatementSource (id=repo1, type=REMOTE)
         NUnion
            StatementSourcePattern
               Var (name=person)
               Var (name=_const_9285ccfc_uri, value=http://www.w3.org/2000/01/rdf-schema#label, anonymous)
               Var (name=label)
               StatementSource (id=repo1, type=REMOTE)
               StatementSource (id=repo2, type=REMOTE)
            StatementSourcePattern
               Var (name=person)
               Var (name=_const_c9f3cb8c_uri, value=http://www.w3.org/2004/02/skos/core#prefLabel, anonymous)
               Var (name=label)
               StatementSource (id=repo1, type=REMOTE)
               StatementSource (id=repo2, type=REMOTE)

Note: due to the NUnion the join is executed in ControlledWorkerJoin. (i.e. no vectored / bind join evaluation).

It looks like the scheduler executing the unions is populated in some order that causes the deadlock, i.e. the results cannot be piped through. Might also be the main executor that blocks.

Note: the design idea is that everything scheduled in the UNION scheduler can executed and does not have dependencies, while the join scheduler may have dependencies and block.

If somebody finds the time to look further, please go ahead. I will try to look further once my time allows

aschwarte10 avatar Oct 31 '23 09:10 aschwarte10