Scanner.setTimeout() does not work
Describe the bug
TimeoutIT has method testBatchScannerTimeout for testing the timeout on a BatchScanner. I created a similar test, but using a Scanner instead. The test fails, the Scanner does not time out.
To Reproduce
public void testScannerTimeout(AccumuloClient client, String tableName) throws Exception {
client.tableOperations().create(tableName);
try (BatchWriter bw = client.createBatchWriter(tableName)) {
Mutation m = new Mutation("r1");
m.put("cf1", "cq1", "v1");
m.put("cf1", "cq2", "v2");
m.put("cf1", "cq3", "v3");
m.put("cf1", "cq4", "v4");
bw.addMutation(m);
}
try (Scanner s = client.createScanner(tableName)) {
s.setRange(new Range());
// should not timeout
for (Entry<Key,Value> entry : s) {
entry.getKey();
}
s.setTimeout(5, TimeUnit.SECONDS);
IteratorSetting iterSetting = new IteratorSetting(100, SlowIterator.class);
iterSetting.addOption("sleepTime", 2000 + "");
s.addScanIterator(iterSetting);
assertThrows(TimedOutException.class, () -> s.iterator().next(),
"scanner did not time out");
}
}
Expected behavior Scanner should time out like the BatchScanner does
I've looked into this a bit and from what I can tell the timeout set here only gets applied to the thrift scan. I've created a branch and added some logs and it looks like the timeout will trigger an exception if the thrift scan duration is longer than the timeout. Below are some logs from my branch that (might) help illustrate this:
2022-04-08T11:57:13,978 [functional.TimeoutIT] DEBUG: NOW ADDING 2 SECOND TIMEOUT TO SCANNER ---------------
2022-04-08T11:57:13,978 [functional.TimeoutIT] DEBUG: Scanner timeout: 2s
2022-04-08T11:57:13,979 [clientImpl.ScannerImpl] DEBUG: ScannerImpl timeout: 2s
2022-04-08T11:57:13,979 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:13,979 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:14,986 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:14,987 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.001s Timeout: 2s
2022-04-08T11:57:15,990 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:15,991 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:16,995 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:16,995 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:17,998 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:17,999 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:19,002 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:19,003 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:20,006 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:20,007 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.001s Timeout: 2s
2022-04-08T11:57:21,010 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:21,011 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:21,989 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
I'm not sure if this is the expected behavior and the slow iterator attached to the scanner just does not slow this scan down, or if this timeout should also be applied elsewhere and be triggered on some other condition as well.
Here is a link to the diff of a branch I added these logs to. I refactored TimeoutIT so that the tests can be run independently (which may be a good standalone improvement).
So you are thinking that the timeout does work and the SlowIterator is not working in this scenario?
So you are thinking that the timeout does work and the SlowIterator is not working in this scenario?
It looks like the duration between individual scans is slowed down due to the SlowIterator but the speed of the scans themselves is not changed. It looks like the timeout is only applied to the duration of an individual scan in this case.
for (Entry<Key,Value> entry : s) {
// Each iteration of this loop is slowed down but the speed/duration of the scan executed is not
var ignored = entry.getKey();
}
So in this case, the timeout is only applied to the part inside the loop which is not slowed down at all. It seems like BatchScanner differs in this case because it looks like the timeout is applied to the whole operation including the slowed iteration.
Once this ticket is a addressed the @Disabled tag should be removed from the following test: https://github.com/apache/accumulo/blob/c0e94cc33f39587df415ca189e90dc20be468dca/test/src/main/java/org/apache/accumulo/test/ScanServerIT.java#L175-L178
@dlmarion This is marked as a blocker for 2.1. Is that still accurate? From the discussion, it looks like it does work, but not as expected.
No, punt to next release
The javadoc in ScannerBase says that the timeout is a retry timeout.
This setting determines how long a scanner will automatically retry when a failure occurs. By default, a scanner will retry forever.
In testing, with the SlowIterator, with a Scanner, there is no failure that triggers the retry timeout, so there Timeout exception is not thrown. The calling in code ScanIterator will just keep looping, trying to get a row. Once the sleep expires in SlowIterator, the row is returned and the loop restarts.
With a scanner, setting a batch timeout will control how long it will take to gather results - but it does not time out - presumably because the iterator is "still making progress" wrt the scan.
The timeout is a timeout for thrift calls, not a scan timeout. With a scanner, setting a batch timeout will control how long it will take to gather results - but it does not time out - presumably because the iterator is "still making progress" wrt the scan. Not sure that the timeout can be tested directly - would need to mock or somehow slow down thrift calls to trigger the retry.