accumulo icon indicating copy to clipboard operation
accumulo copied to clipboard

Scanner.setTimeout() does not work

Open dlmarion opened this issue 3 years ago • 4 comments

Describe the bug TimeoutIT has method testBatchScannerTimeout for testing the timeout on a BatchScanner. I created a similar test, but using a Scanner instead. The test fails, the Scanner does not time out.

To Reproduce

  public void testScannerTimeout(AccumuloClient client, String tableName) throws Exception {
    client.tableOperations().create(tableName);

    try (BatchWriter bw = client.createBatchWriter(tableName)) {
      Mutation m = new Mutation("r1");
      m.put("cf1", "cq1", "v1");
      m.put("cf1", "cq2", "v2");
      m.put("cf1", "cq3", "v3");
      m.put("cf1", "cq4", "v4");
      bw.addMutation(m);
    }

    try (Scanner s = client.createScanner(tableName)) {
      s.setRange(new Range());

      // should not timeout
      for (Entry<Key,Value> entry : s) {
        entry.getKey();
      }

      s.setTimeout(5, TimeUnit.SECONDS);
      IteratorSetting iterSetting = new IteratorSetting(100, SlowIterator.class);
      iterSetting.addOption("sleepTime", 2000 + "");
      s.addScanIterator(iterSetting);

      assertThrows(TimedOutException.class, () -> s.iterator().next(),
          "scanner did not time out");
    }
  }

Expected behavior Scanner should time out like the BatchScanner does

dlmarion avatar Apr 01 '22 14:04 dlmarion

I've looked into this a bit and from what I can tell the timeout set here only gets applied to the thrift scan. I've created a branch and added some logs and it looks like the timeout will trigger an exception if the thrift scan duration is longer than the timeout. Below are some logs from my branch that (might) help illustrate this:

2022-04-08T11:57:13,978 [functional.TimeoutIT] DEBUG: NOW ADDING 2 SECOND TIMEOUT TO SCANNER ---------------
2022-04-08T11:57:13,978 [functional.TimeoutIT] DEBUG: Scanner timeout: 2s
2022-04-08T11:57:13,979 [clientImpl.ScannerImpl] DEBUG: ScannerImpl timeout: 2s
2022-04-08T11:57:13,979 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:13,979 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:14,986 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:14,987 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.001s Timeout: 2s
2022-04-08T11:57:15,990 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:15,991 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:16,995 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:16,995 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:17,998 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:17,999 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:19,002 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:19,003 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:20,006 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:20,007 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.001s Timeout: 2s
2022-04-08T11:57:21,010 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s
2022-04-08T11:57:21,011 [clientImpl.ThriftScanner] INFO : Duration of scan: 0.0s Timeout: 2s
2022-04-08T11:57:21,989 [clientImpl.ScannerIterator] DEBUG: Starting thrift scan with timeout: 2s

I'm not sure if this is the expected behavior and the slow iterator attached to the scanner just does not slow this scan down, or if this timeout should also be applied elsewhere and be triggered on some other condition as well.

Here is a link to the diff of a branch I added these logs to. I refactored TimeoutIT so that the tests can be run independently (which may be a good standalone improvement).

DomGarguilo avatar Apr 08 '22 16:04 DomGarguilo

So you are thinking that the timeout does work and the SlowIterator is not working in this scenario?

dlmarion avatar Apr 08 '22 16:04 dlmarion

So you are thinking that the timeout does work and the SlowIterator is not working in this scenario?

It looks like the duration between individual scans is slowed down due to the SlowIterator but the speed of the scans themselves is not changed. It looks like the timeout is only applied to the duration of an individual scan in this case.

        for (Entry<Key,Value> entry : s) {
          // Each iteration of this loop is slowed down but the speed/duration of the scan executed is not
          var ignored = entry.getKey();
        }

So in this case, the timeout is only applied to the part inside the loop which is not slowed down at all. It seems like BatchScanner differs in this case because it looks like the timeout is applied to the whole operation including the slowed iteration.

DomGarguilo avatar Apr 08 '22 16:04 DomGarguilo

Once this ticket is a addressed the @Disabled tag should be removed from the following test: https://github.com/apache/accumulo/blob/c0e94cc33f39587df415ca189e90dc20be468dca/test/src/main/java/org/apache/accumulo/test/ScanServerIT.java#L175-L178

DomGarguilo avatar Aug 09 '22 19:08 DomGarguilo

@dlmarion This is marked as a blocker for 2.1. Is that still accurate? From the discussion, it looks like it does work, but not as expected.

ctubbsii avatar Oct 20 '22 02:10 ctubbsii

No, punt to next release

dlmarion avatar Oct 20 '22 02:10 dlmarion

The javadoc in ScannerBase says that the timeout is a retry timeout.

 This setting determines how long a scanner will automatically retry when a failure occurs. By default, a scanner will retry forever.

In testing, with the SlowIterator, with a Scanner, there is no failure that triggers the retry timeout, so there Timeout exception is not thrown. The calling in code ScanIterator will just keep looping, trying to get a row. Once the sleep expires in SlowIterator, the row is returned and the loop restarts.

With a scanner, setting a batch timeout will control how long it will take to gather results - but it does not time out - presumably because the iterator is "still making progress" wrt the scan.

EdColeman avatar Jan 24 '23 23:01 EdColeman

The timeout is a timeout for thrift calls, not a scan timeout. With a scanner, setting a batch timeout will control how long it will take to gather results - but it does not time out - presumably because the iterator is "still making progress" wrt the scan. Not sure that the timeout can be tested directly - would need to mock or somehow slow down thrift calls to trigger the retry.

EdColeman avatar Mar 08 '23 17:03 EdColeman