
Exception in getCacheStatus

Open · wishnick opened this issue 6 years ago · 1 comment

Setup: Rubix version 0.3.1, Presto 0.172, EMR AMI 4.9.3

I've installed Rubix into Presto by building a custom Presto and overriding the configuration in Presto's HdfsConfigurationUpdater to point at CachingPrestoS3FileSystem, so it looks like the following:

        config.set("fs.s3.impl", CachingPrestoS3FileSystem.class.getName());
        config.set("fs.s3a.impl", CachingPrestoS3FileSystem.class.getName());
        config.set("fs.s3n.impl", CachingPrestoS3FileSystem.class.getName());

When I run it like this, I get the following errors in the Presto logs:

2018-04-17T17:49:01.109Z        INFO    20180417_174858_00063_fngsp.1.0-0-54    com.qubole.rubix.spi.RetryingBookkeeperClient   Error while connecting :
org.apache.thrift.shaded.TApplicationException: getCacheStatus failed: unknown result
        at com.qubole.rubix.spi.BookKeeperService$Client.recv_getCacheStatus(BookKeeperService.java:109)
        at com.qubole.rubix.spi.BookKeeperService$Client.getCacheStatus(BookKeeperService.java:87)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.access$001(RetryingBookkeeperClient.java:30)
        at com.qubole.rubix.spi.RetryingBookkeeperClient$1.call(RetryingBookkeeperClient.java:53)
        at com.qubole.rubix.spi.RetryingBookkeeperClient$1.call(RetryingBookkeeperClient.java:48)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.retryConnection(RetryingBookkeeperClient.java:84)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.getCacheStatus(RetryingBookkeeperClient.java:47)
        at com.qubole.rubix.core.CachingInputStream.setupReadRequestChains(CachingInputStream.java:305)
        at com.qubole.rubix.core.CachingInputStream.readInternal(CachingInputStream.java:231)
        at com.qubole.rubix.core.CachingInputStream.read(CachingInputStream.java:185)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
        at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:208)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
        at com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:203)
        at com.facebook.presto.hive.HiveRecordCursor.advanceNextPosition(HiveRecordCursor.java:179)
        at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:99)
        at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:256)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:308)
        at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:239)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:542)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:234)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:458)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)



2018-04-17T17:49:01.110Z        INFO    20180417_174858_00062_fngsp.1.0-0-60    com.qubole.rubix.core.CachingInputStream        Could not get cache status from server org.apache.thrift.shaded.TException
        at com.qubole.rubix.spi.RetryingBookkeeperClient.retryConnection(RetryingBookkeeperClient.java:95)
        at com.qubole.rubix.spi.RetryingBookkeeperClient.getCacheStatus(RetryingBookkeeperClient.java:47)
        at com.qubole.rubix.core.CachingInputStream.setupReadRequestChains(CachingInputStream.java:305)
        at com.qubole.rubix.core.CachingInputStream.readInternal(CachingInputStream.java:231)
        at com.qubole.rubix.core.CachingInputStream.read(CachingInputStream.java:185)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
        at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
        at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:208)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
        at com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:203)
        at com.facebook.presto.hive.HiveRecordCursor.advanceNextPosition(HiveRecordCursor.java:179)
        at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:99)
        at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:256)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:308)
        at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:239)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:542)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:234)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:458)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I am able to make this go away by changing the following lines in PrestoClusterManager:

This becomes:

return ImmutableList.of(InetAddress.getLocalHost().getHostAddress());

And I added the following after this:

if (!hosts.contains(InetAddress.getLocalHost().getHostAddress())) {
  hosts.add(InetAddress.getLocalHost().getHostAddress());
}
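
Taken together, the two edits amount to something like the sketch below. This is a hedged reconstruction rather than the actual PrestoClusterManager source: getNodes() and fetchWorkerHostsFromCoordinator() are placeholder names standing in for Rubix's real node-listing logic, which is not shown in this issue.

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.util.ArrayList;
    import java.util.List;

    import com.google.common.collect.ImmutableList;

    // Hypothetical stand-in for the node-listing part of PrestoClusterManager.
    public abstract class PatchedClusterManagerSketch
    {
        public List<String> getNodes()
        {
            String localHost;
            try {
                localHost = InetAddress.getLocalHost().getHostAddress();
            }
            catch (UnknownHostException e) {
                throw new RuntimeException("Could not resolve local host address", e);
            }

            List<String> hosts = new ArrayList<>(fetchWorkerHostsFromCoordinator());

            // Change 1: if the coordinator reports no workers, fall back to this host.
            if (hosts.isEmpty()) {
                return ImmutableList.of(localHost);
            }

            // Change 2: make sure the local node itself is always in the list.
            if (!hosts.contains(localHost)) {
                hosts.add(localHost);
            }
            return hosts;
        }

        // Placeholder for the existing lookup against the Presto coordinator.
        protected abstract List<String> fetchWorkerHostsFromCoordinator();
    }

The net effect is that the node list is never empty and always contains the local worker, which is presumably why the getCacheStatus failures stop.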

wishnick · Apr 17 '18 20:04

@wishnick Is it possible to share the logs of the BookKeeper daemon? Those will include the actual cause of the exception. The Presto client log will just say that the exception is happening in getCacheStatus.

abhishekdas99 · Apr 24 '18 20:04