HDFS integration test failure when using Accumulo 2.0.1 with Kerberos
Describe the bug
When running the AddElementsFromHdfsLoaderIT integration tests (which are run as part of the Accumulo Store) against an Accumulo 2.0.1 cluster configured to use Kerberos, these tests fail with an Accumulo error relating to being unable to "rename files across volumes". The full message is shown below and originates in this line of Accumulo code.
It isn't clear whether the error is correct, i.e. whether the rename really is across volumes. The error only occurs with Accumulo 2.0.1 and Kerberos; it doesn't occur with Accumulo 1.9.3, nor with 2.0.1 without Kerberos. The rename is the same in every case (see Additional context), which may indicate a problem with Accumulo itself: the rename should be allowed either in all situations or in none, and whether Kerberos is in use shouldn't have any effect.
To debug this issue, investigate the FileSystem objects here (set with this method, which uses code calling this method).
Ideally, any Gaffer code changes should be made after #2743, which is about improving the documentation for these tests.
Expected behaviour
These tests should work for Accumulo 2+ the same as they do for Accumulo 1.
Stack trace and errors
Log extract from integration tests container (irrelevant info removed):
2023-12-12 11:47:58 integration.loader.AddElementsFromHdfsLoaderIT$AddElementsFromHdfsLoader INFO - using root dir: hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843
...
2023-12-12 11:47:59 accumulostore.utils.TableUtils INFO - Creating table integrationTestGraph as user gaffer/[email protected]
...
2023-12-12 11:48:01 hdfs.handler.AddElementsFromHdfsHandler INFO - Checking that the correct HDFS directories exist
2023-12-12 11:48:01 hdfs.handler.AddElementsFromHdfsHandler INFO - Ensuring output directory hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir doesn't exist
...
2023-12-12 11:48:01 job.tool.AddElementsFromHdfsTool INFO - Adding elements from HDFS
...
2023-12-12 11:48:01 job.factory.AccumuloAddElementsFromHdfsJobFactory INFO - Creating splits file in location hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/splitsDir/splits from table integrationTestGraph
...
2023-12-12 11:48:03 hadoop.mapred.MapTask INFO - Processing split: hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/inputDir3/file.txt:0+21422
...
2023-12-12 11:48:06 job.tool.ImportElementsToAccumuloTool INFO - Ensuring failure directory hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/failureDir exists
2023-12-12 11:48:06 job.tool.ImportElementsToAccumuloTool INFO - Failure directory doesn't exist so creating: hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/failureDir
2023-12-12 11:48:06 accumulostore.utils.IngestUtils INFO - Setting permission rwxrwxrwx on directory hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/failureDir and all files within
2023-12-12 11:48:06 job.tool.ImportElementsToAccumuloTool INFO - Removing file hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir/_SUCCESS
2023-12-12 11:48:06 accumulostore.utils.IngestUtils INFO - Setting permission rwxrwxrwx on directory hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir and all files within
2023-12-12 11:48:06 job.tool.ImportElementsToAccumuloTool INFO - Importing files in hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir to table integrationTestGraph
...
2023-12-12 11:48:06 hdfs.handler.AddElementsFromHdfsHandler ERROR - Failed to import elements into Accumulo: Internal error processing waitForFateOperation
Stack trace for the above:
uk.gov.gchq.gaffer.operation.OperationException: Failed to import elements into Accumulo
at uk.gov.gchq.gaffer.accumulostore.operation.hdfs.handler.AddElementsFromHdfsHandler.importElements(AddElementsFromHdfsHandler.java:234)
...
Caused by: org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForFateOperation
at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:388)
at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:342)
at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doTableFateOperation(TableOperationsImpl.java:1599)
at org.apache.accumulo.core.clientImpl.TableOperationsImpl.importDirectory(TableOperationsImpl.java:1207)
at uk.gov.gchq.gaffer.accumulostore.operation.hdfs.handler.job.tool.ImportElementsToAccumuloTool.run(ImportElementsToAccumuloTool.java:78)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:95)
at uk.gov.gchq.gaffer.accumulostore.operation.hdfs.handler.AddElementsFromHdfsHandler.importElements(AddElementsFromHdfsHandler.java:230)
... 119 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing waitForFateOperation
Log extract from accumulo-master container:
2023-12-12 11:48:06,899 [thrift.ProcessFunction] ERROR: Internal error processing waitForFateOperation
java.lang.UnsupportedOperationException: Cannot rename files across volumes: hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir/part-r-00000.rf -> hdfs://hdfs-namenode.gaffer:9000/accumulo/tables/2/b-000003z/I0000040.rf
at org.apache.accumulo.server.fs.VolumeManagerImpl.rename(VolumeManagerImpl.java:319)
at org.apache.accumulo.master.tableOps.bulkVer1.BulkImport.lambda$prepareBulkImport$0(BulkImport.java:251)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:750)
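As a quick sanity check on the two paths in the error above, the sketch below compares only their URI scheme and authority. It is a minimal illustration using the JDK alone; the class and method names are hypothetical and are not part of Gaffer or Accumulo. It does not reproduce Accumulo's own check, which works with Hadoop FileSystem objects rather than URI strings, so it can only show that the two paths name the same HDFS namenode.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Hypothetical helper for illustration only; not Gaffer or Accumulo code.
class VolumeUriCheck {

    // True when both URIs share the same scheme and authority (host:port),
    // i.e. they address the same file system as far as the URI text goes.
    static boolean sameFileSystemUri(String first, String second) {
        URI a = URI.create(first);
        URI b = URI.create(second);
        return Objects.equals(lower(a.getScheme()), lower(b.getScheme()))
                && Objects.equals(lower(a.getAuthority()), lower(b.getAuthority()));
    }

    // Scheme and host names are case-insensitive, so normalise before comparing.
    static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Source and destination copied from the UnsupportedOperationException message.
        String src = "hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir/part-r-00000.rf";
        String dst = "hdfs://hdfs-namenode.gaffer:9000/accumulo/tables/2/b-000003z/I0000040.rf";
        System.out.println(sameFileSystemUri(src, dst)); // prints "true"
    }
}
```

Since the URIs match, any difference Accumulo sees would have to come from how the FileSystem objects themselves are obtained or compared, which is why those objects are the suggested place to look.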
Platform
- Gaffer Version: 2.0.0
Additional context
Looking at the namenode logs, it's possible to see the relevant files being created and renamed by the HDFS integration tests. This also shows how Accumulo makes a new directory, looks at the directory of the file to import, but then (having issued the error above) removes the new directory it created:
2023-12-12 11:48:04 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=create src=/tmp/junit8641700344797179843/outputDir/_temporary/0/_temporary/attempt_local1986856988_0001_r_000000_0/part-r-00000.rf dst=null perm=gaffer:supergroup:rw-r--r-- proto=rpc
...
2023-12-12 11:48:05 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir/_temporary/0/_temporary/attempt_local1986856988_0001_r_000000_0 dst=null perm=null proto=rpc
2023-12-12 11:48:05 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir dst=null perm=null proto=rpc
2023-12-12 11:48:05 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=listStatus src=/tmp/junit8641700344797179843/outputDir/_temporary/0/_temporary/attempt_local1986856988_0001_r_000000_0 dst=null perm=null proto=rpc
2023-12-12 11:48:05 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir/part-r-00000.rf dst=null perm=null proto=rpc
2023-12-12 11:48:05 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=rename src=/tmp/junit8641700344797179843/outputDir/_temporary/0/_temporary/attempt_local1986856988_0001_r_000000_0/part-r-00000.rf dst=/tmp/junit8641700344797179843/outputDir/part-r-00000.rf perm=gaffer:supergroup:rw-r--r-- proto=rpc
2023-12-12 11:48:05 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=delete src=/tmp/junit8641700344797179843/outputDir/_temporary dst=null perm=null proto=rpc
...
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir dst=null perm=null proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=setPermission src=/tmp/junit8641700344797179843/outputDir dst=null perm=gaffer:supergroup:rwxrwxrwx proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=listStatus src=/tmp/junit8641700344797179843/outputDir dst=null perm=null proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=setPermission src=/tmp/junit8641700344797179843/outputDir/part-r-00000.rf dst=null perm=gaffer:supergroup:rwxrwxrwx proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir dst=null perm=null proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=getfileinfo src=/tmp/junit8641700344797179843/failureDir dst=null perm=null proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.26.0.2 cmd=listStatus src=/tmp/junit8641700344797179843/failureDir dst=null perm=null proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.26.0.7 cmd=getfileinfo src=/tmp/junit8641700344797179843/failureDir dst=null perm=null proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.26.0.7 cmd=listStatus src=/tmp/junit8641700344797179843/failureDir dst=null perm=null proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.26.0.7 cmd=mkdirs src=/accumulo/tables/2 dst=null perm=accumulo:supergroup:rwxr-xr-x proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.26.0.7 cmd=getfileinfo src=/accumulo/tables/2/b-000003z dst=null perm=null proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.26.0.7 cmd=mkdirs src=/accumulo/tables/2/b-000003z dst=null perm=accumulo:supergroup:rwxr-xr-x proto=rpc
2023-12-12 11:48:06 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.26.0.7 cmd=listStatus src=/tmp/junit8641700344797179843/outputDir dst=null perm=null proto=rpc
2023-12-12 11:48:07 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.26.0.7 cmd=delete src=/accumulo/tables/2 dst=null perm=null proto=rpc
These are the expected operations (taken from an integration test run using Accumulo 1.9.3), showing the successful renaming of the file by Accumulo:
2023-12-12 13:30:46 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.20.0.3 cmd=setPermission src=/tmp/junit8460692230461269805/outputDir/part-r-00000.rf dst=null perm=gaffer:supergroup:rwxrwxrwx proto=rpc
2023-12-12 13:30:46 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.20.0.3 cmd=getfileinfo src=/tmp/junit8460692230461269805/outputDir dst=null perm=null proto=rpc
2023-12-12 13:30:46 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.20.0.3 cmd=getfileinfo src=/tmp/junit8460692230461269805/failureDir dst=null perm=null proto=rpc
2023-12-12 13:30:46 INFO audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS) ip=/172.20.0.3 cmd=listStatus src=/tmp/junit8460692230461269805/failureDir dst=null perm=null proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.7 cmd=getfileinfo src=/tmp/junit8460692230461269805/failureDir dst=null perm=null proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.7 cmd=listStatus src=/tmp/junit8460692230461269805/failureDir dst=null perm=null proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.7 cmd=mkdirs src=/accumulo/tables/2 dst=null perm=accumulo:supergroup:rwxr-xr-x proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.7 cmd=getfileinfo src=/accumulo/tables/2/b-000005c dst=null perm=null proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.7 cmd=mkdirs src=/accumulo/tables/2/b-000005c dst=null perm=accumulo:supergroup:rwxr-xr-x proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.7 cmd=listStatus src=/tmp/junit8460692230461269805/outputDir dst=null perm=null proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.7 cmd=rename src=/tmp/junit8460692230461269805/outputDir/part-r-00000.rf dst=/accumulo/tables/2/b-000005c/I000005d.rf perm=gaffer:supergroup:rwxrwxrwx proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.7 cmd=listStatus src=/accumulo/tables/2/b-000005c dst=null perm=null proto=rpc
...
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.10 cmd=open src=/accumulo/tables/2/b-000005c/I000005d.rf dst=null perm=null proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.10 cmd=getfileinfo src=/accumulo/tables/2/b-000005c/I000005d.rf dst=null perm=null proto=rpc
2023-12-12 13:30:47 INFO audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS) ip=/172.20.0.10 cmd=contentSummary src=/accumulo/tables/2/b-000005c/I000005d.rf dst=null perm=null proto=rpc
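The difference between the two audit extracts is that the Accumulo user issues a rename in the 1.9.3 run but not in the 2.0.1 run. A minimal sketch of pulling that out mechanically is shown below. It assumes audit lines in the format shown above (ugi, then cmd, src and dst fields separated by spaces); the class and method names are hypothetical, and only the start of the ugi value is matched because the principal is obscured in these extracts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Gaffer, Accumulo or Hadoop code.
class AuditRenames {

    // Captures the start of the ugi value, then the cmd, src and dst fields.
    private static final Pattern FIELDS =
            Pattern.compile("ugi=(\\S+).*?\\bcmd=(\\S+)\\s+src=(\\S+)\\s+dst=(\\S+)");

    // Returns "src -> dst" for every rename issued by a user whose ugi starts with userPrefix.
    static List<String> renamesBy(String userPrefix, List<String> auditLines) {
        List<String> renames = new ArrayList<>();
        for (String line : auditLines) {
            Matcher m = FIELDS.matcher(line);
            if (m.find() && m.group(1).startsWith(userPrefix) && "rename".equals(m.group(2))) {
                renames.add(m.group(3) + " -> " + m.group(4));
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        // Two lines in the style of the Accumulo 1.9.3 extract (timestamps and ugi suffix shortened).
        List<String> lines = Arrays.asList(
                "INFO audit:8145 - allowed=true ugi=accumulo/x (auth:KERBEROS) ip=/172.20.0.7 "
                        + "cmd=listStatus src=/tmp/junit8460692230461269805/outputDir dst=null perm=null proto=rpc",
                "INFO audit:8145 - allowed=true ugi=accumulo/x (auth:KERBEROS) ip=/172.20.0.7 "
                        + "cmd=rename src=/tmp/junit8460692230461269805/outputDir/part-r-00000.rf "
                        + "dst=/accumulo/tables/2/b-000005c/I000005d.rf perm=gaffer:supergroup:rwxrwxrwx proto=rpc");
        for (String rename : renamesBy("accumulo/", lines)) {
            System.out.println(rename);
        }
    }
}
```

Run against the full 2.0.1 extract with the prefix `accumulo/`, this would return an empty list, since the only rename there is issued by the gaffer user during the MapReduce output commit.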