Addax
Writer-side Kerberos authentication fails when both the reader and the writer authenticate against Kerberos-enabled HDFS or Hive
Contact Details
No response
What happened?
When Hive or HDFS is used as only one side of the reader/writer pair, Kerberos authentication works fine. But when both sides need Kerberos authentication — for example, the reader is HDFS and the writer is also HDFS — the writer-side Kerberos authentication fails with an error.
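When chasing an error like this, it can help to first rule out credential problems outside of Addax by validating the keytab and principal directly with the standard MIT Kerberos tools. The paths and principal below are taken from the job file in this report; adjust them for your cluster:

```shell
# Obtain a ticket with the same keytab/principal the Addax job uses
kinit -kt /etc/security/keytabs/hive.headless.keytab [email protected]

# Show the ticket cache to confirm the login worked
klist
```

If `kinit` fails here, the problem is in the keytab or KDC configuration rather than in Addax itself.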
Version
4.0.9 (Default)
OS Type
Linux (Default)
Java JDK Version
Oracle JDK 1.8.0
Relevant log output
No response
Thanks for the report. Please provide the complete job configuration and the terminal output.
Tested against the current master branch and could not reproduce this issue; please try upgrading to the latest version and testing again.
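As an aside: in HDFS HA setups like the one in the job file below, the per-nameservice `hadoopConfig` keys (`dfs.ha.namenodes.<ns>`, `dfs.client.failover.proxy.provider.<ns>`) can easily drift out of sync with what `dfs.nameservices` declares, which produces confusing connection failures. A small sanity check catches this before submitting the job. This is a rough sketch with a made-up helper name, not part of Addax:

```python
# Hypothetical helper (not part of Addax): flag HA-related hadoopConfig keys
# whose nameservice suffix does not match what dfs.nameservices declares.
def check_nameservice_keys(hadoop_config):
    services = set(hadoop_config.get("dfs.nameservices", "").split(","))
    prefixes = ("dfs.ha.namenodes.", "dfs.client.failover.proxy.provider.")
    bad = []
    for key in hadoop_config:
        for prefix in prefixes:
            if key.startswith(prefix) and key[len(prefix):] not in services:
                bad.append(key)
    return bad

cfg = {
    "dfs.nameservices": "example",
    "dfs.ha.namenodes.lczq": "nn1,nn2",  # suffix 'lczq' != 'example'
}
print(check_nameservice_keys(cfg))  # → ['dfs.ha.namenodes.lczq']
```

Run against both the reader's and the writer's `hadoopConfig` blocks; an empty list means the HA keys are consistent.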
2022-08-28 14:05:01.925 [ main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2022-08-28 14:05:01.935 [ main] INFO Engine -
{
  "setting": {
    "speed": {
      "channel": 2,
      "bytes": -1
    }
  },
  "content": {
    "reader": {
      "name": "hdfsreader",
      "parameter": {
        "column": [
          { "index": 0, "type": "string" },
          { "index": 1, "type": "int" },
          { "index": 2, "type": "string" },
          { "index": 3, "type": "boolean" },
          { "index": 4, "type": "string" }
        ],
        "path": "/tmp/input",
        "defaultFS": "hdfs://example",
        "fileType": "orc",
        "encoding": "UTF-8",
        "fieldDelimiter": ",",
        "haveKerberos": "true",
        "kerberosKeytabFilePath": "/etc/security/keytabs/hive.headless.keytab",
        "kerberosPrincipal": "[email protected]",
        "hadoopConfig": {
          "dfs.nameservices": "example",
          "dfs.ha.namenodes.example": "nn1,nn2",
          "dfs.namenode.rpc-address.example.nn1": "nn01.example.com:8020",
          "dfs.namenode.rpc-address.example.nn2": "nn02.example.com:8020",
          "dfs.client.failover.proxy.provider.example": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
        }
      }
    },
    "writer": {
      "name": "hdfswriter",
      "parameter": {
        "defaultFS": "hdfs://example",
        "fileType": "orc",
        "path": "/tmp/output",
        "fileName": "addax.dat",
        "column": [
          { "name": "col1", "type": "string" },
          { "name": "col2", "type": "int" },
          { "name": "col3", "type": "string" },
          { "name": "col4", "type": "boolean" },
          { "name": "col5", "type": "string" }
        ],
        "writeMode": "overwrite",
        "fieldDelimiter": "\u0001",
        "compress": "SNAPPY",
        "haveKerberos": "true",
        "kerberosKeytabFilePath": "/etc/security/keytabs/hive.headless.keytab",
        "kerberosPrincipal": "[email protected]",
        "hadoopConfig": {
          "dfs.nameservices": "example",
          "dfs.ha.namenodes.example": "nn1,nn2",
          "dfs.namenode.rpc-address.example.nn1": "nn01.example.com:8020",
          "dfs.namenode.rpc-address.example.nn2": "nn02.example.com:8020",
          "dfs.client.failover.proxy.provider.example": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
        }
      }
    }
  }
}
2022-08-28 14:05:01.955 [ main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2022-08-28 14:05:01.955 [ main] INFO JobContainer - Addax jobContainer starts job.
2022-08-28 14:05:01.956 [ main] INFO JobContainer - Set jobId = 0
2022-08-28 14:05:02.028 [ job-0] INFO HdfsReader$Job - init() begin...
2022-08-28 14:05:02.581 [ job-0] WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-08-28 14:05:02.829 [ job-0] INFO UserGroupInformation - Login successful for user [email protected] using keytab file /etc/security/keytabs/hive.headless.keytab
2022-08-28 14:05:02.835 [ job-0] INFO HdfsReader$Job - init() ok and end...
2022-08-28 14:05:03.180 [ job-0] WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-08-28 14:05:03.229 [ job-0] INFO UserGroupInformation - Login successful for user [email protected] using keytab file /etc/security/keytabs/hive.headless.keytab
2022-08-28 14:05:04.123 [ job-0] INFO JobContainer - Addax Reader.Job [hdfsreader] do prepare work .
2022-08-28 14:05:04.123 [ job-0] INFO HdfsReader$Job - prepare(), start to getAllFiles...
2022-08-28 14:05:04.123 [ job-0] INFO DFSUtil - get HDFS all files in path = [/tmp/input]
2022-08-28 14:05:05.175 [ job-0] INFO DFSUtil - [hdfs://example/tmp/input/addax.dat_20220828_135633_627_qu0vasyb.orc] is an [ORC] file, adding it to the source files list
2022-08-28 14:05:05.175 [ job-0] INFO HdfsReader$Job - Number of files to be read: [1], file list: [[hdfs://example/tmp/input/addax.dat_20220828_135633_627_qu0vasyb.orc]]
2022-08-28 14:05:05.176 [ job-0] INFO JobContainer - Addax Writer.Job [hdfswriter] do prepare work .
2022-08-28 14:05:05.304 [ job-0] INFO JobContainer - Job set Channel-Number to 2 channel(s).
2022-08-28 14:05:05.304 [ job-0] INFO HdfsReader$Job - split() begin...
2022-08-28 14:05:05.318 [ job-0] INFO JobContainer - Addax Reader.Job [hdfsreader] splits to [1] tasks.
2022-08-28 14:05:05.318 [ job-0] INFO HdfsWriter$Job - begin splitting ...
2022-08-28 14:05:05.324 [ job-0] INFO HdfsWriter$Job - split wrote file name:[/tmp/ttt/.8e9277a1_8ff0_4d0c_b706_ec4d83153b9c/addax.dat_20220828_140505_324_f4qedh1u.orc]
2022-08-28 14:05:05.325 [ job-0] INFO HdfsWriter$Job - end splitting.
2022-08-28 14:05:05.325 [ job-0] INFO JobContainer - Addax Writer.Job [hdfswriter] splits to [1] tasks.
2022-08-28 14:05:05.351 [ job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2022-08-28 14:05:05.361 [ taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-08-28 14:05:05.364 [ taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-08-28 14:05:05.365 [ taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2022-08-28 14:05:05.390 [0-0-0-reader] INFO UserGroupInformation - Login successful for user [email protected] using keytab file /etc/security/keytabs/hive.headless.keytab
2022-08-28 14:05:05.390 [0-0-0-writer] INFO UserGroupInformation - Login successful for user [email protected] using keytab file /etc/security/keytabs/hive.headless.keytab
2022-08-28 14:05:05.392 [0-0-0-reader] INFO HdfsReader$Task - read start
2022-08-28 14:05:05.393 [0-0-0-reader] INFO HdfsReader$Task - reading file : [hdfs://example/tmp/input/addax.dat_20220828_135633_627_qu0vasyb.orc]
2022-08-28 14:05:05.393 [0-0-0-reader] INFO DFSUtil - Start Read orc-file [hdfs://example/tmp/input/addax.dat_20220828_135633_627_qu0vasyb.orc].
2022-08-28 14:05:05.394 [0-0-0-writer] INFO HdfsWriter$Task - write to file : [/tmp/ttt/.8e9277a1_8ff0_4d0c_b706_ec4d83153b9c/addax.dat_20220828_140505_324_f4qedh1u.orc]
2022-08-28 14:05:05.442 [0-0-0-writer] INFO HadoopShimsPre2_7 - Can't get KeyProvider for ORC encryption from hadoop.security.key.provider.path.
2022-08-28 14:05:05.455 [0-0-0-writer] INFO PhysicalFsWriter - ORC writer created for path: /tmp/ttt/.8e9277a1_8ff0_4d0c_b706_ec4d83153b9c/addax.dat_20220828_140505_324_f4qedh1u.orc with stripeSize: 67108864 blockSize: 268435456 compression: Compress: SNAPPY buffer: 262144
2022-08-28 14:05:05.501 [0-0-0-reader] INFO OrcCodecPool - Got brand-new codec SNAPPY
2022-08-28 14:05:05.618 [0-0-0-reader] INFO ReaderImpl - Reading ORC rows from hdfs://example/tmp/input/addax.dat_20220828_135633_627_qu0vasyb.orc with {include: null, offset: 0, length: 9223372036854775807, schema: struct<col1:string,col2:int,col3:string,col4:boolean,col5:string>, includeAcidColumns: true}
2022-08-28 14:05:05.668 [0-0-0-reader] INFO HdfsReader$Task - end read source files...
2022-08-28 14:05:05.844 [0-0-0-writer] INFO WriterImpl - ORC writer created for path: /tmp/ttt/.8e9277a1_8ff0_4d0c_b706_ec4d83153b9c/addax.dat_20220828_140505_324_f4qedh1u.orc with stripeSize: 67108864 options: Compress: SNAPPY buffer: 262144
2022-08-28 14:05:06.048 [0-0-0-writer] INFO HdfsWriter$Task - end do write
2022-08-28 14:05:08.371 [ job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2022-08-28 14:05:08.371 [ job-0] INFO JobContainer - Addax Writer.Job [hdfswriter] do post work.
2022-08-28 14:05:08.378 [ job-0] INFO HdfsHelper - start move file [hdfs://example/tmp/ttt/.8e9277a1_8ff0_4d0c_b706_ec4d83153b9c/addax.dat_20220828_140505_324_f4qedh1u.orc] to dir [ttt].
2022-08-28 14:05:08.382 [ job-0] INFO HdfsHelper - finish move file(s).
2022-08-28 14:05:08.382 [ job-0] INFO HdfsHelper - start delete tmp dir [/tmp/ttt/.8e9277a1_8ff0_4d0c_b706_ec4d83153b9c] .
2022-08-28 14:05:08.387 [ job-0] INFO HdfsHelper - finish delete tmp dir [/tmp/ttt/.8e9277a1_8ff0_4d0c_b706_ec4d83153b9c] .
2022-08-28 14:05:08.388 [ job-0] INFO JobContainer - Addax Reader.Job [hdfsreader] do post work.
2022-08-28 14:05:08.391 [ job-0] INFO JobContainer - PerfTrace not enable!
2022-08-28 14:05:08.393 [ job-0] INFO StandAloneJobContainerCommunicator - Total 10 records, 390 bytes | Speed 130B/s, 3 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2022-08-28 14:05:08.394 [ job-0] INFO JobContainer -
Job start time     : 2022-08-28 14:05:01
Job end time       : 2022-08-28 14:05:08
Total elapsed time : 6s
Average throughput : 130B/s
Record write speed : 3rec/s
Total records read : 10
Total read/write failures : 0