kafka-connect-hdfs
Recreate hive table after accidental 'drop table'
@confluentinc It looks like @serssp just signed our Contributor License Agreement. :+1:
Always at your service,
clabot
Hi,
I took some time today to test this pull request on my side, and it does look like it solves our issue. Our issue was more that if you enable auto-dump without Hive integration and then want to enable it after some time, the connector fails with the following errors:
[2019-02-01 15:26:51,949] ERROR Adding Hive partition threw unexpected error (io.confluent.connect.hdfs.TopicPartitionWriter:819)
io.confluent.connect.storage.errors.HiveMetaStoreException: Invalid partition for databasenamse.topicname: time=event/bucket=hourly/date=2019-02-01/hour=15
at io.confluent.connect.storage.hive.HiveMetaStore$1.call(HiveMetaStore.java:123)
at io.confluent.connect.storage.hive.HiveMetaStore$1.call(HiveMetaStore.java:108)
at io.confluent.connect.storage.hive.HiveMetaStore.doAction(HiveMetaStore.java:98)
at io.confluent.connect.storage.hive.HiveMetaStore.addPartition(HiveMetaStore.java:133)
at io.confluent.connect.hdfs.TopicPartitionWriter$3.call(TopicPartitionWriter.java:817)
at io.confluent.connect.hdfs.TopicPartitionWriter$3.call(TopicPartitionWriter.java:813)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: InvalidObjectException(message:databasenamse.topicname table not found)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result$append_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:51619)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result$append_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:51596)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result.read(ThriftHiveMetastore.java:51519)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_append_partition_by_name_with_environment_context(ThriftHiveMetastore.java:1667)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.append_partition_by_name_with_environment_context(ThriftHiveMetastore.java:1651)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:607)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:601)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy53.appendPartition(Unknown Source)
at io.confluent.connect.storage.hive.HiveMetaStore$1.call(HiveMetaStore.java:115)
... 9 more
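For context, this is roughly the configuration sequence that triggers it. The property keys are standard kafka-connect-hdfs settings, but the values and the Java-map form are only illustrative, not taken from our actual setup:

import java.util.HashMap;
import java.util.Map;

// Initial deployment: data is written to HDFS, but hive.integration is left at
// its default (false), so no Hive table is ever created for the topic.
Map<String, String> initialConfig = new HashMap<>(Map.of(
    "connector.class", "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics", "topicname",
    "hdfs.url", "hdfs://namenode:8020",
    "flush.size", "1000"));

// Later, Hive integration is switched on for the same connector. The writer then
// calls HiveMetaStore.addPartition() for a table that does not exist, which matches
// the InvalidObjectException("table not found") in the stack trace above.
// Database/topic names simply mirror the log above.
Map<String, String> updatedConfig = new HashMap<>(initialConfig);
updatedConfig.put("hive.integration", "true");
updatedConfig.put("hive.metastore.uris", "thrift://hive-metastore:9083");
updatedConfig.put("hive.database", "databasenamse");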
As described in https://github.com/confluentinc/kafka-connect-hdfs/issues/272, it can be resolved by "deleting all the files on HDFS", but this is not really a solution on our side since some connectors have been dumping data for months, and apparently it's the same issue for @serssp. It would be really great if we could improve this.
Regarding the PR, I would perhaps just move the condition outside of the for loop, since I'm not sure we need to do this for each topic-partition; see the sketch below.
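Something along these lines is what I mean. This is only a sketch, not the actual PR code: ensureHiveTable() is a hypothetical stand-in for whatever recreation logic the PR uses, and it assumes HiveMetaStore from io.confluent.connect.storage.hive exposes tableExists(database, table) and addPartition(database, table, path) as in the stack trace above.

import io.confluent.connect.storage.hive.HiveMetaStore;
import java.util.List;

class HiveSyncSketch {
  // Check (and, if needed, recreate) the Hive table once per topic, then add the
  // partitions, instead of repeating the existence check inside the per-partition loop.
  void syncPartitions(HiveMetaStore hiveMetaStore, String database, String topic,
                      List<String> partitionPaths) {
    if (!hiveMetaStore.tableExists(database, topic)) {   // assumed HiveMetaStore API
      ensureHiveTable(database, topic);                   // hypothetical helper
    }
    for (String path : partitionPaths) {
      hiveMetaStore.addPartition(database, topic, path);  // same call as in the stack trace
    }
  }

  private void ensureHiveTable(String database, String topic) {
    // Placeholder: the real change would recreate the table from the stored schema.
    throw new UnsupportedOperationException("sketch only");
  }
}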
Would it help if someone added a unit test that covers this case? Could we have a review?