kafka-connect-hdfs
Recreate hive table after accidental 'drop table'
@confluentinc It looks like @serssp just signed our Contributor License Agreement. :+1:
Always at your service,
clabot
Hi,
I took some time today to test this pull request on my side, and it does look like it solves our issue. Our issue was more that if you enable auto-dump without Hive integration and then want to enable it after some time, the connector fails with the following errors:
[2019-02-01 15:26:51,949] ERROR Adding Hive partition threw unexpected error (io.confluent.connect.hdfs.TopicPartitionWriter:819)
io.confluent.connect.storage.errors.HiveMetaStoreException: Invalid partition for databasenamse.topicname: time=event/bucket=hourly/date=2019-02-01/hour=15
at io.confluent.connect.storage.hive.HiveMetaStore$1.call(HiveMetaStore.java:123)
at io.confluent.connect.storage.hive.HiveMetaStore$1.call(HiveMetaStore.java:108)
at io.confluent.connect.storage.hive.HiveMetaStore.doAction(HiveMetaStore.java:98)
at io.confluent.connect.storage.hive.HiveMetaStore.addPartition(HiveMetaStore.java:133)
at io.confluent.connect.hdfs.TopicPartitionWriter$3.call(TopicPartitionWriter.java:817)
at io.confluent.connect.hdfs.TopicPartitionWriter$3.call(TopicPartitionWriter.java:813)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: InvalidObjectException(message:databasenamse.topicname table not found)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result$append_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:51619)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result$append_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:51596)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result.read(ThriftHiveMetastore.java:51519)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_append_partition_by_name_with_environment_context(ThriftHiveMetastore.java:1667)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.append_partition_by_name_with_environment_context(ThriftHiveMetastore.java:1651)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:607)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:601)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy53.appendPartition(Unknown Source)
at io.confluent.connect.storage.hive.HiveMetaStore$1.call(HiveMetaStore.java:115)
... 9 more
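For context, this is roughly the configuration sequence that triggers it. The property keys are standard kafka-connect-hdfs settings, but the values and the Java-map form are only illustrative, not taken from our actual setup:

import java.util.HashMap;
import java.util.Map;

// Initial deployment: data is written to HDFS, but hive.integration is left at
// its default (false), so no Hive table is ever created for the topic.
Map<String, String> initialConfig = new HashMap<>(Map.of(
    "connector.class", "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics", "topicname",
    "hdfs.url", "hdfs://namenode:8020",
    "flush.size", "1000"));

// Later, Hive integration is switched on for the same connector. The writer then
// calls HiveMetaStore.addPartition() for a table that does not exist, which matches
// the InvalidObjectException("table not found") in the stack trace above.
// Database/topic names simply mirror the log above.
Map<String, String> updatedConfig = new HashMap<>(initialConfig);
updatedConfig.put("hive.integration", "true");
updatedConfig.put("hive.metastore.uris", "thrift://hive-metastore:9083");
updatedConfig.put("hive.database", "databasenamse");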
As described in https://github.com/confluentinc/kafka-connect-hdfs/issues/272, it can be resolved by "deleting all the files on HDFS", but this is not really a solution on our side since some connectors have been dumping data for months, and apparently it's the same issue for @serssp. It would be really great if we could improve this.
Regarding the PR, I would perhaps just move the condition outside of the for loop, since I'm not sure we need to do this for each topic-partition; see the sketch below.
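Something along these lines is what I mean. This is only a sketch, not the actual PR code: ensureHiveTable() is a hypothetical stand-in for whatever recreation logic the PR uses, and it assumes HiveMetaStore from io.confluent.connect.storage.hive exposes tableExists(database, table) and addPartition(database, table, path) as in the stack trace above.

import io.confluent.connect.storage.hive.HiveMetaStore;
import java.util.List;

class HiveSyncSketch {
  // Check (and, if needed, recreate) the Hive table once per topic, then add the
  // partitions, instead of repeating the existence check inside the per-partition loop.
  void syncPartitions(HiveMetaStore hiveMetaStore, String database, String topic,
                      List<String> partitionPaths) {
    if (!hiveMetaStore.tableExists(database, topic)) {   // assumed HiveMetaStore API
      ensureHiveTable(database, topic);                   // hypothetical helper
    }
    for (String path : partitionPaths) {
      hiveMetaStore.addPartition(database, topic, path);  // same call as in the stack trace
    }
  }

  private void ensureHiveTable(String database, String topic) {
    // Placeholder: the real change would recreate the table from the stored schema.
    throw new UnsupportedOperationException("sketch only");
  }
}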
Would it help if someone added a unit test that covers this case? Could we have a review?