[SUPPORT] RO table did not get updated while RT table did
Hi
We are facing an issue. We first ran a bulk_insert for the batch load, which had 9 million records. This produced two tables, one RT and one RO, and after the bulk_insert everything looked good. But when we upsert delta records, only the RT table gets updated; the RO table does not. Could you please help us with this issue?
Looks like this has something to do with meta sync, where the RO table is not getting synced. Please provide scripts and configs for reproducing, then we can help from there.
For the RO table, a delta commit can only be seen after a compaction action.
Please find below the configuration we are currently using:
hudi_options = {
    'hoodie.datasource.write.table.type': self._write_table_type,
    'hoodie.table.name': self._table_name,
    'hoodie.datasource.write.recordkey.field': self._record_key,
    'hoodie.datasource.write.partitionpath.field': self._partition_field,
    'hoodie.datasource.write.precombine.field': self._combine_key,
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
    'hoodie.parquet.max.file.size': "20971520",
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.table': self._table_name.lower(),
    'hoodie.datasource.hive_sync.partition_fields': self._partition_field,
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.database': self._hive_database.lower(),
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.hive_sync.support_timestamp': 'true'
}
As mentioned by @KnightChess, the RT and RO tables come back in sync when you run compaction on a Hudi MOR table, i.e. when the delta (Avro) log files are merged into the Parquet base files. In Hudi:
COW tables == data in Parquet files
MOR tables == data in Avro + Parquet files
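To make the RO/RT difference concrete, here is a toy Python model of a single MOR file group. It is only a sketch: the class and method names are hypothetical and are not Hudi APIs, and it assumes the upsert only updates keys that already exist, so the changes land in the delta log rather than in a base file.

```python
class ToyMorFileGroup:
    """Toy model of one Merge-On-Read file group (hypothetical, not Hudi code)."""

    def __init__(self):
        self.base = {}  # stands in for the Parquet base file
        self.log = []   # stands in for the Avro delta log files

    def bulk_insert(self, records):
        # In this sketch, bulk_insert writes straight to the base file.
        self.base.update(records)

    def upsert(self, records):
        # Updates to existing keys are modelled as delta-log appends only.
        self.log.append(dict(records))

    def read_optimized(self):
        # RO view: base file only, so uncompacted updates are not visible.
        return dict(self.base)

    def real_time(self):
        # RT view: base file merged with the delta log at read time.
        merged = dict(self.base)
        for delta in self.log:
            merged.update(delta)
        return merged

    def compact(self):
        # Compaction folds the delta log into a new base file.
        self.base = self.real_time()
        self.log = []


group = ToyMorFileGroup()
group.bulk_insert({"id1": "v1"})
group.upsert({"id1": "v2"})
ro_before = group.read_optimized()["id1"]  # still the bulk_insert value
rt_before = group.real_time()["id1"]       # already the updated value
group.compact()
ro_after = group.read_optimized()["id1"]   # updated once compaction has run
```

In this model the RO view only catches up after `compact()` is called, which matches the behaviour described in this thread.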
You can read the configs from the docs here
Some sample configs you should provide
## Compaction
'hoodie.compact.inline.max.delta.seconds' : 60,
'hoodie.compact.inline.max.delta.commits' : 4,
'hoodie.compact.inline.trigger.strategy' : 'NUM_OR_TIME',
'hoodie.compact.inline' : True,
'hoodie.datasource.compaction.async.enable' : True,
This will trigger compaction every 60 seconds or every 4 delta commits, whichever comes first, for a streaming job. Read more about what compaction is in Hudi here
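As a rough illustration of how these settings interact, the sketch below merges the compaction keys into an existing options dict and models the trigger decision in plain Python. The helper names `with_compaction` and `should_compact` are hypothetical, and the trigger logic is a simplified reading of the strategy names, not Hudi's scheduler code.

```python
# Compaction settings from the sample above, written as strings.
compaction_options = {
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.trigger.strategy": "NUM_OR_TIME",
    "hoodie.compact.inline.max.delta.commits": "4",
    "hoodie.compact.inline.max.delta.seconds": "60",
}


def with_compaction(base_options, compaction):
    """Return a copy of the writer options with the compaction keys added."""
    merged = dict(base_options)
    merged.update(compaction)
    return merged


def should_compact(options, delta_commits, elapsed_seconds):
    """Toy model: would this strategy schedule a compaction now?"""
    strategy = options["hoodie.compact.inline.trigger.strategy"]
    enough_commits = delta_commits >= int(
        options["hoodie.compact.inline.max.delta.commits"])
    enough_time = elapsed_seconds >= int(
        options["hoodie.compact.inline.max.delta.seconds"])
    if strategy == "NUM_COMMITS":
        return enough_commits
    if strategy == "TIME_ELAPSED":
        return enough_time
    if strategy == "NUM_OR_TIME":
        return enough_commits or enough_time
    if strategy == "NUM_AND_TIME":
        return enough_commits and enough_time
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


# Placeholder writer options; the real dict would be the hudi_options above.
writer_options = with_compaction(
    {"hoodie.table.name": "example_table"}, compaction_options)
```

With `NUM_OR_TIME`, `should_compact(writer_options, 4, 0)` and `should_compact(writer_options, 1, 61)` both return `True`, while `should_compact(writer_options, 1, 10)` returns `False`.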
@satishmalladi-m as mentioned by @KnightChess and @glory9211, it's possible that compaction has not run, which would leave the RO table not updated. Can you confirm whether the sync succeeds after compaction?
@satishmalladi-m @KnightChess @glory9211: any updates on this?
@satishmalladi-m: any updates, please?
Analysis and suggestions were provided above. Closing due to inactivity.