hudi
hudi copied to clipboard
Avoid creating Configuration copies in Hudi
We observed that creating Configuration copies is consuming a lot of CPU.
We made changes in Presto to use a wrapper instead of creating configuration copies. But we were told that the presto change is breaking Hudi.
So one suggestion is to avoid creating copies in places like this: https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/config/SerializableConfiguration.java#L37
we can instead simply do
public SerializableConfiguration(Configuration configuration) {
this.configuration = configuration;
}
The breaking Presto PR: https://github.com/prestodb/presto/pull/18115 https://github.com/prestodb/presto/issues/17736
@pratyakshsharma @7c00 do you think the code suggestion above makes sense ?
The changed is to be taken in Hudi. @vinothchandar @codope How do you think?
I see the changes were introduced in this commit - https://github.com/apache/hudi/pull/755. Hence tagging @bvaradar here.
The reason for creating copies was to avoid ConcurrentModificationException encountered while long-running deltastreamer jobs. To avoid current modification, table meta client and file system get exclusive copies of the configuration. See https://github.com/apache/hudi/pull/755 for context.
I think avoiding copies may not be that trivial. I understand this might slow down queries via presto-hudi connector but I still want to understand the breaking change in detail. @pratyakshsharma @NikhilCollooru do you have some time this week to sync up and discuss this? I am sagar sumit
on Hudi Slack.
@codope : any updates on this.
@codope Let us connect sometime next week to discuss this. This has been pending for some time.
Synced up with @pratyakshsharma regarding this issue. First of all, the issue affects hudi tables queries via presto-hive connector. We need to see if we can use the config provided by the engine itself while instantiating the meta client. Created HUDI-4974 to track that issue. For now, we have a mitigation. We will unwrap the wrapper config object and pass that.
Closing the issue as it has been triaged and we have an interim solution.