hudi icon indicating copy to clipboard operation
hudi copied to clipboard

Avoid creating Configuration copies in Hudi

Open NikhilCollooru opened this issue 2 years ago • 4 comments

We observed that creating Configuration copies is consuming a lot of CPU.

We made changes in Presto to use a wrapper instead of creating configuration copies. But we were told that the presto change is breaking Hudi.

So one suggestion is to avoid creating copies in places like this: https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/config/SerializableConfiguration.java#L37

we can instead simply do

  public SerializableConfiguration(Configuration configuration) {
    this.configuration = configuration;
  }

The breaking Presto PR: https://github.com/prestodb/presto/pull/18115 https://github.com/prestodb/presto/issues/17736

@pratyakshsharma @7c00 do you think the code suggestion above makes sense ?

NikhilCollooru avatar Aug 08 '22 17:08 NikhilCollooru

The changed is to be taken in Hudi. @vinothchandar @codope How do you think?

7c00 avatar Aug 09 '22 02:08 7c00

I see the changes were introduced in this commit - https://github.com/apache/hudi/pull/755. Hence tagging @bvaradar here.

pratyakshsharma avatar Aug 10 '22 09:08 pratyakshsharma

The reason for creating copies was to avoid ConcurrentModificationException encountered while long-running deltastreamer jobs. To avoid current modification, table meta client and file system get exclusive copies of the configuration. See https://github.com/apache/hudi/pull/755 for context.

codope avatar Aug 10 '22 09:08 codope

I think avoiding copies may not be that trivial. I understand this might slow down queries via presto-hudi connector but I still want to understand the breaking change in detail. @pratyakshsharma @NikhilCollooru do you have some time this week to sync up and discuss this? I am sagar sumit on Hudi Slack.

codope avatar Aug 10 '22 09:08 codope

@codope : any updates on this.

nsivabalan avatar Aug 27 '22 20:08 nsivabalan

@codope Let us connect sometime next week to discuss this. This has been pending for some time.

pratyakshsharma avatar Sep 30 '22 09:09 pratyakshsharma

Synced up with @pratyakshsharma regarding this issue. First of all, the issue affects hudi tables queries via presto-hive connector. We need to see if we can use the config provided by the engine itself while instantiating the meta client. Created HUDI-4974 to track that issue. For now, we have a mitigation. We will unwrap the wrapper config object and pass that.

Closing the issue as it has been triaged and we have an interim solution.

codope avatar Oct 03 '22 15:10 codope