
[SUPPORT] SqlQueryBasedTransformer causes memory issues

Open tzhang-fetch opened this issue 3 years ago • 0 comments

Describe the problem you faced

A DeltaStreamer job that previously ran fine runs out of memory after a SqlQueryBasedTransformer is added, even though the transformer query SELECTs only one column.

"--transformer-class", "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer", "--hoodie-conf", "hoodie.deltastreamer.transformer.sql=SELECT a.ATTRIBUTES FROM <SRC> a"
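For context, as we understand Hudi's implementation, SqlQueryBasedTransformer registers each incoming batch as a temporary Spark view and substitutes that view's name for the `<SRC>` placeholder before running the query through Spark SQL. The Python sketch below only illustrates that substitution step on the query from this issue. The `HOODIE_SRC_TMP_TABLE_` prefix and the helper names `resolve_transformer_sql` and `make_temp_view_name` are hypothetical, chosen for illustration, and are not Hudi's API.

```python
import uuid

# Placeholder token used in the transformer SQL for the incoming batch,
# as in the query from this issue: "SELECT a.ATTRIBUTES FROM <SRC> a".
SRC_PLACEHOLDER = "<SRC>"


def make_temp_view_name(prefix: str = "HOODIE_SRC_TMP_TABLE_") -> str:
    """Hypothetical naming scheme: prefix plus a UUID whose '-' are replaced
    by '_' so the result is a valid SQL identifier."""
    return prefix + str(uuid.uuid4()).replace("-", "_")


def resolve_transformer_sql(sql: str, view_name: str) -> str:
    """Replace every <SRC> token with the temp view the batch is registered under."""
    return sql.replace(SRC_PLACEHOLDER, view_name)


if __name__ == "__main__":
    query = "SELECT a.ATTRIBUTES FROM <SRC> a"
    view = make_temp_view_name()
    # The rewritten query is what would then be handed to Spark SQL.
    print(resolve_transformer_sql(query, view))
```

Because the query is executed against the whole registered batch, the transformer itself does not limit how much data the source reads; the projection to one column happens inside Spark SQL.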

To Reproduce

Steps to reproduce the behavior:

  1. Add a SqlQueryBasedTransformer with a simple SELECT statement to a DeltaStreamer job
  2. Run the job

Expected behavior

The job outputs the single selected column without running out of memory.

Environment Description

  • Hudi version : 0.10.1

  • Spark version : 3.1.2

  • Hive version : -

  • Hadoop version : 3.1.2

  • Storage (HDFS/S3/GCS..) : Reading from Kafka, storing in S3

  • Running on Docker? (yes/no) : no

Additional context

Additional screenshots and messages are in this Slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1663698444989499

Stacktrace

```
2022-09-19T21:45:44.236+0000: [GC (Allocation Failure) [PSYoungGen: 25113K->25029K(2758656K)] 77023K->76946K(8351232K), 0.0177561 secs] [Times: user=0.02 sys=0.02, real=0.02 secs]
2022-09-19T21:45:44.254+0000: [Full GC (Allocation Failure) [PSYoungGen: 25029K->0K(2758656K)] [ParOldGen: 51917K->54295K(5592576K)] 76946K->54295K(8351232K), [Metaspace: 112463K->112463K(1155072K)], 0.
2022-09-19T21:45:44.378+0000: [GC (Allocation Failure) [PSYoungGen: 0K->0K(2720768K)] 54295K->54295K(8313344K), 0.0035697 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
2022-09-19T21:45:44.381+0000: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(2720768K)] [ParOldGen: 54295K->45261K(5592576K)] 54295K->45261K(8313344K), [Metaspace: 112463K->109953K(1155072K)], 0.1912
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 22"..
```
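The GC lines in the stacktrace appear to be the JDK 8 `-XX:+PrintGCDetails` output of the Parallel collector (PSYoungGen/ParOldGen). As a reading aid, here is a minimal Python sketch that extracts the before/after/capacity figures from one such line, so heap occupancy can be compared with heap capacity just before the OutOfMemoryError. The regexes are an assumption written against the lines in this issue only; this is not a general GC-log parser, and `parse_gc_line` and `GcEvent` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Collection kind and cause, e.g. "[Full GC (Allocation Failure)".
KIND_RE = re.compile(r"\[(Full GC|GC) \(([^)]*)\)")
# Per-space figures, e.g. "[ParOldGen: 51917K->54295K(5592576K)]".
SPACE_RE = re.compile(r"\[(\w+): (\d+)K->(\d+)K\((\d+)K\)\]")
# Whole-heap figures that follow a closing bracket, e.g. "] 76946K->54295K(8351232K)".
HEAP_RE = re.compile(r"\]\s+(\d+)K->(\d+)K\((\d+)K\)")

# Second line of the stacktrace in this issue (truncated in the original log).
SAMPLE_FULL_GC = (
    "2022-09-19T21:45:44.254+0000: [Full GC (Allocation Failure) "
    "[PSYoungGen: 25029K->0K(2758656K)] [ParOldGen: 51917K->54295K(5592576K)] "
    "76946K->54295K(8351232K), [Metaspace: 112463K->112463K(1155072K)], 0."
)


@dataclass
class GcEvent:
    kind: str
    cause: str
    heap_before_kb: int
    heap_after_kb: int
    heap_capacity_kb: int
    # Space name -> (before KB, after KB, capacity KB)
    spaces: dict = field(default_factory=dict)


def parse_gc_line(line: str) -> Optional[GcEvent]:
    kind = KIND_RE.search(line)
    heap = HEAP_RE.search(line)
    if not kind or not heap:
        return None  # not a ParallelGC line in the format above
    spaces = {
        name: (int(before), int(after), int(cap))
        for name, before, after, cap in SPACE_RE.findall(line)
    }
    before, after, cap = (int(x) for x in heap.groups())
    return GcEvent(kind.group(1), kind.group(2), before, after, cap, spaces)


if __name__ == "__main__":
    ev = parse_gc_line(SAMPLE_FULL_GC)
    used = ev.heap_after_kb / ev.heap_capacity_kb
    print(f"{ev.kind}: heap {ev.heap_before_kb}K -> {ev.heap_after_kb}K "
          f"of {ev.heap_capacity_kb}K ({used:.2%} used after GC)")
```

For the quoted Full GC line, this extracts 76946K -> 54295K out of a total heap capacity of 8351232K.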



tzhang-fetch, Sep 22 '22 19:09