orc ORC-XXX: Support orc.compression.zstd.workers

What changes were proposed in this pull request?

Why are the changes needed?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Jan 17 '24 00:01 dongjoon-hyun

Evaluating if this has any benefits in ORC.

Jan 17 '24 00:01 dongjoon-hyun

Thanks @dongjoon-hyun for doing this. I also wanted to introduce this configuration after the zstd-jni merge, since Spark and Parquet have similar configurations.

Jan 17 '24 04:01 cxzl25

Ya, indeed.

BTW, it seems that there is no perf gain with this so far. Interesting.

Jan 17 '24 04:01 dongjoon-hyun

> it seems that there is no perf gain with this so far

I verified this PR in a production environment with orc.compression.zstd.workers set to 0, 6, 15, and 16, and there seems to be no difference.

That said, Parquet also provides an option for the number of zstd workers.

https://github.com/apache/parquet-mr/blob/c82d5b471a558124b03e67759038661a046f5938/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/ZstandardCodec.java#L53-L54

https://facebook.github.io/zstd/zstd_manual.html

    ZSTD_c_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
                              * When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
                              * ZSTD_compressStream*() consumes input and flush output if possible, but immediately gives back control to caller,
                              * while compression is performed in parallel, within worker thread(s).
                              * (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
                              *  in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
                              * More workers improve speed, but also increase memory usage.
                              * Default value is `0`, aka "single-threaded mode" : no worker is spawned,
                              * compression is performed inside Caller's thread, and all invocations are blocking */
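
For reference, a minimal sketch of how the option could be read so that it lines up with the excerpt above (0 means single-threaded and blocking, 1 or more spawns workers). The `ZstdWorkersOption` class, the use of `java.util.Properties`, and the default of 0 are assumptions for illustration only; this is not the code in this PR.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not ORC's actual implementation. */
final class ZstdWorkersOption {
    // Key taken from the PR title.
    static final String KEY = "orc.compression.zstd.workers";
    // Assumption: default to 0, mirroring zstd's own default ("single-threaded mode").
    static final int DEFAULT_WORKERS = 0;

    private ZstdWorkersOption() {}

    /** Reads the worker count, falling back to the default when the key is unset or blank. */
    static int resolve(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_WORKERS;
        }
        int workers = Integer.parseInt(raw.trim());
        if (workers < 0) {
            throw new IllegalArgumentException(KEY + " must be >= 0, got " + workers);
        }
        return workers;
    }

    /** Per the zstd manual excerpt: 0 is blocking and single-threaded, >= 1 spawns worker threads. */
    static boolean isMultiThreaded(int workers) {
        return workers >= 1;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, "6");
        int workers = resolve(conf);
        System.out.println(KEY + "=" + workers + " multiThreaded=" + isMultiThreaded(workers));
    }
}
```

The resolved value would then be handed to whatever sets `ZSTD_c_nbWorkers` on the compression context; that wiring is left out here because it depends on the codec implementation.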

Jan 19 '24 06:01 cxzl25

Thank you for double-checking. Ya, it seems that our implementation has some limitation or bug.

Apache Spark also has a ZStandardCodec implementation based on this zstd-jni, and it shows a 30% to 40% improvement in the micro-benchmark.

https://github.com/apache/spark/blob/39f8e1a5953b5897f893151d24dc585a80c0c8a0/core/benchmarks/ZStandardBenchmark-results.txt#L27-L47
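
To compare worker counts on the ORC side in a similar way, a sweep over the settings tried earlier in this thread (0, 6, 15, 16) could look roughly like the sketch below. The `Compressor` interface and `WorkerSweep` class are hypothetical names, and the `java.util.zip.Deflater` stand-in is only there so the harness runs without extra dependencies: it is single-threaded and ignores the worker count, so its timings say nothing about zstd. A real run would plug a zstd-backed compressor into the same seam.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;

/** Hypothetical sweep harness for illustration; not part of ORC or Spark. */
final class WorkerSweep {
    /** Hypothetical seam: a real run would plug in a zstd-backed compressor here. */
    interface Compressor {
        int compressedSize(byte[] input, int workers);
    }

    /** Times one compressor per worker count and keeps the best-of-N nanoseconds for each. */
    static Map<Integer, Long> sweep(Compressor c, byte[] input, int[] workerCounts, int rounds) {
        Map<Integer, Long> best = new LinkedHashMap<>();
        for (int workers : workerCounts) {
            long min = Long.MAX_VALUE;
            for (int r = 0; r < rounds; r++) {
                long start = System.nanoTime();
                c.compressedSize(input, workers);
                min = Math.min(min, System.nanoTime() - start);
            }
            best.put(workers, min);
        }
        return best;
    }

    /** Stand-in compressor: single-threaded Deflater that ignores the workers argument. */
    static int deflateSize(byte[] input, int workers) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[1 << 20];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) (i % 251);
        }
        int[] counts = {0, 6, 15, 16};
        sweep(WorkerSweep::deflateSize, input, counts, 5)
            .forEach((w, ns) -> System.out.printf("workers=%d best=%d us%n", w, ns / 1000));
    }
}
```

Best-of-N wall-clock timing like this is only a rough signal because of JIT warm-up and GC noise; a JMH benchmark would be the safer choice for numbers that go into a results file.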

I'm still digging into this because I believe this should be a part of Apache ORC 2.0.0.

Jan 19 '24 19:01 dongjoon-hyun