Zhen Li

Results 14 issues of Zhen Li

**Problem** When there are a large number of rows with the same key on the build side, the `listJoinResults` function becomes very time-consuming. **Design** `appendNextRow`: Create a next-row-vector if it...

CLA Signed

**Problem** Memory leaks may occur when the split preloading feature is enabled and either the connector thread pool is busy or the task fails or is cancelled. We've observed instances of...

CLA Signed

Add the `normalize_nan` Spark function. In Spark's optimizer, `NormalizeNaNAndZero` is added for aggregations to normalize -0.0 / 0.0 and different NaN values. In Velox, we don't need to handle 0.0 & -0.0,...

CLA Signed
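To illustrate what normalizing "different NaN values" can mean for the `normalize_nan` entry above, here is a minimal C++17 sketch. It is not the code from the pull request: the names `normalizeNan`, `bitsOf`, and `fromBits` are hypothetical, and the sketch assumes IEEE 754 doubles. Zero values are passed through untouched, in line with the entry's remark that 0.0 and -0.0 need no handling.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map every NaN bit pattern to one canonical quiet NaN.
// All other values, including 0.0 and -0.0, are returned unchanged.
inline double normalizeNan(double value) {
  return std::isnan(value) ? std::numeric_limits<double>::quiet_NaN() : value;
}

// Hypothetical helper: expose the raw bit pattern of a double.
inline uint64_t bitsOf(double value) {
  uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Hypothetical helper: build a double from a raw bit pattern.
inline double fromBits(uint64_t bits) {
  double value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Two NaNs with different payloads or sign bits compare unequal bit-for-bit, which matters when values are grouped or hashed by their bytes; after normalization they share one representation.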

Doc: https://spark.apache.org/docs/latest/api/sql/#rint Code: https://github.com/apache/spark/blob/da92293f9ce0be1ac283c4a5d769af550abf7031/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L743

CLA Signed
ready-to-merge

Doc: https://spark.apache.org/docs/latest/api/sql/index.html#levenshtein Code: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L2220C12-L2220C23 https://github.com/apache/spark/blob/d0385c4a99c172fa3e1ba2d72a65c8632b5c72a9/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L1694C5-L1694C77 There are two differences between Spark's implementation and Presto's implementation: one is that Spark's return type is `int32_t`, and the other is that it accepts a...

CLA Signed
ready-to-merge
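As a reference point for the `levenshtein` entry above, the following is a minimal C++17 sketch of the classic two-row dynamic-programming edit distance with an `int32_t` return type, matching the first difference the entry names. It is an assumption-laden illustration, not the pull request's implementation: it compares bytes rather than Unicode code points, and it does not cover the second (truncated) difference.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Sketch only: byte-wise Levenshtein distance using two rolling rows.
inline int32_t levenshtein(std::string_view left, std::string_view right) {
  std::vector<int32_t> prev(right.size() + 1);
  std::vector<int32_t> curr(right.size() + 1);
  // Distance from the empty prefix of `left` to each prefix of `right`.
  for (size_t j = 0; j <= right.size(); ++j) {
    prev[j] = static_cast<int32_t>(j);
  }
  for (size_t i = 1; i <= left.size(); ++i) {
    curr[0] = static_cast<int32_t>(i);
    for (size_t j = 1; j <= right.size(); ++j) {
      const int32_t substitution =
          prev[j - 1] + (left[i - 1] == right[j - 1] ? 0 : 1);
      // Minimum over deletion, insertion, and substitution.
      curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
    }
    std::swap(prev, curr);
  }
  return prev[right.size()];
}
```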

Apply the join-probe prefetching optimization to the `insertForJoin` function to improve its performance. Fixes: #9732

CLA Signed
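The general idea behind prefetching in a hash-table loop, as mentioned in the `insertForJoin` entry above, can be sketched as follows. This is a hypothetical stand-in, not Velox's `insertForJoin`: the function name, the bucket-counting workload, and the default prefetch distance are all assumptions, and `__builtin_prefetch` is a GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: count rows per bucket while prefetching the bucket that a
// later row will touch, so its cache line is likely resident when needed.
inline std::vector<uint32_t> countBucketsWithPrefetch(
    const std::vector<uint64_t>& hashes,
    size_t numBuckets,
    size_t prefetchDistance = 8) {
  // Assumed precondition: numBuckets is a non-zero power of two.
  assert(numBuckets > 0 && (numBuckets & (numBuckets - 1)) == 0);
  std::vector<uint32_t> buckets(numBuckets, 0);
  const uint64_t mask = numBuckets - 1;
  for (size_t i = 0; i < hashes.size(); ++i) {
    if (i + prefetchDistance < hashes.size()) {
      // Hint a future write (rw = 1) with high temporal locality (3).
      __builtin_prefetch(&buckets[hashes[i + prefetchDistance] & mask], 1, 3);
    }
    ++buckets[hashes[i] & mask];
  }
  return buckets;
}
```

The prefetch is only a hint, so the result is identical with or without it; any benefit depends on the table being larger than the cache and on a suitable prefetch distance.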

## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) (Fixes: \#ISSUE-ID) ## How was this patch tested? (Please explain how this patch...

Add `__restrict` annotations on the inputs to aid auto-vectorization and speed up Spark comparison functions. Store the result in a `std::vector` and then convert it to the result vector using...

CLA Signed
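A minimal sketch of the `__restrict` pattern from the entry above: the kernel promises the compiler that the output buffer does not alias the inputs, which removes one obstacle to auto-vectorizing the loop. The names `lessThanKernel` and `lessThan`, the `int64_t` element type, and the `uint8_t` result buffer are assumptions for illustration; the step that converts the buffer into the final result vector is truncated in the entry and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: element-wise less-than over non-aliasing buffers.
inline void lessThanKernel(
    const int64_t* __restrict lhs,
    const int64_t* __restrict rhs,
    uint8_t* __restrict out,
    size_t size) {
  for (size_t i = 0; i < size; ++i) {
    out[i] = lhs[i] < rhs[i] ? 1 : 0;
  }
}

// Sketch only: store the comparison results in a temporary std::vector.
inline std::vector<uint8_t> lessThan(
    const std::vector<int64_t>& lhs,
    const std::vector<int64_t>& rhs) {
  assert(lhs.size() == rhs.size());
  std::vector<uint8_t> result(lhs.size());
  lessThanKernel(lhs.data(), rhs.data(), result.data(), lhs.size());
  return result;
}
```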

Doc: https://docs.databricks.com/en/sql/language-manual/functions/collect_set.html Code: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala#L39C16-L39C23 https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala#L147C12-L147C22 There are three semantic differences from `set_agg`: 1. Null values are excluded. ``` import org.apache.spark.sql.functions._ import org.apache.spark.sql.types._ val jsonStr = """{"txn":null}""" val jsonStr1 = """{"txn":null}"""...

CLA Signed
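The first difference listed in the `collect_set` entry above (null values are excluded) can be sketched in C++17 as below. This covers only that one point; the other two differences are truncated in the entry and are not modeled. The function name is hypothetical, nulls are represented with `std::optional`, and the ordered `std::set` is chosen only to make the sketch deterministic, not to suggest an ordering guarantee.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Sketch only: collect distinct values, skipping nulls (empty optionals).
inline std::set<int64_t> collectSetIgnoringNulls(
    const std::vector<std::optional<int64_t>>& values) {
  std::set<int64_t> result;
  for (const auto& value : values) {
    if (value.has_value()) {
      result.insert(*value);
    }
  }
  return result;
}
```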