kyuubi [Umbrella] Improvements and evaluation for TRowSet generation of Spark Engine

Open bowenliang123 opened this issue 1 year ago • 0 comments

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Search before asking

[X] I have searched in the issues and found no similar issues.

Describe the proposal

RowSet generation takes the results from the result iterator and serializes them into a column-based or row-based TRowSet. It is the key step for result transport and performance in most common cases.
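To make the two layouts concrete, here is a minimal sketch of the row-based and column-based conversions. `RowBasedSet`, `ColumnBasedSet`, and `RowSetSketch` are hypothetical, simplified stand-ins invented for illustration; they are not the Thrift `TRowSet` classes or Kyuubi's actual `RowSet` code.

```scala
// Hypothetical, simplified stand-ins for the Thrift result types (assumption, not Kyuubi code).
final case class RowBasedSet(rows: Seq[Seq[Any]])
final case class ColumnBasedSet(columns: Seq[Seq[Any]])

object RowSetSketch {
  // Row-based: every row is kept together as one unit.
  def toRowBased(rows: Iterator[Seq[Any]]): RowBasedSet =
    RowBasedSet(rows.toVector)

  // Column-based: materialize the rows once, then gather the i-th value
  // of every row into column i.
  def toColumnBased(rows: Iterator[Seq[Any]], numColumns: Int): ColumnBasedSet = {
    val materialized = rows.toVector
    ColumnBasedSet(Vector.tabulate(numColumns)(i => materialized.map(row => row(i))))
  }
}
```

The column-based path reads the same rows once per column, which is why the way the iterator is materialized matters for performance.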

- Possible drawbacks have been reported in looping over the result iterator through a wrapped stream in SparkOperation (https://github.com/apache/kyuubi/blob/master/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkOperation.scala#L253C21-L253C26).
- The performance of the Spark Engine's RowSet.toTRowSet should be evaluated with benchmarks, both overall and for each data type, in the different modes (Thrift protocol versions and Arrow-based).
- Performance improvements in the Spark Engine's RowSet implementation.
- Code cleanup in the Spark Engine's RowSet generation.
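As a rough illustration of the kind of comparison proposed above, the sketch below materializes the same iterator with `toStream`, `toList`, and `toVector`, then makes two passes over the rows, as a column-based conversion would. All names here (`IteratorMaterialization`, `timeMs`, `sumColumn`, `freshRows`) are hypothetical, and the wall-clock timing is crude: a real benchmark needs warm-up, repeated runs, and realistic row data.

```scala
object IteratorMaterialization {
  // Crude wall-clock timer in milliseconds (no warm-up, single run).
  def timeMs[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  // One pass over the materialized rows, reading a single column by index.
  def sumColumn(rows: Seq[Array[Int]], i: Int): Long = {
    var total = 0L
    rows.foreach(r => total += r(i))
    total
  }

  // A fresh iterator per run, since an Iterator can be consumed only once.
  def freshRows(n: Int): Iterator[Array[Int]] =
    Iterator.tabulate(n)(k => Array(k, k * 2))

  def main(args: Array[String]): Unit = {
    val n = 200000
    val candidates: Seq[(String, Iterator[Array[Int]] => Seq[Array[Int]])] = Seq(
      ("toStream", (it: Iterator[Array[Int]]) => it.toStream),
      ("toList", (it: Iterator[Array[Int]]) => it.toList),
      ("toVector", (it: Iterator[Array[Int]]) => it.toVector))
    candidates.foreach { case (name, materialize) =>
      val (sums, ms) = timeMs {
        val rows = materialize(freshRows(n))
        (sumColumn(rows, 0), sumColumn(rows, 1))
      }
      println(f"$name%-9s sums=$sums time=$ms%.1f ms")
    }
  }
}
```

Note that `toStream` is lazy and memoizing, so its cost is paid during the first pass rather than at materialization, and it is deprecated in Scala 2.13 in favor of `LazyList`.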

Task list

[ ] Benchmark unit tests
- #5809
  - [ ] Add benchmark unit test for RowSet generation covering supported data types
  - [ ] Add a dedicated benchmark unit test for each supported data type for RowSet generation
[ ] Replace looping over the iterator via toSeq (toStream of Iterator) with an immutable collection
- [ ] Compare toStream/toSeq/toList/toVector
- ~~#5804~~
[ ] Parallel processing for column-based TRowSet generation
[ ] Performance improvements in data types
- #5811
- ~~DecimalType with column-based mode #5810~~
- ~~ArrayType of primitive data types with column-based mode~~
Generalize TRowSet generator
- #5851
- #5861
[ ] Code cleanup in RowSet of Spark Engine
- #5831
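For the parallel column-based generation task listed above, one possible shape is to build each column in its own `Future`. This is only a sketch under assumptions: `ParallelColumns` and `buildColumns` are hypothetical names, rows are modeled as `Seq[Any]` rather than Spark `Row`s, and the output is plain Scala collections instead of Thrift columns.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelColumns {
  // Rows are materialized once into a Vector first, because an Iterator can be
  // traversed only once and is not safe to share across threads.
  def buildColumns(rows: Iterator[Seq[Any]], numColumns: Int)(
      implicit ec: ExecutionContext): Seq[Seq[Any]] = {
    val materialized = rows.toVector
    // One Future per column; each reads the shared, immutable Vector.
    val futures = (0 until numColumns).map { i =>
      Future(materialized.map(row => row(i)))
    }
    // Block until every column is built; column order follows the column index.
    Await.result(Future.sequence(futures), 1.minute)
  }
}
```

Whether this helps depends on the row count, the column count, and the per-value conversion cost, so it would need to be checked against the benchmarks before adoption.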

Are you willing to submit PR?

[X] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
[ ] No. I cannot submit a PR at this time.

Dec 03 '23 08:12 bowenliang123
