BInwei Yang comments

Results: 167 comments of BInwei Yang

[GLUTEN-4241][VL] Add plan node to convert Vanilla spark columnar format data to Velox columnar format data

> > Do you convert to Velox format directly? or convert to Arrow then to Velox? > > Convert to Arrow firstly, then to velox make sense. We may upstream...

[GLUTEN-4241][VL] Add plan node to convert Vanilla spark columnar format data to Velox columnar format data

@zhztheplayer can you check how the memory is allocated during the conversion? Where is the Arrow memory allocated? How many memcpy calls happen during the conversion? Is there an onheap=>offheap copy?

[GLUTEN-4241][VL] Add plan node to convert Vanilla spark columnar format data to Velox columnar format data

Let's document the conversion clearly here. I have an impression that parquet-mr can make use of offheap memory for columnar data. If so the best case is that we can...

[GLUTEN-4241][VL] Add plan node to convert Vanilla spark columnar format data to Velox columnar format data

> does an extra copy even if the row might from an off Thank you for the explanation. You may try to enable spark.sql.columnVector.offheap.enabled. An onheap to offheap memcpy is more expensive...

[GLUTEN-4241][VL] Add plan node to convert Vanilla spark columnar format data to Velox columnar format data

Oh, I just noticed the PR is still open and has many conflicts. @boneanxs would you like to continue?

[GLUTEN-7641][VL] Add Gluten benchmark scripts

Why are there 3 sets of TPCDS queries? Can we consolidate them into one?
./tools/gluten-it/common/src/main/resources/tpcds-queries
./gluten-core/src/test/resources/tpcds-queries
./gluten-core/target/scala-2.12/test-classes/tpcds-queries

[GLUTEN-7641][VL] Add Gluten benchmark scripts

> Thank you! > > BTW there were a couple of related efforts in our code base (not all of them): > > #432 #5278 > > Should we review...

[GLUTEN-7641][VL] Add Gluten benchmark scripts

initialize.ipynb. Let's remove the BKM section

[GLUTEN-7641][VL] Add Gluten benchmark scripts

Looks good. Let's test on cloud once we have a chance.

Erase the previous processed WindowPartition in RowStreamingWindowBuild

In Velox, we track memory by the plan node memory pool or the global spill memory pool. In Gluten, the memory pool for a plan node is counted toward the offheap size, the...