BInwei Yang
> > Do you convert to Velox format directly, or convert to Arrow and then to Velox? > > Converting to Arrow first and then to Velox makes sense. We may upstream...
@zhztheplayer can you check how the memory is allocated during the conversion? Where is the Arrow memory allocated? How many memcpy calls happen during the conversion? Is there an on-heap to off-heap copy?
Let's document the conversion clearly here. I have the impression that parquet-mr can make use of off-heap memory for columnar data. If so, the best case is that we can...
> does an extra copy even if the row might from an off Thank you for the explanation. You may try enabling spark.sql.columnVector.offheap.enabled. An on-heap to off-heap memcpy is more expensive...
Oh, I just noticed the PR is still open and has many conflicts. @boneanxs would you like to continue?
Why are there 3 TPC-DS query sets? Can we consolidate them into one? ./tools/gluten-it/common/src/main/resources/tpcds-queries ./gluten-core/src/test/resources/tpcds-queries ./gluten-core/target/scala-2.12/test-classes/tpcds-queries
> Thank you! > > BTW there were a couple of related efforts in our code base (not all of them): > > #432 #5278 > > Should we review...
initialize.ipynb. Let's remove the BKM section
Looks good. Let's test on cloud once we have a chance.
In Velox, we track memory through the plan node memory pool or the global spill memory pool. In Gluten, the memory pool for plan nodes is counted toward the off-heap size, the...