Results: 74 comments by BInwei Yang

@zhouyuan @zhixingheyi-tian

The same root cause as https://github.com/oap-project/gazelle_plugin/issues/906. We should add an ARROW_CHECK for every place where an int16 is used as the record batch size.
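The hazard here is that a row count like 32768 silently wraps to a negative value when narrowed to int16. A minimal Python model of the bounds check (the real fix would be a C++ ARROW_CHECK in the native code; `check_batch_size` is a hypothetical name used only for illustration):

```python
import ctypes

INT16_MAX = 32767


def check_batch_size(num_rows: int) -> int:
    """Model of the bounds check the native code should perform before a
    record batch size is narrowed to int16."""
    if not (0 <= num_rows <= INT16_MAX):
        raise OverflowError(f"batch of {num_rows} rows does not fit in int16")
    return num_rows


# Without the check, narrowing silently wraps:
wrapped = ctypes.c_int16(32768).value  # becomes -32768
```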

The root cause is the same as https://github.com/oap-project/gazelle_plugin/issues/928

mmap shows worse performance than read/write:

| | spill | write |
|---|---|---|
| mmap | 2.23 | 4.3 |
| read/write | 0.84 | 1.49 |

It looks like the difference comes from page-fault handling: write doesn't cause major page faults...
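The contrast above can be sketched in Python: an mmap-backed write touches pages through the fault handler, while a plain write(2) goes through the page cache without faulting the writer. The fault counters from `getrusage` (Unix-only) show what the comment is measuring; this is an illustration, not gazelle's spill code:

```python
import mmap
import os
import resource


def write_via_mmap(path: str, data: bytes) -> None:
    """Write by memory-mapping the file; each first touch of a page is
    serviced by a page fault, and evicted pages can fault again (major)."""
    with open(path, "wb") as f:
        f.truncate(len(data))
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), len(data)) as m:
            m[:] = data


def write_via_syscall(path: str, data: bytes) -> None:
    """Plain write(2): data goes to the page cache; the writing process
    does not take major page faults for it."""
    with open(path, "wb") as f:
        f.write(data)


def fault_counts() -> tuple:
    """Return (minor, major) page faults for this process so far."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_minflt, ru.ru_majflt
```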

Related bug: `dfw.repartition(144).count()` returns `0`. The log reports: `INFO shuffle.ColumnarShuffleWriter: Skip ColumnarBatch of 32768 rows, 0 cols`

`dfx.coalesce(1).count()` also returns 0; not sure if it's the same issue as repartition.

The query plan is the same for both: `dfw.repartition(144).where("ss_customer_sk is null").count()` and `dfw.where("ss_customer_sk is null").repartition(144).count()`

There is no performance difference if we set PreferSpill=true, because the memory is allocated only once.
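The allocate-once behavior can be sketched as a preallocated buffer that every spill reuses, so the allocation cost is paid a single time regardless of the PreferSpill setting. The class and counter below are hypothetical, for illustration only:

```python
class SpillBuffer:
    """Sketch of a spill path whose buffer is allocated once up front
    and reused for every subsequent spill (names are illustrative,
    not gazelle's actual implementation)."""

    def __init__(self, capacity: int):
        self._buf = bytearray(capacity)  # one-time allocation
        self.allocations = 1             # never grows after construction

    def spill(self, data: bytes) -> int:
        """Copy `data` into the reused buffer; no new allocation occurs."""
        if len(data) > len(self._buf):
            raise ValueError("spill larger than preallocated buffer")
        self._buf[: len(data)] = data
        return len(data)
```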

Do you mean to use Spark's memory management system? Then we would need to define a set of APIs for the native library, which could then be implemented on top of the native implementation....
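The API set in question might look like the interface below: acquire/release calls that go through Spark's task memory accounting, plus a spill callback for memory pressure. All names here are hypothetical, loosely modeled on Spark's MemoryConsumer pattern, and not gazelle's actual interface:

```python
from abc import ABC, abstractmethod


class NativeMemoryManager(ABC):
    """Hypothetical API surface a native library would implement if it
    delegated accounting to Spark's memory management (illustrative only)."""

    @abstractmethod
    def acquire(self, size: int) -> int:
        """Request `size` bytes; return the number of bytes actually granted."""

    @abstractmethod
    def release(self, size: int) -> None:
        """Return `size` bytes to the pool."""

    @abstractmethod
    def spill(self, size: int) -> int:
        """Callback under memory pressure; return the number of bytes freed."""


class SimpleManager(NativeMemoryManager):
    """Toy in-process implementation to show how the API composes."""

    def __init__(self, limit: int):
        self.limit, self.used = limit, 0

    def acquire(self, size: int) -> int:
        granted = min(size, self.limit - self.used)
        self.used += granted
        return granted

    def release(self, size: int) -> None:
        self.used = max(0, self.used - size)

    def spill(self, size: int) -> int:
        freed = min(size, self.used)
        self.used -= freed
        return freed
```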