Results: 167 comments by Binwei Yang

> > If a partition only has 2 split > > This sounds like a bigger issue, the workload is doomed to be skewed with such a setup No, It's...

> @FelixYBW A split is usually a stripe (row group) if the number of files is not large enough. With one file per split, the split level preloading is not...

> It's usually either 1 split per row group or 1 split per file, how did you generate your data? Hi @Yuhta we can't make any assumption on this in...

> The change itself should be ok; it's just if one split contains many row groups, row group level prefetch would benefit much more than split level prefetch (i.e. the...

@pedroerp can you find someone to review the PR? The PR is essential to the Gluten project. It fixes a bug that a Gluten customer observed.

In the long term, we need to implement this the Spark way: broadcast the hash table instead of the raw table data.

@zhztheplayer Is there a memory management issue in this solution? Is the memory allocated in storage memory? @JkSelf will this solution be helpful to the final solution?