Keyong Zhou

16 comments by Keyong Zhou

Hi @pedroerp, thanks for the proposal!

> Since dictionaries cannot wrap around more than one vector, at times merge join may return fewer than outputBatchSize_ rows.

Do you have...

> > Do you have a plan to extend DictionaryVector to wrap more than one vector, so that in some cases it can avoid undersized batches? > > that's...

> @waitinfuture Thank you for the fix. Would you also add a test for aggregations over sorted inputs?

@mbasmanova Updated, please take a look again. It's hard to reproduce incorrect...

Hi @qqibrow, we also encountered incorrect data when reading Map types. Have you run into such bugs as well? Thanks!

Hi @s0nskar, thanks for pointing this out; I think you are correct. It seems this PR conflicts with stage rerun.

> we should always treat the previous ShuffleMapStage's output as indeterministic...

> @wangshengjie123 Is there any doc or ticket explaining this approach? Also for the sort based approach that you mentioned.

The sort-based approach is roughly like this:

1. Each...

> It has been a while since I looked at this PR - but as formulated, the split into subranges is deterministic (if it is not, it should be made...

> Ah, I see what you mean ... `PartitionLocation` would change between retries. Yeah, this is a problem then - it will cause data loss. This would be a variant...

> QQ: thinking out aloud, instead of this change - do we want to proactively trigger sort for reducers where we are reading a subset of mapper output (based on...