Sanskar Modi comments

Results: 40 comments of Sanskar Modi

[WIP][CELEBORN-1319] Optimize skew partition logic for Reduce Mode to avoid sorting shuffle files

@wangshengjie123 Is there a doc or ticket explaining this approach, and also the sort-based approach you mentioned?

From my understanding, in this PR we're diverging from the vanilla Spark approach based on mapIndex and instead dividing the full partition into multiple sub-partitions based on some heuristics. I'm new...

@pan3793 This does not become a problem if we maintain the concept of mapIndex ranges, as Spark will always read deterministic output for each sub-partition. As vanilla Spark always reads...
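
To make the mapIndex-range idea concrete, here is a minimal Python sketch, assuming we know how many bytes each map task wrote to one skewed reduce partition. It is similar in spirit to how Spark's adaptive skew handling splits a skewed partition by map index, but it is not Spark's or Celeborn's code; `split_by_map_index` and the `target` size are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


def split_by_map_index(map_sizes: List[int], target: int) -> List[Tuple[int, int]]:
    """Split one skewed reduce partition into contiguous [start, end) map-index ranges.

    Hypothetical helper: map_sizes[i] is the number of bytes map task i wrote
    to this reduce partition. Each range accumulates map outputs until adding
    the next one would exceed `target`.
    """
    ranges: List[Tuple[int, int]] = []
    start, acc = 0, 0
    for i, size in enumerate(map_sizes):
        # Close the current range before it grows past the target,
        # but never emit an empty range.
        if i > start and acc + size > target:
            ranges.append((start, i))
            start, acc = i, 0
        acc += size
    # Emit the trailing range, if any map outputs remain.
    if start < len(map_sizes):
        ranges.append((start, len(map_sizes)))
    return ranges


sizes = [40, 30, 50, 10, 70, 20]
print(split_by_map_index(sizes, 80))  # → [(0, 2), (2, 4), (4, 5), (5, 6)]
```

Because each sub-partition is defined by a fixed set of map indices rather than by the order in which data happened to land in a shuffle file, a retried sub-partition reads exactly the same map outputs as the first attempt.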

Also, I think this issue is not limited to ResultStage; it can happen with ShuffleMapStage as well in some complex cases. Consider another scenario: `ShuffleMapStage1 -----> ShuffleMapStage2...

Thanks a lot @waitinfuture for the sort-based approach description.

> Is it possible to force make it as indeterministic?

IMO this would be very difficult to do from...

> a) If recomputation happens, we should fail the stage and not allow retries - this will prevent data loss.
>
> b) We should recommend enabling replication to leverage this...

UnicodeDecodeError when setting tag units with unit of data array

This error generally occurs when the given data is not valid `utf-8`, so check that the data you are providing is properly `utf-8` encoded. I think...
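
One quick way to check this is a small Python sketch that tries to decode the raw bytes and reports where decoding fails. It uses only the built-in `bytes.decode` and the `start`/`reason` attributes of `UnicodeDecodeError`; `find_invalid_utf8` is a hypothetical helper, and the `"µV"` unit string is just an illustrative input, not taken from the original issue.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, str]]:
    """Return None if `data` is valid UTF-8, else (offset, reason) for the first bad byte."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start, err.reason


print(find_invalid_utf8("µV".encode("utf-8")))    # → None
print(find_invalid_utf8("µV".encode("latin-1")))  # → (0, 'invalid start byte')
```

The second call fails because Latin-1 encodes the micro sign as the single byte `0xb5`, which is not a valid first byte of a UTF-8 sequence.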

Hey @achilleas-k, I was trying to run `tag = blk.create_tag('TestTag', 'Test', position=[10])` from the code above, but it gives me this error:

```python
ArgumentError Traceback (most recent call last) in ()...
```

Minor typo in the NIXPY API documentation

Isn't it odd that line 24 of [overview.html](https://github.com/G-Node/nixpy/blob/master/docs/source/overview.rst#L24) reads `import nixio as nix`, while the [API documentation](http://g-node.github.io/nixpy/overview.html) shows `import nix`?

[Spark 3] RSS performance with Adaptive Skew Join Optimization

cc: @hiboyang for visibility