Junfan Zhang

434 comments of Junfan Zhang

In the case of partition reassignment, this will block the RPC response; it should be handled by a thread pool instead.
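
A minimal sketch of the idea, in case it helps: the RPC handler submits the reassignment work to a dedicated pool and returns immediately instead of blocking the response. The class and method names here are hypothetical, not Uniffle's actual API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReassignHandler {
    // Dedicated pool so slow partition-reassignment work never blocks the RPC thread.
    private static final ExecutorService REASSIGN_POOL = Executors.newFixedThreadPool(4);

    // Called from the RPC handler: submit the heavy work and return immediately,
    // letting the RPC response go back to the client without waiting.
    public static Future<?> handleReassign(Runnable reassignTask) {
        return REASSIGN_POOL.submit(reassignTask);
    }

    public static void main(String[] args) throws Exception {
        Future<?> f = handleReassign(() -> System.out.println("reassigning partitions asynchronously"));
        f.get(); // the real RPC thread would NOT wait; this is only to keep the demo deterministic
        REASSIGN_POOL.shutdown();
    }
}
```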

When using the netty mode, the RPC layer's bytebuf is `io.netty.buffer.CompositeByteBuf`; when invoking `bytebuf.nioBuffer()`, it will be converted into a heap buffer.
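
The heap-vs-direct distinction can be verified with `ByteBuffer.isDirect()`. A self-contained sketch using plain NIO (not Netty, so it does not demonstrate the `CompositeByteBuf.nioBuffer` conversion itself, only how to check which kind of buffer you ended up with):

```java
import java.nio.ByteBuffer;

public class BufferKindDemo {
    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // off-heap memory
        ByteBuffer heap = ByteBuffer.allocate(16);         // on-heap byte[] backed
        // isDirect() is the quick way to confirm whether a conversion
        // (e.g. from a composite buffer) silently landed on the heap.
        System.out.println(direct.isDirect());
        System.out.println(heap.isDirect());
    }
}
```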

> Hi [@zuston](https://github.com/zuston)
>
> Do we need to add a new configuration option for a retry logic with backoff and assign it to `GrpcClient`?
>
> https://github.com/apache/uniffle/blob/1e48bc673d1c0ee41f889a0de6192b0fab131467/common/src/main/java/org/apache/uniffle/common/config/RssClientConf.java

Yes. This...
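
A hypothetical sketch of the retry-with-backoff logic under discussion; the method name and the configuration parameters (`maxRetries`, `baseBackoffMs`) are illustrative, not actual `RssClientConf` keys:

```java
import java.util.concurrent.Callable;

public class RetryUtils {
    // Retry a call up to maxRetries times (maxRetries >= 1),
    // doubling the backoff after each failure: base, 2*base, 4*base, ...
    public static <T> T retryWithBackoff(Callable<T> call, int maxRetries, long baseBackoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                long backoff = baseBackoffMs * (1L << attempt);
                Thread.sleep(backoff);
            }
        }
        throw last; // all attempts exhausted
    }
}
```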

If we disable the multi replicas of shuffle-data, we should strictly check the processed blockIds. WDYT? @jerqi
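
A sketch of what such a strict check could look like: with a single replica there is no fallback copy, so the processed blockIds must match the expected set exactly. All names here are hypothetical, not Uniffle's actual code.

```java
import java.util.HashSet;
import java.util.Set;

public class BlockIdChecker {
    // With a single replica, every expected block must have been processed,
    // and no unexpected block may appear; anything else means data loss or duplication.
    public static void strictCheck(Set<Long> expected, Set<Long> processed) {
        Set<Long> missing = new HashSet<>(expected);
        missing.removeAll(processed);
        Set<Long> unexpected = new HashSet<>(processed);
        unexpected.removeAll(expected);
        if (!missing.isEmpty() || !unexpected.isEmpty()) {
            throw new IllegalStateException(
                "Inconsistent blockIds. missing=" + missing + ", unexpected=" + unexpected);
        }
    }
}
```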

> > If we disable the multi replicas of shuffle-data, we should strict check the processed blockIds. WDYT? @jerqi
>
> Does it cover the speculation execution and AQE?

I...

I have implemented this on the riffle side: https://github.com/zuston/riffle/pull/532

cc @chaokunyang. If you have time, could you help review this integration with Fory? So far, this implementation hasn't shown significant improvements. I would greatly appreciate any guidance you...

Big thanks for your quick and patient review, @chaokunyang.

> Shuffle data should already be binary, is there anything that needs being serialized?

If using the vanilla Spark, record is...

> Only if you are using spark rdd with raw java objects, there will be serialization bottleneck. Such cases are similiar to datastream in flink.

We've observed several times of...

> Data record in Spark SQL are alreay binary, there is no serialization happened. I suggest benchmark first before optimizing.

It seems that serialization is still happening: https://github.com/apache/spark/blob/2de0248071035aa94818386c2402169f6670d2d4/core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala#L57 The product2...