advancedxy comments

Results: 103 comments of advancedxy

[#1796] fix(spark): Implicitly unregister map output on fetch failure

Thanks for working on this. I did a quick overview of this change, and I think it's quite large to review. It would be best to keep this PR open and split...

[#1796] fix(spark): Implicitly unregister map output on fetch failure

> Emm... Sorry, I don't think this is a huge change; it is mostly based on your previous great work and just fixes some bugs.

I agree it's not huge, but...

[#1796] fix(spark): Implicitly unregister map output on fetch failure

> Could you help show what the difference is between fetch and write failure?

For starters, you should report shuffle failure and write failure via different request types. The stage retry...

Rework stage retry

> Could you help review this? @advancedxy I want to go on to finish this feature.

Thanks for pinging me. I may be able to review your design and PRs this...

[#1628] Avoid exception caused by calling release multiple times

Hmm, I'm not sure about this change. If we can find the root cause of why it's called multiple times and can reason that the multiple releases are necessary, this change...

[FEATURE] Cache index files on the server side

Do you observe any performance issue in this case? If the index file is only accessed a few times (say, once or twice), there's no need to cache this file. And normally, the...

[FEATURE] Cache index files on the server side

> We can just cache the partitions of applications which enable AQE

AQE is enabled by default in later Spark versions such as Spark 3.3, and also it may be turned...

[FEATURE] Cache index files on the server side

> > Also another question, in which cases would AQE trigger reading the same partition many times? I know the `OptimizeSkewedJoin` would split the same partition into multiple parts, and...

[FEATURE] Cache index files on the server side

> Yes. Reduce task will read all index files of all partitions. You can try to run a job and trace the logs.

This doesn't seem right. Do you have...

[FEATURE] Cache index files on the server side

> https://github.com/apache/spark/blob/0fd1f85f16502d1d4222cf3d7abfd5e2b86464e6/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala#L228

I didn't get a chance to reproduce this case. I will do it this week. I still think it should be cached when necessary and selectively.

>...