advancedxy
Thanks for working on this. I took a quick look at this change, and I think it's quite large to review. It would be best to keep this PR open and split...
> Emm... Sorry, I don't think this is a huge change. This is mostly based on your previous great work, just fixing some bugs.

I agree it's not huge, but...
> Could you help show what’s the difference between fetch and write failure?

For starters, you should report shuffle failure and write failure via different request types. The stage retry...
> Could you help review this? @advancedxy I want to go on to finish this feature.

Thanks for pinging me. I may be able to review your design and PRs this...
Hmm, I'm not sure about this change. If we can find the root cause of why it's called multiple times and can reason that the multiple releases are necessary, this change...
Do you observe any performance issue in this case? If the index file is only accessed a few times (say, once or twice), there's no need to cache it. And normally, the...
> We can just cache the partitions of applications which enable AQE

AQE is enabled by default in later Spark versions such as Spark 3.3, and it may also be turned...
> > Also, another question: in which cases would AQE trigger reading the same partition many times? I know `OptimizeSkewedJoin` would split the same partition into multiple parts, and...
> Yes. Reduce task will read all index files of all partitions. You can try to run a job and trace the logs.

This doesn't seem right. Do you have...
> https://github.com/apache/spark/blob/0fd1f85f16502d1d4222cf3d7abfd5e2b86464e6/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala#L228

I didn't get a chance to reproduce this case. I will do it this week. I still think it should be cached only when necessary, and selectively.

>...
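The thread argues for caching index files "when necessary and selectively" rather than unconditionally. One possible reading, sketched here under stated assumptions, is a count-based admission policy: an index file is read from disk directly until it has been requested `threshold` times, and only then kept in memory, so files that are read only once or twice never occupy cache space. `SelectiveIndexCache`, `threshold`, and the `loader` function are hypothetical names for illustration, not an existing API in Spark or any shuffle service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch: an index file is served straight from the loader
 * until it has been requested `threshold` times; only then is it cached.
 * Files read once or twice never occupy cache memory.
 */
final class SelectiveIndexCache<K, V> {
    private final int threshold;
    private final Function<K, V> loader; // assumed to read the index file; must not return null
    private final Map<K, AtomicInteger> hits = new ConcurrentHashMap<>();
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    SelectiveIndexCache(int threshold, Function<K, V> loader) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.loader = loader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // hot file: no disk read
        }
        // Count this request, then read the file from the (assumed) loader.
        int count = hits.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet();
        V value = loader.apply(key);
        loads.incrementAndGet();
        if (count >= threshold) {
            // Requested often enough to be worth caching; stop tracking its hit count.
            cache.put(key, value);
            hits.remove(key);
        }
        return value;
    }

    boolean isCached(K key) {
        return cache.containsKey(key);
    }

    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        SelectiveIndexCache<String, String> c =
            new SelectiveIndexCache<>(3, k -> "index-of-" + k);
        c.get("p0");
        c.get("p0");
        System.out.println(c.isCached("p0")); // false: only two reads so far
        c.get("p0");
        c.get("p0");
        System.out.println(c.isCached("p0") + " " + c.loadCount()); // true 3
    }
}
```

The per-file hit counter means a one-off read costs only a counter entry, not a cached copy of the file. A real version would also need eviction (for example, a size bound), and would have to tolerate two concurrent callers both loading the same file before it is cached; this sketch handles neither.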