vega

Remove serialization of duplicate data in dependencies along with task

Open rajasekarv opened this issue 4 years ago • 3 comments

rajasekarv, Apr 26 '20 15:04

Hi, I ran the 'Transitive closure on a graph' sample, a standard Spark example (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkTC.scala), and found that the total number of bytes grew too fast for the job to run to completion: two or three iterations are enough to exhaust my memory. The problem seems related to this issue. If I want to contribute a fix, what is the main difficulty in solving it, and could you please give me some hints?

AmbitionXiang, Feb 02 '21 16:02

Hi, I've finished it. Thanks.

AmbitionXiang, Feb 03 '21 07:02

Hello @AmbitionXiang

Hope you are doing well. Thanks for checking it and bringing up the issue. Yes, because of data duplication in serialization, it can run out of memory very quickly if the data flow branches out a lot. It is a long-pending issue, and since I have been busy with personal work, I never got time to work on it. I plan to resume work on the project in about a month, and I will be managing it actively this time. If you have done some work, please raise a Pull Request and I will merge it after reviewing it. Thanks a lot for your support.

rajasekarv, Feb 03 '21 07:02
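
To make the duplication discussed in this thread concrete, here is a minimal Rust sketch. It does not use vega's actual types or its serialization code. `Dep`, `naive_size`, `dedup_size`, and `build_chain` are hypothetical names invented for illustration. It assumes each iteration refers to the previous result twice, as an iterative job that unions a dataset with a join on itself would. Under that assumption, it compares writing every referenced dependency in full with writing each distinct dependency once.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical stand-in for a lineage node (not a vega type).
struct Dep {
    id: usize,
    payload: Vec<u8>,
    parents: Vec<Rc<Dep>>,
}

/// Naive scheme: every reference to a parent is written out in full,
/// so a parent shared by two children is counted twice.
fn naive_size(dep: &Dep) -> usize {
    dep.payload.len() + dep.parents.iter().map(|p| naive_size(p)).sum::<usize>()
}

/// Deduplicated scheme: each distinct node is counted once, keyed by id.
/// A real format would also emit a small back-reference for repeats,
/// which this sketch ignores.
fn dedup_size(dep: &Dep, seen: &mut HashSet<usize>) -> usize {
    if !seen.insert(dep.id) {
        return 0; // already written
    }
    let mut total = dep.payload.len();
    for p in &dep.parents {
        total += dedup_size(p, seen);
    }
    total
}

/// Build a lineage in which each step refers to the previous step twice.
fn build_chain(iterations: usize, payload_len: usize) -> Rc<Dep> {
    let mut current = Rc::new(Dep {
        id: 0,
        payload: vec![0u8; payload_len],
        parents: vec![],
    });
    for i in 1..=iterations {
        current = Rc::new(Dep {
            id: i,
            payload: vec![0u8; payload_len],
            parents: vec![Rc::clone(&current), Rc::clone(&current)],
        });
    }
    current
}

fn main() {
    for iterations in [1, 5, 10, 20] {
        let root = build_chain(iterations, 100);
        let naive = naive_size(&root);
        let dedup = dedup_size(&root, &mut HashSet::new());
        println!("iterations={iterations:2} naive={naive:>12} dedup={dedup:>6}");
    }
}
```

In this toy model the naive size roughly doubles with each iteration, while the deduplicated size grows linearly. That matches the reported symptom of memory being exhausted after only a few iterations when the data flow branches.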