vega remove serialization of duplicate data in dependencies along with task

Open rajasekarv opened this issue 4 years ago • 3 comments

Apr 26 '20 15:04 rajasekarv

Hi, I ran the 'Transitive closure on a graph' sample, a typical Spark example (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkTC.scala), and found that the total number of serialized bytes grew too fast for the job to run to completion. Only two or three iterations exhaust my memory. The problem seems related to this issue. If I want to contribute a fix, what is the main difficulty in solving it, and could you please give me some hints?

Feb 02 '21 16:02 AmbitionXiang

Hi, I've finished it. Thanks.

Feb 03 '21 07:02 AmbitionXiang

Hello @AmbitionXiang

Hope you are doing well. Thanks for checking this and raising the issue. Yes, because data is duplicated during serialization, a job can run out of memory very quickly if the data flow branches out a lot. It is a long-pending issue, and since I have been busy with personal work, I never got time to work on it. I plan to resume work on the project in about a month, and I will be managing it actively this time. If you have done some work, please raise a pull request and I will merge it after reviewing it. Thanks a lot for your support.

Feb 03 '21 07:02 rajasekarv
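
To illustrate the duplication described above, here is a minimal sketch in plain Rust. It is not vega's code: `Node`, `build_chain`, `serialize_naive`, and `serialize_dedup` are hypothetical names invented for this example, and the byte layout is made up. It assumes a lineage in which every node references the previous node twice, the kind of branching a self-join in an iterative job like transitive closure might produce. The naive encoder inlines a parent each time it is reached, so the output roughly doubles per level. The deduplicating encoder writes each node once and refers to parents by id.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a lineage node (not vega's actual types).
struct Node {
    id: u32,
    payload: Vec<u8>,
    parents: Vec<Arc<Node>>,
}

/// Builds a chain in which each node lists the previous node twice,
/// imitating a data flow that branches and rejoins at every step.
fn build_chain(depth: u32, payload_len: usize) -> Arc<Node> {
    let mut current = Arc::new(Node {
        id: 0,
        payload: vec![0u8; payload_len],
        parents: Vec::new(),
    });
    for id in 1..=depth {
        current = Arc::new(Node {
            id,
            payload: vec![0u8; payload_len],
            parents: vec![Arc::clone(&current), Arc::clone(&current)],
        });
    }
    current
}

/// Naive encoding: every parent is written in full each time it is reached,
/// so a node shared by several paths is serialized once per path.
fn serialize_naive(node: &Node, out: &mut Vec<u8>) {
    out.extend_from_slice(&node.id.to_le_bytes());
    out.extend_from_slice(&node.payload);
    for parent in &node.parents {
        serialize_naive(parent, out);
    }
}

/// Deduplicated encoding: each node is written once, followed by the ids of
/// its parents; nodes that were already written are skipped.
fn serialize_dedup(node: &Node, seen: &mut HashSet<u32>, out: &mut Vec<u8>) {
    if !seen.insert(node.id) {
        return; // already serialized via another path
    }
    out.extend_from_slice(&node.id.to_le_bytes());
    out.extend_from_slice(&node.payload);
    for parent in &node.parents {
        out.extend_from_slice(&parent.id.to_le_bytes());
    }
    for parent in &node.parents {
        serialize_dedup(parent, seen, out);
    }
}

/// Size in bytes of the naive encoding.
fn naive_size(root: &Node) -> usize {
    let mut out = Vec::new();
    serialize_naive(root, &mut out);
    out.len()
}

/// Size in bytes of the deduplicated encoding.
fn dedup_size(root: &Node) -> usize {
    let mut out = Vec::new();
    let mut seen = HashSet::new();
    serialize_dedup(root, &mut seen, &mut out);
    out.len()
}
```

Calling `naive_size(&build_chain(10, 100))` and `dedup_size(&build_chain(10, 100))` shows the gap: the naive size grows exponentially with depth, while the deduplicated size grows linearly. A real fix would also need a matching deserializer that rebuilds the shared references from the ids.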
