Improve memory usage of SendWorkerOneMessageToManyRequest

Open dlogothetis opened this issue 6 years ago • 2 comments

https://issues.apache.org/jira/browse/GIRAPH-1190

The current implementation takes incoming messages stored as ByteArrayOneMessageToManyIds and prepares them as a map from partition id to a ByteArrayVertexIdMessages, which holds the messages for the corresponding partition. It then adds these to the message store.

However, these intermediate message lists can grow large before they are added to the message store. If they reach the capacity of the underlying buffers, the job fails. This can be avoided by pushing the lists to the message store before they get big. This is mostly beneficial when a combiner is used, in which case the message store keeps only one value per vertex.
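To illustrate the idea only (this is not the code in this change), here is a minimal, self-contained sketch of flushing per-partition buffers early. All names in it (`EagerFlushSketch`, `CombiningStore`, `PartitionBuffers`, `maxBuffered`) are hypothetical stand-ins and not Giraph classes; it assumes a sum combiner and represents each message as a `{vertexId, value}` pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Giraph.
final class EagerFlushSketch {

  /** Stand-in for a message store with a sum combiner: one value per vertex. */
  static final class CombiningStore {
    private final Map<Integer, Long> valueByVertex = new HashMap<>();

    /** Combines every {vertexId, value} pair of the batch into the store. */
    void addPartitionMessages(int partitionId, List<long[]> batch) {
      for (long[] message : batch) {
        valueByVertex.merge((int) message[0], message[1], Long::sum);
      }
    }

    long get(int vertexId) {
      return valueByVertex.getOrDefault(vertexId, 0L);
    }
  }

  /** Buffers messages per partition and flushes a buffer once it is full. */
  static final class PartitionBuffers {
    private final Map<Integer, List<long[]>> buffers = new HashMap<>();
    private final CombiningStore store;
    private final int maxBuffered; // assumed threshold, chosen by the caller
    private int largestBufferSeen = 0;

    PartitionBuffers(CombiningStore store, int maxBuffered) {
      this.store = store;
      this.maxBuffered = maxBuffered;
    }

    void add(int partitionId, int vertexId, long value) {
      List<long[]> buffer =
          buffers.computeIfAbsent(partitionId, id -> new ArrayList<>());
      buffer.add(new long[] {vertexId, value});
      largestBufferSeen = Math.max(largestBufferSeen, buffer.size());
      // Push to the store before the intermediate list can grow further.
      if (buffer.size() >= maxBuffered) {
        store.addPartitionMessages(partitionId, buffer);
        buffer.clear();
      }
    }

    /** Flushes whatever is still buffered once the request is fully read. */
    void flushAll() {
      for (Map.Entry<Integer, List<long[]>> entry : buffers.entrySet()) {
        store.addPartitionMessages(entry.getKey(), entry.getValue());
        entry.getValue().clear();
      }
    }

    int largestBufferSeen() {
      return largestBufferSeen;
    }
  }

  public static void main(String[] args) {
    CombiningStore store = new CombiningStore();
    PartitionBuffers buffers = new PartitionBuffers(store, 2);
    for (int i = 0; i < 5; i++) {
      buffers.add(0, 7, 1L); // five messages for vertex 7 in partition 0
    }
    buffers.flushAll();
    System.out.println("combined value for vertex 7: " + store.get(7));
    System.out.println("largest buffer: " + buffers.largestBufferSeen());
  }
}
```

In this sketch the intermediate list never holds more than `maxBuffered` entries, while the combining store keeps a single value per vertex regardless of how many messages arrive.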

Tests:

  • Unit tests
  • Compared performance on a large dataset; it is similar to the existing implementation.
  • Verified that a job that would otherwise fail now succeeds.

dlogothetis avatar Apr 26 '18 15:04 dlogothetis

Not sure why the commits show Graph-1185. Do you have to rebase?

yukselakinci avatar May 01 '18 18:05 yukselakinci

I accidentally created a commit with the same message as the previous commit. When this lands, it'll get a new commit message anyway.

dlogothetis avatar May 01 '18 20:05 dlogothetis