Improve memory usage of SendWorkerOneMessageToManyRequest
https://issues.apache.org/jira/browse/GIRAPH-1190
The current implementation takes incoming messages stored as ByteArrayOneMessageToManyIds and groups them into a map from partition id to a ByteArrayVertexIdMessages, which holds the messages for the corresponding partition. It then adds these to the message store.
However, these intermediate per-partition message lists can grow large before they are added to the message store. If they reach the capacity of the underlying buffers, the job fails. This can be avoided by pushing the lists to the message store before they grow too large. This is most beneficial when a combiner is used, because the message store then keeps only one value per vertex.
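The idea can be illustrated with a small, self-contained sketch. It does not use Giraph's real classes: `OneToManyFlushSketch`, `CombiningMessageStore`, `partitionOf`, `deliver`, the modulo partitioner, the `maxBufferSize` threshold and the sum combiner are all hypothetical stand-ins. Each message is routed to a per-partition buffer, and any buffer that reaches the threshold is pushed to the (combining) store immediately instead of growing until the whole request has been processed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of flushing per-partition buffers early; not Giraph's actual API. */
public class OneToManyFlushSketch {
  /** Stand-in for a message store with a sum combiner: one value per vertex. */
  static final class CombiningMessageStore {
    final Map<Long, Long> valuesByVertex = new HashMap<>();
    int flushCount = 0;

    void addPartitionMessages(List<long[]> idMessagePairs) {
      flushCount++;
      for (long[] pair : idMessagePairs) {
        // Combine the incoming message with the single value already stored.
        valuesByVertex.merge(pair[0], pair[1], Long::sum);
      }
    }
  }

  /** Hypothetical partitioner: vertex id modulo the number of partitions. */
  static int partitionOf(long vertexId, int numPartitions) {
    return (int) Math.floorMod(vertexId, (long) numPartitions);
  }

  /**
   * Sends messages[i] to every id in targets[i]. Buffers are kept per partition and
   * flushed as soon as they reach maxBufferSize. Returns the largest buffer size seen.
   */
  static int deliver(long[] messages, long[][] targets, int numPartitions,
      int maxBufferSize, CombiningMessageStore store) {
    Map<Integer, List<long[]>> buffers = new HashMap<>();
    int maxObserved = 0;
    for (int i = 0; i < messages.length; i++) {
      for (long id : targets[i]) {
        int partition = partitionOf(id, numPartitions);
        List<long[]> buffer = buffers.computeIfAbsent(partition, p -> new ArrayList<>());
        buffer.add(new long[] {id, messages[i]});
        maxObserved = Math.max(maxObserved, buffer.size());
        if (buffer.size() >= maxBufferSize) {
          // Push early so the intermediate buffer never exceeds the threshold.
          store.addPartitionMessages(buffer);
          buffers.put(partition, new ArrayList<>());
        }
      }
    }
    // Flush whatever remains once the whole request has been processed.
    for (List<long[]> buffer : buffers.values()) {
      if (!buffer.isEmpty()) {
        store.addPartitionMessages(buffer);
      }
    }
    return maxObserved;
  }

  public static void main(String[] args) {
    CombiningMessageStore store = new CombiningMessageStore();
    int maxSeen = deliver(new long[] {1L, 2L},
        new long[][] {{0, 1, 2, 3, 4, 5}, {0, 2, 4}}, 2, 2, store);
    System.out.println("max buffer size: " + maxSeen);
    System.out.println("combined values: " + store.valuesByVertex);
  }
}
```

With a combiner the store holds at most one value per vertex, so flushing early keeps memory bounded by the threshold plus the store size; without the threshold, a single partition's buffer would grow with the number of target ids in the request.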
Tests:
- Unit tests
- Compared performance on a large dataset; it is similar to that of the existing implementation.
- Verified that a job that previously failed now succeeds.
Not sure why the commits show GIRAPH-1185. Do you need to rebase?
I accidentally created a commit with the same message as the previous commit. When this lands, it'll get a new commit message anyway.