
[NEMO-472] Implement Intermediate Combine

Open Kangji opened this issue 4 years ago • 6 comments

JIRA: NEMO-472: Fix and Implement Hierarchical Aggregation

Major changes: [NEMO-472: Implement Hierarchical Aggregation] adds an optional intermediate accumulation operator in front of the final combine operator. When needed, this operator accumulates data among physically nearby containers before the data is shuffled across the WAN. Aggregating among nearby containers is expected to reduce the amount of data that must be transferred across the WAN. To achieve this:

  • Implemented the intermediate combine transform.
    • The previous Combine.PerKey transform consisted of 2 steps:
      1. Partial Combine (a.k.a. pre-aggregation): accumulates elements within each container, so no data transfer across the network is needed in this step.
      2. Final Combine: shuffles all data (hashed by key) and then combines it.
    • An additional, optional step that partially accumulates the pre-aggregated data (only among nearby containers) is implemented and inserted between steps 1 (partial) and 2 (final).
    • This new type of transform is used only in the intermediate accumulator vertex, which is a special type of operator vertex.
  • Added a new type of communication channel, Partial Shuffle, which represents data transfer from an upstream operator to the intermediate accumulator vertex. It resembles a shuffle, except that the data is shuffled only among physically nearby containers.
  • Implemented a compile-time optimization pass that inserts the intermediate accumulator vertex, which performs hierarchical aggregation prior to the shuffle, only when doing so is expected to be effective.
  • Implemented unit tests.
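
To illustrate the idea behind the three steps, here is a minimal, self-contained sketch of a per-key count. It is not the code in this PR: the class and method names (`HierarchicalCombineSketch`, `partialCombine`, `mergeAccumulators`, `shuffleInputs`, `recordCount`) are hypothetical, a "site" stands in for a group of physically nearby containers, and the partial shuffle is simplified to merging all accumulators of one site into a single map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the Nemo code base. */
final class HierarchicalCombineSketch {

  /** Step 1 (partial combine): count words inside one container, no network transfer. */
  static Map<String, Long> partialCombine(List<String> words) {
    Map<String, Long> acc = new HashMap<>();
    for (String w : words) {
      acc.merge(w, 1L, Long::sum);
    }
    return acc;
  }

  /** Merges per-key accumulators; used here for both the intermediate and the final step. */
  static Map<String, Long> mergeAccumulators(List<Map<String, Long>> accumulators) {
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> acc : accumulators) {
      acc.forEach((k, v) -> merged.merge(k, v, Long::sum));
    }
    return merged;
  }

  /**
   * Returns the accumulators that would be handed to the final shuffle.
   * sites -> containers -> words. With the intermediate step enabled,
   * accumulators are first merged within each site (assumed "nearby").
   */
  static List<Map<String, Long>> shuffleInputs(List<List<List<String>>> sites,
                                               boolean intermediate) {
    List<Map<String, Long>> out = new ArrayList<>();
    for (List<List<String>> site : sites) {
      List<Map<String, Long>> partials = new ArrayList<>();
      for (List<String> container : site) {
        partials.add(partialCombine(container));
      }
      if (intermediate) {
        out.add(mergeAccumulators(partials)); // optional intermediate combine
      } else {
        out.addAll(partials);                 // original two-step behavior
      }
    }
    return out;
  }

  /** Number of (key, accumulator) records entering the final shuffle. */
  static int recordCount(List<Map<String, Long>> accumulators) {
    int n = 0;
    for (Map<String, Long> acc : accumulators) {
      n += acc.size();
    }
    return n;
  }

  public static void main(String[] args) {
    List<List<List<String>>> sites = Arrays.asList(
        Arrays.asList(Arrays.asList("a", "b", "a"), Arrays.asList("a", "b")),
        Arrays.asList(Arrays.asList("a"), Arrays.asList("a", "c")));
    for (boolean intermediate : new boolean[] {false, true}) {
      List<Map<String, Long>> inputs = shuffleInputs(sites, intermediate);
      // Step 2 (final combine): merge everything that was shuffled.
      System.out.println("intermediate=" + intermediate
          + " records=" + recordCount(inputs)
          + " result=" + mergeAccumulators(inputs));
    }
  }
}
```

In this toy input, 7 records enter the final shuffle without the intermediate step and 4 with it, while the final counts (a=5, b=2, c=1) are the same in both cases. The record count is only a rough proxy for WAN traffic, since in a real shuffle some records stay within a site.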

Minor changes to note:

  • None

Tests for the changes:

  • Tested on my Mac and Ubuntu machines

Other comments:

  • Data transfer on the partial shuffle communication channel is implemented in #319.
  • [TODO] More conditions need to be implemented to decide whether applying the pass is effective. The current logic is too naive.

Kangji avatar Aug 15 '21 06:08 Kangji

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information
0.0% Duplication

sonarqubecloud[bot] avatar Aug 30 '21 06:08 sonarqubecloud[bot]

@taegeonum Thanks for the review! I've addressed your comments.

Kangji avatar Aug 30 '21 07:08 Kangji

@Kangji Any update?

taegeonum avatar Sep 15 '21 20:09 taegeonum

@Kangji Any update?

Not yet... :( It has been delayed due to the fall semester, even though I'm trying to finish it ASAP. I'll let you know.

Kangji avatar Sep 23 '21 07:09 Kangji

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information
0.0% Duplication

sonarqubecloud[bot] avatar Nov 01 '21 08:11 sonarqubecloud[bot]

@taegeonum Can you take a final look?

wonook avatar Nov 01 '21 08:11 wonook