HIVE-28165 HiveSplitGenerator: send splits through filesystem instead of RPC in case of big payload
NOTE: this patch won't compile until TEZ-4548 is not released in Tez 0.10.4, because it uses new API InputDataInformationEvent.createWithSerializedPath(...) and InputInitializerContext.getVertexId(). I ran a full recommit test against a downstream tez containing this change, that's how I got qtest and unit test results.
What changes were proposed in this pull request?
Serialize splits to filesystem according to their size.
Why are the changes needed?
To avoid holding too much split payload in memory, as they're are collected in a loop and returned as a list.
Does this PR introduce any user-facing change?
No.
Is the change a dependency upgrade?
No.
How was this patch tested?
Cluster testing:
- logs sanity-checked
- OOM issue was resolved Unit testing:
- hive unit tests
- tez unit tests
As you mentioned: "NOTE: this patch won't compile until TEZ-4548 is not released in Tez 0.10.4", I guess a pom.xml change would be necessary as well, but I don't see in the list of modified files.
As you mentioned: "NOTE: this patch won't compile until TEZ-4548 is not released in Tez 0.10.4", I guess a pom.xml change would be necessary as well, but I don't see in the list of modified files.
Tez 0.10.4 has not yet been released, not even planned yet, so this PR is blocked until that so I could only expect a formal code review here, without providing the test results (what I did downstream btw)
I had only minor questions, on overall, LGTM.
thanks a lot, I addressed what was possible in https://github.com/apache/hive/pull/5174/commits/c2602866c753f0842a2d90bb7b48f237faeea188 and replied about the rest, please let me know if you can see anything else
@deniskuzZ : addressed all comments in https://github.com/apache/hive/pull/5174/commits/5b53236075f53fae497627b8e5f778df83b83ea6 unit tests added for the failure scenarios
@abstractdog, are we still waiting for the Tez changes to land?
@abstractdog, are we still waiting for the Tez changes to land?
yes, unfortunately, we cannot merge this until tez 0.10.4 is released, that's why I set it blocked by: https://issues.apache.org/jira/browse/TEZ-4556
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.
reopened as https://github.com/apache/hive/pull/5449