HIVE-28548: Subdivide memory size allocated to parallel operators
What changes were proposed in this pull request?
Let each operator know the maximum amount of memory it can use.
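For illustration, here is a minimal sketch of the idea (not the code in this PR; the class and method names are hypothetical, and it assumes OperatorDesc exposes the get/setMaxMemoryAvailable accessors referenced later in this thread):

import java.util.List;
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.plan.OperatorDesc;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch: split the task's memory budget evenly among the
// operators that run in parallel inside the same (merged) task, instead of
// letting each one assume it owns the whole budget.
public final class MemorySubdivisionSketch {
  private static final Logger LOG = LoggerFactory.getLogger(MemorySubdivisionSketch.class);

  static void assignMemory(long taskMaxMemory, List<Operator<? extends OperatorDesc>> parallelOps) {
    long perOperator = taskMaxMemory / parallelOps.size();
    LOG.info("Assigning {} bytes to {} operators", perOperator, parallelOps.size());
    for (Operator<? extends OperatorDesc> op : parallelOps) {
      // Each operator then sizes its hash tables from this value rather than
      // from the full task memory (setter assumed for this sketch).
      op.getConf().setMaxMemoryAvailable(perOperator);
    }
  }
}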
Why are the changes needed?
We observed OOMs when SharedWorkOptimizer merges heavy operators, such as GroupByOperators with map-side hash aggregation, into the same task. I suspect it is hard for each GroupByOperator to control its memory usage correctly in that case because every merged operator assumes the full task memory is available to it.
I confirmed Tez has a similar feature to reallocate memory when a task is connected to multiple edges. https://github.com/apache/tez/blob/rel/release-0.10.4/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/resources/WeightedScalingMemoryDistributor.java
https://issues.apache.org/jira/browse/HIVE-28548
I also found that it is still problematic when the number of merged operators is huge, e.g. 100, because the per-operator assignment becomes tiny. I will handle such cases in HIVE-28549.
Does this PR introduce any user-facing change?
No.
Is the change a dependency upgrade?
No.
How was this patch tested?
Checked local logs produced by the following q-file test.
--! qt:dataset:src
set hive.auto.convert.join=true;
SELECT *
FROM (SELECT key, count(*) AS num FROM src WHERE key LIKE '%1%' GROUP BY key) t1
LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%2%' GROUP BY key) t2 ON t1.key = t2.key
LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%3%' GROUP BY key) t3 ON t1.key = t3.key
LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%4%' GROUP BY key) t4 ON t1.key = t4.key
LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%5%' GROUP BY key) t5 ON t1.key = t5.key;
set hive.vectorized.execution.enabled=false;
SELECT *
FROM (SELECT key, count(*) AS num FROM src WHERE key LIKE '%1%' GROUP BY key) t1
LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%2%' GROUP BY key) t2 ON t1.key = t2.key
LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%3%' GROUP BY key) t3 ON t1.key = t3.key
LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%4%' GROUP BY key) t4 ON t1.key = t4.key
LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%5%' GROUP BY key) t5 ON t1.key = t5.key;
The memory available to each operator was reduced from 358MB to 71MB.
% cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep OperatorUtils | grep Assigning
2024-09-30T23:54:38,486 INFO [TezTR-672468_1_3_0_0_0] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators
2024-09-30T23:54:38,831 INFO [TezTR-672468_1_3_0_0_1] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators
2024-09-30T23:54:42,015 INFO [TezTR-672468_1_4_0_0_0] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators
2024-09-30T23:54:42,310 INFO [TezTR-672468_1_4_0_0_1] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators
% cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep GroupByOperator | grep 'Max hash table'
2024-09-30T23:54:36,280 INFO [TezTR-672468_1_2_0_0_0] exec.GroupByOperator: Max hash table memory: 179.20MB (358.40MB * 0.5)
2024-09-30T23:54:42,015 INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5)
2024-09-30T23:54:42,017 INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5)
2024-09-30T23:54:42,018 INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5)
2024-09-30T23:54:42,018 INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5)
...
% cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep VectorGroupByOperator | grep 'GBY memory limits'
2024-09-30T23:54:38,491 INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144)
2024-09-30T23:54:38,499 INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144)
2024-09-30T23:54:38,500 INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144)
2024-09-30T23:54:38,500 INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144)
...
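For reference, the numbers in these excerpts line up: 358.40 MB split across the 5 merged operators gives about 71.68 MB each (the 75161926 bytes in the OperatorUtils lines), of which GroupByOperator caps its hash table at 50% (35.84 MB) and VectorGroupByOperator at 90% (64.51 MB), matching the 0.5 and 0.9 factors in the log lines above.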
Just for your information, we tested this patch using TPC-DS queries 1-25 on a 10TB dataset. The total execution times were similar, though query 17 became slower (54s → 94s) while query 5 improved (142s → 101s). I haven't investigated the differences in detail, so they could be due to factors like stragglers or other unrelated issues. Overall, the total execution times suggest that the patch doesn't introduce any significant performance problems in typical cases, and it seems like a reasonable change to me.
@ngsg @deniskuzZ @abstractdog Do you think this approach is reasonable? If so, I will rebase this branch.
I think this is a reasonable approach.
Regarding the isTez/isLlap issue in VectorGroupByOperator, I'm inclined to use isTez and drop isLlap, as VectorGroupByOperator checks overall heap usage via gcCanary, not MemoryMXBean#getHeapMemoryUsage and memoryThreshold. However, I'm not certain whether Tez without LLAP also respects getConf().getMaxMemoryAvailable(), so it would be nice if others could confirm this as well.
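For context, the gcCanary mechanism mentioned above follows the usual soft-reference canary pattern; a standalone sketch of that pattern (not the actual VectorGroupByOperator code) looks roughly like this:

import java.lang.ref.SoftReference;

// Sketch of a GC-canary check: the JVM clears SoftReferences only under memory
// pressure, so a cleared canary is taken as a signal to flush in-memory state.
public class GcCanarySketch {
  private SoftReference<Object> gcCanary = new SoftReference<>(new Object());

  boolean shouldFlush() {
    if (gcCanary.get() == null) {
      // Heap pressure detected: caller should flush its hash table, then re-arm the canary.
      gcCanary = new SoftReference<>(new Object());
      return true;
    }
    return false;
  }
}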
I rebased the branch and CI is green. I'm open to any suggestions.