datahub-gma
datahub-gma copied to clipboard
Update EbeanLocalAccess batchGetUnion SQL query from "UNION ALL" to "IN"
Summary
Based on previous TMS issues and slow query analysis, we identified the UNION ALL type of queries are taking a long time, exhausts DB resources, and can cause slow response times. One sample query is:
SELECT urn, a_clusterdatasetsla, lastmodifiedon, lastmodifiedby FROM metadata_entity_dataset WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.RtfActivityEvent,CORP)' AND JSON_EXTRACT(a_clusterdatasetsla, '$.gma_deleted') IS NULL
UNION ALL
SELECT urn, a_clusterdatasetsla, lastmodifiedon, lastmodifiedby FROM metadata_entity_dataset WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.rtf-store-war-service_call,CORP)' AND JSON_EXTRACT(a_clusterdatasetsla, '$.gma_deleted') IS NULL
UNION ALL
SELECT urn, a_clusterdatasetsla, lastmodifiedon, lastmodifiedby FROM metadata_entity_dataset WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.redliner-log_event,CORP)' AND JSON_EXTRACT(a_clusterdatasetsla, '$.gma_deleted') IS NULL
UNION ALL …
This PR is to:
- Convert those long
UNION ALLlogic into usingIN, such as:
SELECT urn, a_clusterdatasetsla, lastmodifiedon, lastmodifiedby
FROM metadata_entity_dataset
WHERE JSON_EXTRACT(a_clusterdatasetsla, '$.gma_deleted') IS NULL
AND urn IN (
'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.RtfActivityEvent,CORP)',
'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.rtf-store-war-service_call,CORP)',
'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.redliner-log_event,CORP)',
...
)
- Revert the previous changes in PR https://github.com/linkedin/datahub-gma/pull/367 which was to divide UNION ALL queries into small batches. When the change was released, it caused downtime instead because all smaller queries were executed at the same time and caused memory exhaustion. After the issue, the TMS side code was reverted, but the datahub-gma changes wasn't.
Testing Done
Details testings are documented into this document. I'm briefly listing the testing and results here. Please refer to the doc for some analysis and detailed test steps.
- Directly run UNION ALL and IN queries on MySQL DB (metagalaxy_stg) to compare execution time. Result
| Test | Time |
|---|---|
| UNION ALL over 5k urns | 2.630s |
| IN over 5k urns | 0.700s |
| UNION ALL over 10k urns | 6.561s |
| IN over 10k urns | 0.880s |
- Local Testing
- EI Testing
- DC Testing
Checklist
- [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable)
Codecov Report
Attention: Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
Project coverage is 67.75%. Comparing base (
c79500a) to head (005e5f2). Report is 70 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...linkedin/metadata/dao/utils/SQLStatementUtils.java | 33.33% | 1 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #386 +/- ##
============================================
+ Coverage 67.74% 67.75% +0.01%
- Complexity 1343 1374 +31
============================================
Files 132 136 +4
Lines 5276 5409 +133
Branches 546 563 +17
============================================
+ Hits 3574 3665 +91
- Misses 1484 1515 +31
- Partials 218 229 +11
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.