OpenMetadata
[WIP] Fixes <issue to open>: optimizing table profiler process for BQ
Describe your changes:
Fixes
Context: During profile ingestion, a separate query is executed for each table in the BigQuery project to retrieve metrics such as rowCount, sizeInBytes, and columnCount. Because every query is its own BigQuery job, this approach runs into concurrent job limits and competes for available slots, which causes significant performance issues.
Current Behavior: The profile ingestion process retrieves table-level metrics by executing one query per table. Although these queries already read from the TABLES table, the number of queries grows with the number of tables, which is inefficient for projects with many tables.
Proposed Solution: We propose modifying the profile ingestion process to execute a single query on the TABLES table that retrieves the required metrics for all tables in a project or schema. The results of this query would be cached, and the subsequent per-table iteration would read its metrics from the cache instead of issuing a new query for each table.
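To make the proposal concrete, here is a minimal sketch of the caching idea, not the actual implementation in this PR. It assumes the "TABLES table" is the per-dataset `__TABLES__` meta-table with `table_id`, `row_count`, and `size_bytes` columns. `TableMetricsCache`, `TableMetrics`, `build_metrics_query`, and the `run_query` callable are hypothetical names used for illustration only. Column count is not covered here and would have to come from another source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Mapping, Optional, Tuple


@dataclass(frozen=True)
class TableMetrics:
    """Table-level metrics read from the metadata table (hypothetical type)."""
    row_count: int
    size_bytes: int


def build_metrics_query(project: str, dataset: str) -> str:
    # Assumption: metrics come from the dataset-level __TABLES__ meta-table.
    return (
        "SELECT table_id, row_count, size_bytes "
        f"FROM `{project}.{dataset}.__TABLES__`"
    )


class TableMetricsCache:
    """Runs one metadata query per (project, dataset) and serves lookups from memory."""

    def __init__(self, run_query: Callable[[str], Iterable[Mapping[str, object]]]):
        # run_query is a hypothetical hook: it takes SQL and yields dict-like rows.
        self._run_query = run_query
        self._cache: Dict[Tuple[str, str], Dict[str, TableMetrics]] = {}

    def get(self, project: str, dataset: str, table: str) -> Optional[TableMetrics]:
        key = (project, dataset)
        if key not in self._cache:
            # First lookup for this dataset: fetch metrics for all of its tables at once.
            rows = self._run_query(build_metrics_query(project, dataset))
            self._cache[key] = {
                str(row["table_id"]): TableMetrics(
                    row_count=int(row["row_count"]),
                    size_bytes=int(row["size_bytes"]),
                )
                for row in rows
            }
        # Later lookups for the same dataset are served from the cache.
        return self._cache[key].get(table)
```

With this shape, the per-table loop in the profiler would call `cache.get(project, dataset, table_name)` instead of issuing its own query, so the number of BigQuery jobs scales with the number of datasets rather than the number of tables. A table missing from the cached result returns `None`, which the caller would need to handle, for example by falling back to the existing per-table query.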
Type of change:
- [ ] Bug fix
- [X] Improvement
- [ ] New feature
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation
Checklist:
- [x] I have read the CONTRIBUTING document.
- [ ] My PR title is `Fixes <issue-number>: <short explanation>`
- [ ] I have commented on my code, particularly in hard-to-understand areas.
- [ ] For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
Hi there 👋 Thanks for your contribution!
The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.
Let us know if you need any help!
Is this stale?
Is this stale?
Looks like so. I'll be closing the PR. @AntoineGlacet feel free to pick it up