[Hubs] Support for large datasets
## Scenario
As a FinOps practitioner, I need to ingest cost data into a queryable data store in order to report on spend at scale (beyond $5M/mo).
## Solution
Support large datasets (e.g., 500 GB/mo) with up to 7 years of historical data that refreshes when changed. Do this by adding an option to ingest data into Azure Data Explorer and updating reporting to leverage that database.
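The following is a minimal sketch, not a committed design, of how reporting could query the ADX database once data is ingested, using the `azure-kusto-data` Python SDK. The cluster URL, database name, `Costs` table, and FOCUS column names are all assumptions for illustration.

```python
# Minimal sketch: query ingested cost data in Azure Data Explorer.
# The cluster URL, database, table, and column names are assumptions.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

CLUSTER = "https://myhub.westus2.kusto.windows.net"  # hypothetical hub cluster
DATABASE = "Hub"                                     # hypothetical database name

# Authenticate as the signed-in Azure CLI user (other auth modes exist).
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(CLUSTER)
client = KustoClient(kcsb)

# Example: top 10 services by effective cost for the current month,
# assuming a FOCUS-shaped "Costs" table.
query = """
Costs
| where ChargePeriodStart >= startofmonth(now())
| summarize Total = sum(EffectiveCost) by ServiceName
| top 10 by Total desc
"""

response = client.execute(DATABASE, query)
for row in response.primary_results[0]:
    print(f"{row['ServiceName']}: {row['Total']:.2f}")
```

Power BI reports would connect to the same database; the point is that reporting moves from reading raw export files to querying a store that can handle 500 GB/mo.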
## Tasks
### Required tasks
- [x] Decide on data store: SQL, ADX, Synapse
- [ ] #300
- [ ] #301
- [ ] #376
- [ ] Update to ingest FOCUS 1.0
- [ ] De-duplicate data that gets re-exported by Cost Management
- [x] Confirm the ADX SKU
- [ ] Update Power BI reports
- [ ] #670
- [ ] #671
- [ ] Update CreateUiDefinition.json
- [ ] Create a pipeline to start/stop the ADX cluster based on the settings.json config
- [ ] Auto-start ADX before ingestion and shut it down afterward (see the sketch after this list)
- [ ] Run CM exports on a custom schedule
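To make the cluster start/stop tasks above concrete, here's a rough sketch of the intended flow (start ADX before ingestion, stop it afterward) using the `azure-mgmt-kusto` SDK. The subscription, resource group, and cluster names are placeholders, and the real implementation would be a pipeline driven by the settings.json config (per the task above) rather than a script.

```python
# Rough sketch: start the ADX cluster before ingestion, stop it afterward.
# Subscription, resource group, and cluster names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.kusto import KustoManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "finops-hub-rg"   # hypothetical
CLUSTER_NAME = "finopshubadx"      # hypothetical


def run_ingestion() -> None:
    """Placeholder for the actual ingestion step (e.g., triggering the hub's pipelines)."""


client = KustoManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Start the cluster and block until it's running.
client.clusters.begin_start(RESOURCE_GROUP, CLUSTER_NAME).result()
try:
    run_ingestion()
finally:
    # Stop the cluster even if ingestion fails, so it doesn't sit idle and accrue cost.
    client.clusters.begin_stop(RESOURCE_GROUP, CLUSTER_NAME).result()
```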
### Stretch goals
- [ ] Backfill all data in storage during setup
- [ ] Implement retention policies for parquet data
- [ ] Should we archive parquet data after ingestion? (See the sketch after this list.)
- [ ] #377
- [ ] #667
- [ ] #668
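For the parquet retention/archive stretch goals, one purely illustrative option is to move already-ingested parquet blobs to the Archive tier using `azure-storage-blob`; the storage account, container, and path prefix below are guesses, not the hub's actual layout.

```python
# Illustrative sketch: archive parquet exports that have already been ingested.
# The storage account, container, and prefix are assumptions.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

ACCOUNT_URL = "https://finopshubstore.blob.core.windows.net"  # hypothetical
CONTAINER = "ingestion"                                       # hypothetical
PREFIX = "focuscost/"                                         # hypothetical

service = BlobServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
container = service.get_container_client(CONTAINER)

for blob in container.list_blobs(name_starts_with=PREFIX):
    if blob.name.endswith(".parquet"):
        # Archive tier is the cheapest place to keep data we only need for backfills.
        container.get_blob_client(blob.name).set_standard_blob_tier("Archive")
```

An Azure Storage lifecycle management policy could achieve the same result declaratively; which approach fits best is part of the open question above.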
## Additional context
An internal analysis compared data store options for the largest datasets, and Azure Data Explorer was deemed the best balance of cost, performance, and scale.
## Ask for the community
We could use your help:
- Please vote this issue up (👍) to prioritize it.
- Leave comments to help us solidify the vision.
Closing this since we're tracking releases in a new way now and this is outdated.
Hello @flanakin, is this feature still on your backlog? We would be highly interested in being able to handle larger datasets more easily as well.
@t-esslinger Sorry for missing the comment. Yes, this is still in the backlog. We're making progress slowly. I'm reopening this issue to track everything needed.