Prices_transform_v1_0 function does not work as intended
đ Problem
The Prices_transform_v1_0 function transforms Prices_raw into Prices_final_v1_0. However, the function references the entire Prices_raw table multiple times. If the pricesheet is ingested as multiple parquet-snappy files, the transformation function appears to be executed separately per file. As a result, it never has access to the full price table during transformation.
đŖ Repro steps
- Create a modified version of
Prices_transform_v1_0with the following content:.create-or-alter function with (docstring='Transforms Prices_raw into FOCUS 1.0.', folder='Prices') Prices_transform_v1_0() { let prices = materialize( Prices_raw ); prices | summarize count() } - Ingest a single pricesheet consisting of two
parquet-snappyfiles. - Observe the resulting records in
Prices_final_v1_0.
đ¤ Expected
If the transformation would have access to the full dataset (i.e., the union of all rows across the multiple files) we should see a single summarized result based on the total record count. This is not the case.
Therefore, the original transformation function does not work as intended. The transformation function can not reference the whole source table, for example to get a complete list of meterIds for SP or RIs
In particular the following statements don't work:
let riMeters = prices | where x_SkuPriceType == 'ReservedInstance' | distinct x_SkuMeterId;
let spMeters = prices | where x_SkuPriceType == 'SavingsPlan' | distinct x_SkuMeterId;
prices
| where x_SkuPriceType == 'SavingsPlan'
| lookup kind=leftouter (prices | where x_SkuPriceType == 'Consumption'...
đˇ Screenshots
âšī¸ Additional context
The screenshot demonstrates that the function was executed separately on each of the two files, producing two count outputs: 866,482 and 115,098. These two results align exactly with the expected row count of the individual files, but not the full dataset total of 981,580.
đââī¸ Ask for the community
We could use your help:
- Please vote this issue up (đ) to prioritize it.
- Leave comments to help us solidify the vision.
The Prices_raw during the execution of the update policy function is only the subset of data that was currently ingested in the raw table.
This makes sense. Let's brainstorm options after FinOps X. A few ideas off the top of my head:
- Change ingestion to happen across all files at once (will have scale limitations)
- Run a
.updatecommand after all export files are processed to set eligibility - Create a SKUs open data file that contains all SKUs/meters with detailed metadata and commitment discount eligibility, then join with that during Prices_raw ingestion
The eligibility columns could also be projected on the fly as part of the "Prices" view (calculated from the x_SkuPriceType columns). But having an open data file for all meterIds would be my preferred choice.