finops-toolkit icon indicating copy to clipboard operation
finops-toolkit copied to clipboard

Prices_transform_v1_0 function does not work as intended

Open RolandKrummenacher opened this issue 7 months ago â€ĸ 3 comments

🐛 Problem

The Prices_transform_v1_0 function transforms Prices_raw into Prices_final_v1_0. However, the function references the entire Prices_raw table multiple times. If the pricesheet is ingested as multiple parquet-snappy files, the transformation function appears to be executed separately per file. As a result, it never has access to the full price table during transformation.

đŸ‘Ŗ Repro steps

  1. Create a modified version of Prices_transform_v1_0 with the following content:
    .create-or-alter function
      with (docstring='Transforms Prices_raw into FOCUS 1.0.', folder='Prices')
      Prices_transform_v1_0() {
        let prices = materialize(
          Prices_raw
        );
        prices
        | summarize count()
      }
    
    
  2. Ingest a single pricesheet consisting of two parquet-snappy files.
  3. Observe the resulting records in Prices_final_v1_0.

🤔 Expected

If the transformation would have access to the full dataset (i.e., the union of all rows across the multiple files) we should see a single summarized result based on the total record count. This is not the case.

Therefore, the original transformation function does not work as intended. The transformation function can not reference the whole source table, for example to get a complete list of meterIds for SP or RIs

In particular the following statements don't work:

let riMeters = prices | where x_SkuPriceType == 'ReservedInstance' | distinct x_SkuMeterId;
let spMeters = prices | where x_SkuPriceType == 'SavingsPlan' | distinct x_SkuMeterId;

prices
| where x_SkuPriceType == 'SavingsPlan'
| lookup kind=leftouter (prices | where x_SkuPriceType == 'Consumption'...

📷 Screenshots

Image

â„šī¸ Additional context

The screenshot demonstrates that the function was executed separately on each of the two files, producing two count outputs: 866,482 and 115,098. These two results align exactly with the expected row count of the individual files, but not the full dataset total of 981,580.

đŸ™‹â€â™€ī¸ Ask for the community

We could use your help:

  1. Please vote this issue up (👍) to prioritize it.
  2. Leave comments to help us solidify the vision.

RolandKrummenacher avatar May 22 '25 09:05 RolandKrummenacher

The Prices_raw during the execution of the update policy function is only the subset of data that was currently ingested in the raw table.

RolandKrummenacher avatar May 22 '25 09:05 RolandKrummenacher

This makes sense. Let's brainstorm options after FinOps X. A few ideas off the top of my head:

  1. Change ingestion to happen across all files at once (will have scale limitations)
  2. Run a .update command after all export files are processed to set eligibility
  3. Create a SKUs open data file that contains all SKUs/meters with detailed metadata and commitment discount eligibility, then join with that during Prices_raw ingestion

flanakin avatar Jun 01 '25 05:06 flanakin

The eligibility columns could also be projected on the fly as part of the "Prices" view (calculated from the x_SkuPriceType columns). But having an open data file for all meterIds would be my preferred choice.

RolandKrummenacher avatar Jun 02 '25 07:06 RolandKrummenacher