metricflow icon indicating copy to clipboard operation
metricflow copied to clipboard

[Feature] Support custom calendar

Open Jstein77 opened this issue 1 year ago • 5 comments

Context

Users want to be able to use a custom calendar, like a fiscal calendar or retail calendar, with Metricflow’s date/time operations. For example, if my company uses fiscal quarters I should be able to request a metric by fiscal quarter.

User Stories

  1. My company uses a fiscal calendar and I want to see revenue by quarter. When I query mf query --metrics revenue --group-by metric_time__quarter . I expect to see revenue broken out by fiscal quarter.
    1. i.e 2024-02-01 - 2023-04-31 is Q1, 2024-05-01 - 2024-07-31 is Q2, 2024-08-01 = 2024-10-31 is q3, 2024-11-01 - 2025-01-31 is Q4
  2. My company uses a 4-5-4 retail calendar i.e the first month of the quarter has 4 weeks, the second has 5 weeks and the third has 4 weeks. When I query mf query --metrics revenue --group-by metric_time__quarter the 5-week second month is taken into account.
  3. I want to see metrics using either a standard calendar or a custom calendar.
  4. I want my week to start on a day other than Monday

Proposed Solution

Our proposed solution is to have users create another date spine model, similar to the metricflow_time_spine model which has the logic needed for their date transformation expressed in SQL. Alternatively, you can add columns to your current date spine model if you prefer instead of creating a new one.

Next, you will add a configuration file to metricflow that tells us the name of the dbt model that contains your custom calendar, and how the columns map to metricflow’s supported granularity. When you query this calendar we will join the date spine model to the query and use the column specified for each granularity.

The dbt_date package is a good example of what a fiscal calendar would look like, and we have an example table in our internal analytics project called all_dates. This table format is a good example of the format we expect. Screenshot 2024-02-20 at 9 50 29 PM

Calendar model requirements

There are a few requirements for the custom calendar model:

  1. It must be at a daily grain and joinable to other date columns on date_day.
  2. Columns must map to the data type we expect for a given granularity. For example, quarter must be a date.

Configuring your calendars in MetricFlow

Once you’ve created your customer calendar table, you need to tell MetricFlow where this calendar lives and map the columns to supported granularities. There are a few inputs we need from the user to use their custom calendar:

  1. The name of the custom calendar model.
  2. The columns in the table that represent custom granularities i.e fiscal quarter. This map back to internal MetricFlow granularities

This configuration in metric flow would look like the following. The full spec is outlined in the interface changes section.

semantic_layer_project_configs:
  custom_calanders:
    - name: all_dates# name of the dbt model 
       granularities:
       fiscal_quarter: fiscal_quarter_date #column to map to

Calling a custom calendar through our APIs

Once the user has configured their custom calendar the can now call this calendar at query time as a function on metrics. For example:

JDBC query:

  1. Query fiscal_calendar granularity from all_date calendar
select
	*
from 
	semantic_layer.query( metrics.calendar('all_dates')=['revenue'], group_by = ['metric_time__fiscal_quarter'])
---The above query compiles compiles to: 
select
	sum(revenue),
	fiscal_quarter as metric_time__fiscal_quarter
from orders ord
left join all_days fc on ord.ordered_at = fc.date_day
  1. Querying quarter using the all_dates calendar. This defaults to the standard behaviour because there is no value set for quarter granularity i.e date_trunc(metric_time, 'quarter')
select
	*
from 
	semantic_layer.query( metrics.calendar('all_dates')=['revenue'],  group_by = ['metric_time__quarter'])
--The above query compiles to:

select
	sum(revenue),
	date_trunc(metric_time,'quarter') as metric_time__quarter
from orders ord

CLI query:

mf query --metrics revenue --calendar all_dates --group-by metric_time__fiscal_quarter

Limitations querying metrics with a custom calendar

  • You can only choose one calendar for the entire query, not per metric

Why create a custom calendar model and join that model to metrics queries instead of doing the date transformations natively in metricflow?

Date transformations are complex, and organizations will have extremely custom needs. Expressing all the date permutations you might need to do in the spec in a generalized way becomes very difficult. The proposed approach allows you to express the date transformations you need in SQL giving you maximum flexibility. When you’ve finished defining your date transformations, you can expose them to the semantic layer, and query them in a governed way since the columns are mapped back to metricflow granularities. The potential risks called out below are missing date values and possible performance concerns. Both these risks are relatively minimal, and the tradeoff for more flexibility feels worth it.

Requirements:

There are a few requirements/updates needed in MetricFlow to support this.

  • Metricflow can take multiple time spine models
  • Update our granularity enums to include fiscal quarter and fiscal year. These granularities would require a custom calendar to be set.
  • Expose project-level configs to users so they can tell us the name of their custom calendar model and map columns to granularities
  • Update APIs to choose which calendar to use for metrics

Interface Changes

You need to do 2 things to set up a custom calendar:

  1. Create a time spine model in your dbt project that contains the date transformations you need. We can provide examples of common date transformations i.e fiscal quarter
  2. Tell Metricflow the name of that calendar and which columns map to which granularities

For #2 we have an internal representation of project-level configs where we point to the base time spine model. We can expose this to users and have them configure an additional calendar/time table.

Parameter Description
semantic_layer_project_configs Project-level configurations that apply to your entire semantic graph
semantic_layer_project_configs:calendars Configuration for a calendar model. This points to a dbt model in your project
semantic_layer_project_configs:clendars:granularities It makes it back to one of our granularity enums (currently day, week, month, quarter, year). We will extend this to include fiscal quarter and fiscal year.
semantic_layer_project_configs:calendars:name The name of the dbt model
semantic_layer_project_configs:
	calanders:
		- name: #the name of the dbt model
			granularities:
				day: #column name for daily grain in your timespine model
				week: #column name for daily grain in your timespine model
				month: #column name for daily grain in your timespine model
				year: #column name for daily grain in your timespine model
				fiscal_quarter: #column name for daily grain in your timespine model
				fiscal_year: #column name for daily grain in your timespine model

Risks:

  • There may be null dates if not every date in the data set is represented in the custom calendar
  • This is a vector for metric values to be different i.e you query a metric using a custom calendar and a regular calendar and get different values
  • There is a performance hit requesting a granularity from your custom calendar since we have to do a table scan and join to the calendar model. This should be relatively small. We could maybe do some smart filtering on the calendar table to make to only join dates that are after the minimum time in the data set.

Jstein77 avatar Oct 25 '23 03:10 Jstein77

Just to add one more use case to this feature. We also have customers requesting for different start of week (first day of week), either Monday or Sunday when working with weekly granularity queries.

gneius avatar Feb 06 '24 21:02 gneius

Adding another use case; we're very interested in this for supporting retail calendars / 454 calendars. Without this support we're completely unable to use MetricFlow for our use cases.

snikch avatar Feb 07 '24 20:02 snikch

@gneius @snikch Just updated the issue with a more detailed explanation of what we're proposing. @gneius Supporting a different week start will be possible with the proposed implementation.

@snikch can you say a bit more about how you envision retail calendars working in metricflow?

Jstein77 avatar Feb 26 '24 16:02 Jstein77

Thanks @Jstein77. I'm going to "summon" my colleague @nzben into the convo as he's been working more closely on this than I have.

snikch avatar Feb 26 '24 21:02 snikch

@Jstein77 if I'm understanding your comprehensive feature overview correctly, I think what you've outlined would work great for us. In our example we have multiple data subsidiaries who may have different retail calendars:

  • 4-4-5 quarters with a year that starts on the first Sunday in July
  • 4-5-4 quarters with a year that starts on the first week that has 4 or more days in April
  • etc

We have built a bunch of SQL functions that can generate the required date spines, and at the moment have these denormalised into fields like date_week_july_sun_4_4_5 so we can just use them as dimensions depending on the subsidiary we are processing data for.

You can only choose one calendar for the entire query, not per metric

This limitation is totally fine. We would not expect to have a query with multiple calendars. We'd more likely have the calendar per subsidiary in our configuration, and all queries for that subsidiary would apply the calendar.

Your notes in "Risks" are the real bonuses for us:

you query a metric using a custom calendar and a regular calendar and get different values

Precisely what we need. "This week vs same week last year" would be different between two subsidiaries with different week start days.

We could maybe do some smart filtering on the calendar table to make to only join dates that are after the minimum time in the data set.

We do a form of this manually at the moment using a CTE to select the date range for a particular set of fiscal periods, then use the results of that (start and end dates) to filter the subsequent queries.

nzben avatar Feb 26 '24 22:02 nzben