cube icon indicating copy to clipboard operation
cube copied to clipboard

Bug with subquery and JOIN operation within filters

Open itestyoy opened this issue 1 year ago • 8 comments

A cube crashes when we use field_2 as a filter but works well when we use field_2 as dimensions.

Steps to reproduce the behavior:

  1. Run a query with filters:
{
  "filters": [
    {
      "member": "model_b.field_2",
      "operator": "set"
    }
  ],
  "dimensions": [
    "model_b.field_2",
    "model_b.cube_unique_id"
  ],
  "order": {
    "model_b.field_2": "asc"
  }
}

and then receive an error:

RangeMaximum call stack size exceeded
  1. Run without filters:
{
  "dimensions": [
    "model_b.field_2",
    "model_b.cube_unique_id"
  ],
  "order": {
    "model_b.field_2": "asc"
  }
}

Minimally reproducible Cube Schema:

asyncModule(async () => {

        cube('model_a', {
            sql: `select * from table`,

            dimensions: {
                field_1: {
                    sql: `field_1`,
                    type: 'string'
                },
            },

        });

        cube('model_b', {
            sql: `select * from table`,

            dimensions: {

                cube_unique_id: {
                    sql: `row_id`,
                    type: 'string',
                    primary_key: true
                },

                field_1: {
                    sql: `sum(${metric_1}) over ()`,
                    type: `number`,
                    sub_query: true,
                    propagate_filters_to_sub_query: true,
                },


                field_2: {
                    sql: `${model_a.field_1}`,
                    type: 'string'
                },

            },

            measures: {
                metric_1: {
                    sql: `metric_1`,
                    type: 'max'
                }
            },

            joins: {
                model_a: {
                    relationship: 'one_to_one',
                    sql: `${CUBE.field_1} = ${model_a.field_1}`
                },

            },

        });
})

Version: 0.35.1

itestyoy avatar Apr 04 '24 19:04 itestyoy

Hi @itestyoy 👋

Well, this is a pretty interesting data model!

  1. If you comment out or remove sub_query: true and propagate_filters_to_sub_query: true, then the RangeMaximum call stack size exceeded error will be gone. I feel like this should not be a subquery dimension at all because you're referencing the measure from the same cube (model_b.metric_1) rather than from another cube there.
  2. model_b.field_1 should probably not be a dimension at all, I can see that it's defined as sum(${metric_1}) over () and that looks like an aggregation to me. Aggregations should go to measures, not dimensions.

I hope this will help you fix this data model.

igorlukanin avatar Apr 08 '24 11:04 igorlukanin

Hi @igorlukanin !

Thank you for the explanation, but

It is not a real model, it's just a simple example demonstrating how to encounter the error without involving all models. If we use metrics from another cube in a subquery, the error will be the same. In our case, we're not using SUM, we're using FIRST_VALUE with a PARTITION BY clause.

My question is not about the model, it's about why it works well with dimensions but throws an error with filters.

itestyoy avatar Apr 08 '24 11:04 itestyoy

Even if this data model is simplified for this example, my point is that it doesn't look sound to me. This is not how subquery dimensions are supposed to be defined:

                field_1: {
                    sql: `sum(${metric_1}) over ()`,
                    type: `number`,
                    sub_query: true,
                    propagate_filters_to_sub_query: true,
                },

If you rework that part, the error will be gone.

Let me also tag @paveltiunov in case he'd have an alternative take on whether this is a sound data model or not.

igorlukanin avatar Apr 09 '24 10:04 igorlukanin

@igorlukanin @paveltiunov Hi!

If you rework that part, the error will be gone.

We reworked this part with an additional cube and we achieve propagate_filters_to_sub_query using ${FILTER_GROUP(FILTER_PARAMS....)}, but it's not a good approach because we need to synchronize FILTER_PARAMS with dimensions.

For me, regardless of whether it looks like a model or not, it seems like a bug. Maybe there are any updates?

itestyoy avatar May 03 '24 07:05 itestyoy

@itestyoy The data model that you've initially provided is not correct and it's not supposed to work. You can't have an aggregation in the sql property of a subquery dimension. That is why I don't think there's any bug in Cube here.

igorlukanin avatar May 13 '24 12:05 igorlukanin

@igorlukanin But why it works good as dimension? What is the difference between

  • select a from b - work
  • select a from b where a > 1 - not work

?

itestyoy avatar May 13 '24 12:05 itestyoy

Sorry, I don't understand.

How are select a from b and select a from b where a > 1 related anything that we discussed above?

igorlukanin avatar May 14 '24 09:05 igorlukanin

@igorlukanin

This doesn't work

(1)

{
  "filters": [
    {
      "member": "model_b.field_2",
      "operator": "set"
    }
  ],
  "dimensions": [
    "model_b.field_2",
    "model_b.cube_unique_id"
  ],
  "order": {
    "model_b.field_2": "asc"
  }
}

This works (2)

{
  "dimensions": [
    "model_b.field_2",
    "model_b.cube_unique_id"
  ],
  "order": {
    "model_b.field_2": "asc"
  }
}

The only difference is filters with the same field as dimension. If using your explanation, query 2 also will not work, but it does.

itestyoy avatar May 14 '24 09:05 itestyoy