cube icon indicating copy to clipboard operation
cube copied to clipboard

Bigquery EXTERNAL_QUERY filters: support unsafeValue for FILTER_PARAMS

Open System-Glitch opened this issue 4 years ago • 11 comments

Problem

First of all, thank you for making Cube.js, it is a powerful and useful tool!

We are using Google BigQuery, PostgreSQL on Google Cloud SQL and Cube.js. We have a pre-aggregated BigQuery table that Cube will query on. This table is however missing some information that we are getting from a PostgreSQL table using a federated query.

To be more precise, in our case we have a website name in the BigQuery table, and the country in which this website is hosted in the PostgreSQL table, so we join to get both information in the same cube. All sites are associated with an organization (organization_id column). We are always filtering by organization.

To avoid querying the entire PostgreSQL table and to take advantage of indexes, we would like to apply the cube filters to the external query as well, but there are some problems with this approach:

  • Using FILTER_PARAMS, Cube generates a positional parameter, but the EXTERNAL_QUERY requires either a positional parameter that represents the entire external query or a string literal.
  • There is no way (as far as I know) to get the real filter value from FILTER_PARAMS so the generated SQL already contains the filter value instead of being parameterized.
  • Using SECURITY_CONTEXT and unsafeValue() does not look like a solution in our case because we would have to generate a new JWT for each Cube request, as the organization_id can vary, or there can even be many organization_ids.

Related Cube.js schema

cube(`BigQueryTable`, {
  sql: `
    SELECT *, _PARTITIONTIME FROM database.bigquery_table
    LEFT OUTER JOIN
      EXTERNAL_QUERY('connection_id',
        """SELECT name, countryIso AS country FROM sites WHERE ${FILTER_PARAMS.BigQueryTable.organizationId.filter('organization_id')}""") AS rq
      ON
        rq.name = bigquery_table.site
    `,

  measures: {
    //...
  },

  dimensions: {
    organizationId: {
      sql: `organization_id`,
      type: `string`
    },

    site: {
      sql: `site`,
      type: `string`
    },

    country: {
      sql: `country`,
      type: `string`
    },

    //...

    partitionTime: {
      sql: `_PARTITIONTIME`,
      type: `time`
    }
  },

  dataSource: `default`
});

Related Cube.js generated SQL

SELECT
  /* ... */
FROM (
  SELECT
    *,
    _PARTITIONTIME
  FROM
    database.bigquery_table
  LEFT OUTER JOIN
    EXTERNAL_QUERY('connection_id',
      """SELECT name, countryIso AS country from sites WHERE organization_id = ?""") AS rq
  ON
    rq.name = bigquery_table.site ) AS `bigquery_table`
WHERE
  /* ... */
  AND (`bigquery_table`._PARTITIONTIME >= TIMESTAMP(?)
    AND `bigquery_table`._PARTITIONTIME <= TIMESTAMP(?))
  AND (`bigquery_table`.organization_id = ?)
GROUP BY
  1,
  2,
  3
ORDER BY
  2 ASC
LIMIT
  50000

What I need:

EXTERNAL_QUERY('connection_id', ?) AS rq

With a param value as : """SELECT name, countryIso AS country from sites WHERE organization_id = 1234"""

Or the same query that is already generated, but without parameterizing the organization_id in the external query.

Do you think it is possible?

System-Glitch avatar Oct 26 '21 08:10 System-Glitch

Hey @System-Glitch ! You can pass a function to FILTER_PARAMS. Please see https://cube.dev/docs/schema/reference/cube#filter-params.

paveltiunov avatar Nov 02 '21 04:11 paveltiunov

Hello @paveltiunov and thank you for your response. I tried with a function as well but it doesn't work neither unfortunately. The function receives "$0$" as a value (not the actual value) and by digging in Cube's code I found out that it is still converted to "?" afterwards. There is no way I can get the real filter value from the function, as far as I know.

System-Glitch avatar Nov 02 '21 08:11 System-Glitch

@System-Glitch I guess you can't but you can CONCAT filter place holder on a BigQuery side with other parts of your query string in order to get a final one.

paveltiunov avatar Nov 03 '21 04:11 paveltiunov

Bigquery needs that value to be a string literal or a query parameter. I cannot use CONCAT on bigquery's side. My only option is to include the real filter value during SQL generation.

Invalid table-valued function EXTERNAL_QUERY Connection argument in EXTERNAL_QUERY must be a literal string or query parameter at [16:5]

It looks like there is no solution to this problem yet, so I will leave this issue open for now.

System-Glitch avatar Nov 03 '21 09:11 System-Glitch

@System-Glitch COMPILE_CONTEXT can be a workaround here in this case.

paveltiunov avatar Mar 01 '22 05:03 paveltiunov

@paveltiunov why was this closed? Is there a solution or example you can provide that works?

rccoe avatar Jul 05 '22 12:07 rccoe

@rccoe Yep. You can use COMPILE_CONTEXT as a workaround. But let's reopen and track unsafeValue for FILTER_PARAMS here

paveltiunov avatar Jul 05 '22 20:07 paveltiunov

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.

github-actions[bot] avatar Jul 05 '22 20:07 github-actions[bot]

Can you provide an example of how COMPILE_CONTEXT can be used?

Another thing to remember is that BQ driver is going to be generating the SQL, where the inner query might need to be MySQL or PostgreSQL (or other), so generated inner code might break if you can't adjust the driver

rccoe avatar Jul 05 '22 20:07 rccoe

@rccoe Typical usage would be

const {
    securityContext: { organizationId },
  } = COMPILE_CONTEXT;
  
cube(`BigQueryTable`, {
  sql: `
    SELECT *, _PARTITIONTIME FROM database.bigquery_table
    LEFT OUTER JOIN
      EXTERNAL_QUERY('connection_id',
        """SELECT name, countryIso AS country FROM sites WHERE organization_id = ${organizationId}""") AS rq
      ON
        rq.name = bigquery_table.site
    `,
  // ...
});

paveltiunov avatar Jul 08 '22 18:07 paveltiunov

Got it, Thanks @paveltiunov . To clarify, that would mean you would need a context per-organization, right? One couldn't put other filters in there ever.

rccoe avatar Jul 08 '22 18:07 rccoe

How would you pass organizationId to this cube from the JSON query object?

sufiyangorgias avatar Feb 23 '24 14:02 sufiyangorgias

Hi @sufiyangorgias, please see Pavel's code example above. The first three lines show how to use the organizationId value from the security context. Here are relevant docs as well: https://cube.dev/docs/product/auth/context

igorlukanin avatar Feb 26 '24 10:02 igorlukanin