
feat(bigquery): support for `WITH AGGREGATION_THRESHOLD` in aggregations

Open tswast opened this issue 1 year ago • 2 comments

Is your feature request related to a problem?

BigQuery customers can set aggregation threshold analysis rules to protect privacy-sensitive data. If they have set up such rules, they need to use a WITH AGGREGATION_THRESHOLD clause when querying the table.

SELECT WITH AGGREGATION_THRESHOLD
  test_id, COUNT(DISTINCT last_name) AS student_count
FROM mydataset.ExamView
GROUP BY test_id;

from https://cloud.google.com/bigquery/docs/analysis-rules#view_in_privacy_query

Describe the solution you'd like

A new parameter to Table.aggregate and/or Table.group_by would seem to be the right place to add this.

Alternatively, maybe a new table expression type, created before the group-by, that represents a thresholded table.
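To make the first option concrete, here is a minimal sketch of what such a parameter might compile to. It is plain string building, not ibis code: `render_grouped_select` and its `aggregation_threshold` parameter are hypothetical names invented for illustration, and nothing here reflects how ibis actually generates SQL.

```python
def render_grouped_select(table, group_keys, aggregates, aggregation_threshold=False):
    """Render a grouped SELECT as BigQuery SQL text.

    Hypothetical helper for illustration only; not part of ibis.
    `aggregates` maps an output column name to a SQL aggregate expression.
    """
    # The only difference the flag makes is the keyword that opens the query.
    select_kw = "SELECT WITH AGGREGATION_THRESHOLD" if aggregation_threshold else "SELECT"
    columns = list(group_keys) + [f"{expr} AS {name}" for name, expr in aggregates.items()]
    return (
        f"{select_kw}\n"
        f"  {', '.join(columns)}\n"
        f"FROM {table}\n"
        f"GROUP BY {', '.join(group_keys)}"
    )


sql = render_grouped_select(
    "mydataset.ExamView",
    ["test_id"],
    {"student_count": "COUNT(DISTINCT last_name)"},
    aggregation_threshold=True,
)
print(sql)
```

With the flag set, the rendered text matches the example query above (minus the trailing semicolon); with the default, it is an ordinary grouped SELECT.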

What version of ibis are you running?

N/A

What backend(s) are you using, if any?

BigQuery

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

tswast avatar Apr 05 '24 20:04 tswast

Can we think of it as an arbitrary query setting, similar, for example, to what ClickHouse has?

kszucs avatar Apr 09 '24 09:04 kszucs

> Can we think of it as an arbitrary query setting, similar, for example, to what ClickHouse has?

I haven't used ClickHouse, but it looks pretty similar. ClickHouse looks like it supports general key/value settings, whereas BigQuery has an extra layer of syntax: each feature is enabled by its own clause, which has its own sub-options.

There is a related (sub)query-scoped option specifically for privacy options via SELECT [ WITH differential_privacy_clause ], which is documented as part of the general SELECT syntax.

I don't actually see AGGREGATION_THRESHOLD listed there, but from the examples, the AGGREGATION_THRESHOLD clause looks like it'd be parsed and scoped to the (sub)query in the same way.
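The difference in shape between the two styles can be sketched as follows. This is an illustration only: both helper functions are hypothetical, the output is plain text rather than anything ibis produces, and the setting and option names (`max_threads`, `threshold`, `privacy_unit_column`) are used as examples on the assumption that they match the ClickHouse and BigQuery docs.

```python
def render_clickhouse_settings(settings):
    """Render a flat key/value SETTINGS clause (hypothetical helper)."""
    pairs = ", ".join(f"{key} = {value}" for key, value in settings.items())
    return f"SETTINGS {pairs}"


def render_bigquery_select_with(feature, options=None):
    """Render a BigQuery-style SELECT prefix: a named feature clause
    with its own optional sub-options (hypothetical helper)."""
    prefix = f"SELECT WITH {feature}"
    if not options:
        return prefix
    pairs = ", ".join(f"{key} = {value}" for key, value in options.items())
    return f"{prefix} OPTIONS({pairs})"


# Flat: any number of settings, all at one level.
clickhouse_clause = render_clickhouse_settings({"max_threads": 8})

# Nested: first pick the feature, then pass options scoped to that feature.
bigquery_prefix = render_bigquery_select_with(
    "AGGREGATION_THRESHOLD",
    {"threshold": 50, "privacy_unit_column": "ID"},
)

print(clickhouse_clause)
print(bigquery_prefix)
```

A generic "query settings" dictionary would cover the flat case directly, while the nested case needs at least a feature name plus a per-feature options mapping.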

tswast avatar Apr 09 '24 16:04 tswast