pg_diffix

Implementation of the Open Diffix anonymization mechanism for PostgreSQL.

Results 11 pg_diffix issues

With `cast(extract(minute from ts) as integer)` now supported, the following expressions (coming from Metabase) still can't be used for bucketing: ``` day: CAST(last_seen AS date) day...

low priority

The spot taken by the *-bucket row in the results of a query can vary from query to query (and sometimes for the same query issued before or after `ANALYZE;`!)....

The current support for AIDs is a bit limited: only `integer`, `bigint` and `text/varchar` types are allowed. We should support `uuid`, `char` and `numeric` as well.

enhancement
low priority
elm

Why do we have a list of pointers here (https://github.com/diffix/pg_diffix/blob/f605204371533719063552aed5e4d7b1739f720d/src/aggregation/low_count.c#L44)? Why not store `AidTrackerState`s by value, avoiding the extra pallocs?
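To make the trade-off in this question concrete, here is a minimal sketch of the two layouts. It is not the pg_diffix code: `TrackerState` is a hypothetical stand-in for `AidTrackerState`, and plain `calloc`/`free` are used instead of `palloc` so the snippet compiles outside PostgreSQL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for AidTrackerState; the fields are assumptions. */
typedef struct TrackerState
{
  int aid_count;
  unsigned long long seed;
} TrackerState;

/* Frees the first n elements and the pointer array itself. */
static void free_by_pointer(TrackerState **items, int n)
{
  for (int i = 0; i < n; i++)
    free(items[i]);
  free(items);
}

/* Pointer layout: one allocation for the array plus one per element. */
static TrackerState **alloc_by_pointer(int n)
{
  assert(n > 0);
  TrackerState **items = calloc((size_t) n, sizeof(TrackerState *));
  if (items == NULL)
    return NULL;
  for (int i = 0; i < n; i++)
  {
    items[i] = calloc(1, sizeof(TrackerState));
    if (items[i] == NULL)
    {
      free_by_pointer(items, i); /* release what was allocated so far */
      return NULL;
    }
  }
  return items;
}

/* Value layout: a single contiguous, zero-initialized allocation. */
static TrackerState *alloc_by_value(int n)
{
  assert(n > 0);
  return calloc((size_t) n, sizeof(TrackerState));
}
```

With the value layout, element `i` is reached as `states[i]` rather than `states[i]->...`, and cleanup is a single `free` (or nothing at all when the memory lives in a PostgreSQL memory context).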

We are doing allocation of structs with flexible array members wrong. It should be:

```c
palloc(
  offsetof(MyStruct, last_member) + num_items * sizeof(ArrayMember)
);
```

See https://github.com/postgres/postgres/blob/master/src/include/c.h#L342-L350. The reason is that...
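As a self-contained illustration of the `offsetof` pattern above, the sketch below allocates a struct with a flexible array member. `IntList`, `int_list_size` and `int_list_create` are hypothetical names, not pg_diffix identifiers, and `malloc` stands in for `palloc` so the snippet compiles outside PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct for illustration; not taken from pg_diffix. */
typedef struct IntList
{
  int count;   /* number of entries in items[] */
  int items[]; /* flexible array member, must be the last field */
} IntList;

/* Bytes needed for the header plus num_items array entries. */
static size_t int_list_size(int num_items)
{
  assert(num_items >= 0);
  /* offsetof gives the exact byte offset at which the array starts. */
  return offsetof(IntList, items) + (size_t) num_items * sizeof(int);
}

/* Allocates a zero-filled list; in an extension this would call palloc. */
static IntList *int_list_create(int num_items)
{
  IntList *list = malloc(int_list_size(num_items));
  if (list == NULL)
    return NULL;
  list->count = num_items;
  memset(list->items, 0, (size_t) num_items * sizeof(int));
  return list;
}
```

A caller would write `IntList *list = int_list_create(3);`, use `list->items[0]` through `list->items[2]`, and release the memory with `free(list)`.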

Also `compute_bucket_seed` is out of place in `anonymization.c`. It should be moved to `common.c`.

We hash in a few places (aid, count distinct, bucket seed, ...?). Variable length data can potentially be compressed, and instead of hashing the actual value we hash the compressed...

Related to discussions [in the PR here](https://github.com/diffix/pg_diffix/pull/139#discussion_r778289305) and [in Slack](https://opendiffix.slack.com/archives/C01GA877TQS/p1641375372052700). In the code, it is often not clear if `aid` means `aid instance` (so a column or expression) or a...

low priority