pg_diffix
Implementation of the Open Diffix anonymization mechanism for PostgreSQL.
With `cast(extract(minute from ts) as integer)` supported, there are still the following expressions which we can't bucket by at the moment (coming from Metabase): ``` day: CAST(last_seen AS date) day...
The spot taken by the *-bucket row in the results of a query can vary from query to query (and sometimes for the same query issued before or after `ANALYZE;`!)....
The current support for AIDs is a bit limited: only `integer`, `bigint` and `text/varchar` types are allowed. We should support `uuid`, `char` and `numeric` as well.
Why do we have a list of pointers here (https://github.com/diffix/pg_diffix/blob/f605204371533719063552aed5e4d7b1739f720d/src/aggregation/low_count.c#L44)? Why not store `AidTrackerState`s by value, avoiding the extra pallocs?
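To make the question concrete, here is a minimal sketch of by-value storage. `TrackerState`, `TrackerSet` and their fields are hypothetical stand-ins, not the actual `AidTrackerState` definition, and `calloc`/`free` stand in for `palloc0`/`pfree` so that the example compiles outside PostgreSQL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-AID tracker; the fields are assumptions. */
typedef struct TrackerState
{
  unsigned long seed;
  unsigned int count;
} TrackerState;

/* Holds all trackers in one contiguous block instead of a list of pointers. */
typedef struct TrackerSet
{
  size_t length;
  TrackerState *states; /* array of structs stored by value */
} TrackerSet;

/* One allocation for all trackers, zero-initialized. Returns 0 on success. */
static int tracker_set_init(TrackerSet *set, size_t length)
{
  assert(set != NULL);
  set->length = 0;
  set->states = calloc(length, sizeof(TrackerState));
  if (set->states == NULL && length > 0)
    return -1;
  set->length = length;
  return 0;
}

/* Releases the single block and resets the set. */
static void tracker_set_free(TrackerSet *set)
{
  assert(set != NULL);
  free(set->states);
  set->states = NULL;
  set->length = 0;
}
```

Compared with a list of pointers, this needs one allocation per set rather than one per tracker, and the elements sit next to each other in memory. It only works if the number of trackers is known when the set is created.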
We are doing allocation of structs with flexible array members wrong. It should be:

```c
palloc(
  offsetof(MyStruct, last_member) + num_items * sizeof(ArrayMember)
);
```

See https://github.com/postgres/postgres/blob/master/src/include/c.h#L342-L350. The reason is that...
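A self-contained sketch of the `offsetof`-based pattern follows. `IntList` and `int_list_create` are hypothetical names used only for illustration, and `malloc` stands in for `palloc` so that the example compiles without the PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct for illustration; not taken from pg_diffix. */
typedef struct IntList
{
  size_t count;
  int items[]; /* flexible array member, must be the last member */
} IntList;

/*
 * Allocates an IntList with room for num_items elements.
 * The size is the offset of the array plus the space for its elements.
 */
static IntList *int_list_create(size_t num_items)
{
  /* Guard against overflow in the size computation. */
  assert(num_items <= (SIZE_MAX - offsetof(IntList, items)) / sizeof(int));

  IntList *list = malloc(offsetof(IntList, items) + num_items * sizeof(int));
  if (list == NULL)
    return NULL;

  list->count = num_items;
  for (size_t i = 0; i < num_items; i++)
    list->items[i] = 0;
  return list;
}
```

Inside the extension the same computation would be passed to `palloc`, which reports an error on allocation failure instead of returning `NULL`, so the `NULL` check would not be needed there.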
Also `compute_bucket_seed` is out of place in `anonymization.c`. It should be moved to `common.c`.
We hash in a few places (aid, count distinct, bucket seed, ...?). Variable length data can potentially be compressed, and instead of hashing the actual value we hash the compressed...
Maybe a custom extension node to implement the association would work?
Related to discussions [in the PR here](https://github.com/diffix/pg_diffix/pull/139#discussion_r778289305) and [in Slack](https://opendiffix.slack.com/archives/C01GA877TQS/p1641375372052700). In the code, it is often not clear if `aid` means `aid instance` (so a column or expression) or a...