CLK hash: hash PII for entity matching

36 clkhash issues

I propose removing the dependency on `bitarray` and using bitwise operations on `int`s instead. I see no good reason to use `bitarray`. The only two operations we use on it...

enhancement
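A minimal sketch of what an `int`-backed Bloom filter could look like, assuming the operations we need are setting bit positions, counting set bits, comparing two filters and serialising to bytes (the list of operations in the issue is truncated, so this is a guess). All function names are hypothetical. Position `i` is mapped to bit `length - 1 - i` so that the serialised bytes are intended to match `bitarray`'s default big-endian layout; that should be checked against existing CLKs before switching.

```python
def set_bits(positions, length):
    """Return an int Bloom filter of `length` bits with the given positions set.

    Position i maps to bit (length - 1 - i) so that, when length is a multiple
    of 8, to_bytes() lays bits out like bitarray's default big-endian order.
    """
    value = 0
    for pos in positions:
        if not 0 <= pos < length:
            raise ValueError(f"position {pos} out of range for length {length}")
        value |= 1 << (length - 1 - pos)
    return value


def popcount(value):
    """Number of set bits (bin().count works on every Python 3 version)."""
    return bin(value).count("1")


def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) of two int filters."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 0.0


def to_bytes(value, length):
    """Serialise a filter of `length` bits (a multiple of 8) to bytes."""
    return value.to_bytes(length // 8, byteorder="big")


def from_bytes(data):
    """Inverse of to_bytes."""
    return int.from_bytes(data, byteorder="big")
```

For example, `to_bytes(set_bits([0, 3, 15], 16), 16)` gives `b'\x90\x01'`: positions 0 and 3 land in the high bits of the first byte and position 15 in the lowest bit of the second.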

Ideas:
- version of clkhash used
- size and statistics of CLKs
- schema (or hash of schema)
- hash of CLKs
- timestamp (when the PII was encoded)
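One possible shape for such a metadata record, sketched under a few assumptions: CLKs are available as equal-length `bytes` objects, the schema is available as a JSON string, and the version string is passed in by the caller. The function name, the field names and the choice of SHA-256 are all hypothetical; "statistics" is taken here to mean the mean and standard deviation of the popcounts.

```python
import datetime
import hashlib
import statistics


def describe_clks(clks, schema_json, clkhash_version):
    """Build a metadata record for a batch of CLKs (hypothetical helper).

    clks: non-empty list of equal-length bytes objects, one per record.
    schema_json: the schema serialised as a JSON string.
    clkhash_version: version string supplied by the caller.
    """
    # Popcount of each CLK: count set bits byte by byte.
    popcounts = [sum(bin(byte).count("1") for byte in clk) for clk in clks]

    # One digest over all CLKs in order, so any change to any CLK changes it.
    clk_digest = hashlib.sha256()
    for clk in clks:
        clk_digest.update(clk)

    return {
        "clkhash_version": clkhash_version,
        "count": len(clks),
        "clk_length_bits": len(clks[0]) * 8,
        "popcount_mean": statistics.mean(popcounts),
        "popcount_stdev": statistics.pstdev(popcounts),
        "schema_sha256": hashlib.sha256(schema_json.encode("utf-8")).hexdigest(),
        "clks_sha256": clk_digest.hexdigest(),
        "encoded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
```

Every value in the returned dict is a string or a number, so it can be written out with `json.dumps` alongside the CLKs.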

While reading [Options for encoding names for data linking at the Australian Bureau of Statistics](https://arxiv.org/abs/1802.07975) I came across this note regarding restrictions on the Bloom filter's modulus: ![screenshot from 2018-02-24...

help wanted
question
research
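The screenshot is cut off here, so the exact restriction from the paper is not recoverable. As background only, and not as a reconstruction of that note: one well-known restriction on the modulus arises with double hashing of the form g_i = (h1 + i·h2) mod l. If h2 and the filter length l share a common factor d, the sequence only ever visits l / d distinct positions, which is why l is often chosen prime (or h2 forced to be coprime with l). The function names below are illustrative.

```python
import math


def double_hash_positions(h1, h2, k, l):
    """Bit positions g_i = (h1 + i * h2) mod l for i = 0..k-1 (double hashing)."""
    return [(h1 + i * h2) % l for i in range(k)]


def distinct_reachable(h2, l):
    """How many distinct positions the sequence can ever visit: l / gcd(h2, l)."""
    return l // math.gcd(h2, l)
```

With l = 1000 and h2 = 500, every token sets at most 2 distinct bits no matter how large k is, whereas with the prime l = 1009 the same h2 can reach all 1009 positions.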

Consider whether the right levels of abstraction have been chosen for library users, and document options for improving them. It should be relatively easy for a clkhash user to define...

P3: important

An experimental API has been added for uploading CLKs as a binary file. This allows faster, more efficient data transfer. The same REST endpoint (`/projects/{project_id}/clks`) is...

enhancement
P5: low
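The binary format itself is truncated in the issue, so the sketch below assumes the simplest conceivable layout: equal-length CLKs concatenated as raw bytes, with no header, index or popcount per record. It also assumes, for comparison, that the JSON upload carries each CLK as a base64 string. Function names are hypothetical, and nothing here talks to the endpoint; it only shows where the size saving comes from.

```python
import base64
import json


def pack_clks(clks):
    """Concatenate equal-length CLKs into one binary blob (hypothetical layout)."""
    if not clks:
        return b""
    size = len(clks[0])
    if any(len(clk) != size for clk in clks):
        raise ValueError("all CLKs must have the same length")
    return b"".join(clks)


def unpack_clks(blob, clk_bytes):
    """Split a blob produced by pack_clks back into CLKs of clk_bytes each."""
    if len(blob) % clk_bytes:
        raise ValueError("blob length is not a multiple of the CLK size")
    return [blob[i:i + clk_bytes] for i in range(0, len(blob), clk_bytes)]


def json_payload(clks):
    """Base64-in-JSON encoding of the same CLKs, for size comparison."""
    encoded = [base64.b64encode(clk).decode("ascii") for clk in clks]
    return json.dumps({"clks": encoded}).encode("utf-8")
```

For 1000 CLKs of 128 bytes (1024 bits) each, the binary blob is exactly 128,000 bytes, while base64 alone expands each CLK to 172 characters before JSON quoting and separators are added.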

Add a page to the docs with information about supported platforms, including any special instructions for installing dependencies (e.g. the Visual Studio C++ compiler on Windows). Perhaps worth looking...

help wanted

We should rethink defaults as currently:
* `clkhash` ignores the values in the spec
* the defaults are spread throughout the code base. Either hard-coded (e.g. schema.py line 184), default...
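One way to address both points, sketched under assumptions: keep every default in a single read-only mapping and merge the spec over it, so values present in the spec always win. The keys and values below are placeholders, not clkhash's actual defaults, and `resolve_settings` is a hypothetical helper. Rejecting unknown keys is a design choice that surfaces typos in a spec instead of silently falling back to a default.

```python
from types import MappingProxyType

# Hypothetical single source of truth for defaults.
# Keys and values are placeholders, not clkhash's real defaults.
DEFAULTS = MappingProxyType({
    "num_bits": 1024,
    "k": 20,
    "ngram_size": 2,
})


def resolve_settings(spec):
    """Merge a spec's settings over DEFAULTS; values present in the spec win.

    Unknown keys raise KeyError so that typos in a spec are not silently ignored.
    """
    unknown = set(spec) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    return {**DEFAULTS, **spec}
```

Code elsewhere would then read settings only from the dict returned by `resolve_settings`, never from its own hard-coded fallbacks.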

Consider applying [black](https://github.com/ambv/black) to the code base.

Aha! Link: https://csiro.aha.io/features/ANONLINK-39

Users trying to use clkhash have run into issues with `head` and with the multiline commands separated with `/`.