clkhash

CLK hash: hash PII for entity matching

Results 36 clkhash issues

There's no reason not to allow a field to be processed more than once with different tokenization and hashing. V2 schema can represent this, but current code can't handle it....

enhancement

It would be a good idea to make it clear how to use the library without serialization, e.g. to directly use the clkhash output with anonlink. There are a few functions...

Epic

Say a row doesn't have data for one field:

```
INDEX,NAME freetext,DOB YYYY/MM/DD,GENDER M or F
0,Libby Slemmer,1933/09/13,F
1,Garold Staten,,M
2,Yaritza Edman,1972/11/30,
```

What should we do? 1) Current approach...

Aha! Link: https://csiro.aha.io/features/ANONLINK-50

We need to add a note on security... The Cryptographic Longterm Key is computed and compared following the method described by Rainer Schnell, Tobias Bachteler, and Jörg Reiher in [A...

security
P2: required

Currently, the defaults are embedded in the code. This is in addition to them being listed in the master schema. This can lead to inconsistencies if the defaults are changed...

enhancement

In the literature, the length _l_ of a CLK is fixed at either 1000 or 100, depending on who is writing the paper. I read somewhere (unfortunately I cannot find it...

research

> Assuming clk.py is meant to be common code that could support a number of different interfaces, then tqdm’s progress bars (which are specific to a CLI) should be handled...

P5: low

The readme or docs should state how we configure and run mypy. Aha! Link: https://csiro.aha.io/features/ANONLINK-37

help wanted

the error you will get looks like this: ``` --------------------------------------------------------------------------- StopIteration Traceback (most recent call last) in () 1 from clkhash import clk ----> 2 hashed_data_a = clk.generate_clk_from_csv(a_csv, ('key1', 'key2'),...

bug