clkhash

CLK hash: hash PII for entity matching

Results: 36 clkhash issues

Bumps [hypothesis](https://github.com/HypothesisWorks/hypothesis) from 6.43.3 to 6.56.3. Release notes, sourced from hypothesis's releases: Hypothesis for Python, version 6.56.3. This patch teaches "text()" to rewrite a few more filter predicates (issue...

dependencies

Bumps [requests](https://github.com/psf/requests) from 2.27.1 to 2.28.1. Release notes, sourced from requests's releases: v2.28.1 (2022-06-29). Improvements: Speed optimization in iter_content with transition to yield from. (#6170) Dependencies: Added support for...

dependencies

Bumps [notebook](http://jupyter.org) from 6.4.11 to 6.4.12.

dependencies

@RacingTadpole has carried out an independent code review of clkhash. ### Summary > No major issues identified. The code looks well structured. The two most important issues would be clearing...

help wanted
Epic

Before computing similarity scores and matching, we normally need to check the count of the encodings to see whether it is consistent with the blocking file. Currently we either load the whole...

enhancement

In **Encoding hierarchical classification codes for Privacy-preserving Record Linkage using Bloom filters**, _Rainer Schnell_ and _Christian Borgs_ introduce a way of encoding hierarchical classification codes into Bloom filters: > Hierarchical classification codes...
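To make the idea of a hierarchy-aware encoding concrete, here is a minimal sketch. It is not taken from the paper and not part of clkhash: it assumes the hierarchy is expressed by code prefixes, so every prefix of a code is hashed into the filter and codes that share higher levels share bits. The helpers `prefix_tokens`, `bloom_encode` and `dice`, the filter size, and the use of seeded SHA-256 are all hypothetical choices made for illustration.

```python
import hashlib


def prefix_tokens(code):
    """Return every non-empty prefix of a code, e.g. 'A01' -> ['A', 'A0', 'A01'].

    Assumption: the hierarchy level is given by the prefix length.
    """
    return [code[:i] for i in range(1, len(code) + 1)]


def bloom_encode(tokens, size=1024, num_hashes=2):
    """Set num_hashes bit positions per token in a list of `size` bits."""
    bits = [0] * size
    for token in tokens:
        for seed in range(num_hashes):
            # Seeded SHA-256 stands in for a family of hash functions (illustrative only).
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % size
            bits[index] = 1
    return bits


def dice(a, b):
    """Sorensen-Dice coefficient of two equal-length bit lists."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


parent = bloom_encode(prefix_tokens("A0"))
child = bloom_encode(prefix_tokens("A01"))
print(dice(parent, child))
```

Because the tokens of `"A0"` are a subset of the tokens of `"A01"`, every bit set for the parent code is also set for the child code, which is what lets codes from the same branch of the hierarchy score as similar.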

PyPy3 runs the unit tests in ~1m 41s versus Python 3.6 taking ~23s. Example build on [Azure DevOps](https://dev.azure.com/data61/Anonlink/_build/results?buildId=1802&view=results). This issue is to identify why it is slower, and ideally solve...

Documentation required for:
- Making a JSON file with the schema. Consider a format similar to the [OpenAPI documentation](https://swagger.io/specification/).
- Making schema within code.

Aha! Link: https://csiro.aha.io/features/ANONLINK-47

Currently we use mixed terminology to refer to _features / fields_. The schema uses _features_, whereas the `field_formats.py` file mostly uses _fields_. It would be nice to unify this....

Similar to #299. Here I propose to implement the results of the work on distance-aware address comparisons. Comparing address strings after an address change leads to arbitrary similarities. An...