Brian Thorne issues

Results: 125 issues by Brian Thorne

Flaky test: test_bytes_bitarray_agree

Hypothesis thinks it has found a flaky test:

```
=================================== FAILURES ===================================
______________ test_bytes_bitarray_agree[dice_coefficient_python] ______________

sim_fun =

    @given(strategies.data(), strategies.floats(min_value=0, max_value=1))
>   @pytest.mark.parametrize('sim_fun', SIM_FUNS)
    def test_bytes_bitarray_agree(sim_fun, data, threshold):

/project/tests/test_similarity_dice.py:289:
_ _...
```

state: Need more information

P5: low

GPU integration

Optional interface with CUDA. Note we have a proof of concept for computing the Dice-Sørensen index, sorting and applying a threshold, all on the GPU. Need to consider whether to...

enhancement
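For reference, the three steps named in this issue (compute the Dice-Sørensen index, apply a threshold, sort) can be sketched on the CPU in plain Python. This is only an illustration of the computation a GPU kernel would have to reproduce, not the proof of concept mentioned above. The function names `popcount`, `dice_coefficient` and `candidate_pairs` are hypothetical, and returning 0.0 for two empty filters is an assumed convention.

```python
from typing import List, Sequence, Tuple


def popcount(data: bytes) -> int:
    """Count the set bits in a byte string."""
    return bin(int.from_bytes(data, "big")).count("1")


def dice_coefficient(a: bytes, b: bytes) -> float:
    """Dice-Sørensen index of two equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("filters must have the same length")
    total = popcount(a) + popcount(b)
    if total == 0:
        return 0.0  # assumed convention for two empty filters
    common = popcount(bytes(x & y for x, y in zip(a, b)))
    return 2.0 * common / total


def candidate_pairs(
    left: Sequence[bytes], right: Sequence[bytes], threshold: float
) -> List[Tuple[float, int, int]]:
    """Score every pair, keep those at or above the threshold, sort descending."""
    scored = [
        (dice_coefficient(a, b), i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
    ]
    kept = [entry for entry in scored if entry[0] >= threshold]
    kept.sort(key=lambda entry: entry[0], reverse=True)
    return kept


if __name__ == "__main__":
    left = [b"\x0f", b"\xf0"]
    right = [b"\x0f", b"\x03"]
    for score, i, j in candidate_pairs(left, right, threshold=0.5):
        print(f"left[{i}] ~ right[{j}]: {score:.3f}")
```

The all-pairs scoring is the part that parallelises naturally; the threshold and sort operate on its output.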

Tests that currently trick greedy algorithm

Our greedy algorithm currently fails to match the following graph, where the connection between a and 1 looks likely, but ultimately shouldn't be chosen. ![4048684e-3882-11e6-9a81-105da6c927bd](https://user-images.githubusercontent.com/855189/36401446-a28157e0-162b-11e8-8c09-2217f51ec030.png) The network methods should succeed, and...
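A minimal sketch of this kind of failure is given below. The edge weights are hypothetical and are not taken from the image above; they are chosen only so that the a-1 edge has the highest score yet does not belong to the best overall matching. `greedy_matching` and `best_matching` are illustrative names, and the brute-force search stands in for a proper network method on a graph this small.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Hypothetical similarity scores (not the graph from the issue image).
SIMILARITY: Dict[Edge, float] = {
    ("a", "1"): 0.9,
    ("a", "2"): 0.8,
    ("b", "1"): 0.8,
}


def total_score(matching: List[Edge], similarity: Dict[Edge, float]) -> float:
    """Sum the similarity of every edge in a matching."""
    return sum(similarity[edge] for edge in matching)


def greedy_matching(similarity: Dict[Edge, float]) -> List[Edge]:
    """Take edges in descending score order, skipping already-used nodes."""
    used_left, used_right = set(), set()
    matching = []
    for (left, right), _ in sorted(
        similarity.items(), key=lambda item: item[1], reverse=True
    ):
        if left not in used_left and right not in used_right:
            matching.append((left, right))
            used_left.add(left)
            used_right.add(right)
    return matching


def best_matching(similarity: Dict[Edge, float]) -> List[Edge]:
    """Exhaustively search for the matching with the highest total score."""
    lefts = sorted({left for left, _ in similarity})
    rights = sorted({right for _, right in similarity})
    # Pad with None so that a left node may stay unmatched.
    padded = rights + [None] * len(lefts)
    best: List[Edge] = []
    for assignment in permutations(padded, len(lefts)):
        pairs = [
            (left, right)
            for left, right in zip(lefts, assignment)
            if (left, right) in similarity
        ]
        if total_score(pairs, similarity) > total_score(best, similarity):
            best = pairs
    return best


if __name__ == "__main__":
    print("greedy:", greedy_matching(SIMILARITY))
    print("best:  ", best_matching(SIMILARITY))
```

With these weights the greedy pass commits to a-1 first, which blocks both remaining edges, while the exhaustive search pairs a with 2 and b with 1.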

Support multiple bloom filters

It may make sense to calculate multiple CLKs using different field sets for improved matching, blocking, matching with orgs who only have a subset of the fields, and most importantly...

proposal

Supervised method to automatically learn the threshold

Consider adding a `train` function that would be provided with training data - CLKs that are known to match. The output would be an optimal threshold `t`. This would be...

enhancement
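One possible shape for such a `train` function is sketched below. It makes several assumptions that the issue does not state: the training data has already been reduced to similarity scores, non-matching pairs are available alongside the known matches, and "optimal" means maximising the F1 score. The helper `f1_at` and the tie-break towards the higher threshold are likewise illustrative choices.

```python
from typing import List, Tuple

ScoredPair = Tuple[float, bool]  # (similarity score, is a true match)


def f1_at(threshold: float, scored: List[ScoredPair]) -> float:
    """F1 score when every pair scoring >= threshold is declared a match."""
    tp = sum(1 for score, match in scored if score >= threshold and match)
    fp = sum(1 for score, match in scored if score >= threshold and not match)
    fn = sum(1 for score, match in scored if score < threshold and match)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def train(scored: List[ScoredPair]) -> float:
    """Return the observed score that maximises F1 when used as threshold t."""
    if not scored:
        raise ValueError("training data must not be empty")
    # Only observed scores need checking: F1 is constant between them.
    candidates = sorted({score for score, _ in scored})
    # Ties are broken in favour of the higher (stricter) threshold.
    return max(candidates, key=lambda t: (f1_at(t, scored), t))


if __name__ == "__main__":
    training = [
        (0.95, True),
        (0.90, True),
        (0.85, False),
        (0.80, True),
        (0.60, False),
        (0.40, False),
    ]
    t = train(training)
    print(f"t = {t}, F1 = {f1_at(t, training):.3f}")
```

A different objective, such as a minimum precision, would slot into the same search by replacing `f1_at`.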

Benchmark/test with perturbed data

Yangfeng suggested looking at [febrl](https://github.com/fgregg/febrl) to generate data with perturbations.

Manual - http://users.cecs.anu.edu.au/~Peter.Christen/Febrl/febrl-0.3/febrldoc-0.3/

Additional test sets:
- http://www.record-linkage.de/-Resources--other_record_linkage_resources.htm#recordlinkagetestdata
- https://espace.curtin.edu.au/handle/20.500.11937/26908

Aha! Link: https://csiro.aha.io/features/ANONLINK-76

security

Compile an unsafe version using mock paillier context

For testing purposes it would be useful to have a compatible jar of this library built with the javallier mock context.

enhancement

CI should be using valgrind

Remove dependency on clkhash

Investigate effect of not including start/end bigrams