f-hafner
I am having a similar issue with record linkage: the training session yields mostly distinct pairs and very few matches. **Problem**: In version 2.0.17, the labelling gives a lot of...
I should be able to share a sample of my dataset where the issue occurs; I'll let you know.
@tigerang22, I think I can confirm this. With 2.0.14, I stopped at 100 negative, 1 positive. With 2.0.13, I stopped at 22 negative, 18 positive. I'll prepare the data...
Here is the repo with data and scripts: https://github.com/f-hafner/dedupe_training_example

I hope it works; let me know if I need to fix something.
I haven't tried it out yet, but I will let you know when I have
Hi @fgregg, @tigerang22, I tried using the GitHub version of dedupe (also on my sample data). It still gave almost only negatives. But I am not sure I got...
I came across this thread by chance and thought I would add my two cents from one of my previous research projects. There, we've been using sqlite extensively and over time...
I think I tried to replace the entire sqlite db with DuckDB in the past, but then abandoned the idea because I could not persist the row indices, but perhaps...
@suvayu and I will work on the following

### Issues to address for db comparisons duckdb vs sqlite

- when resources are constrained, it should not fail -> set memory...
Yes, I'd be happy to make a PR! As for the manual uploading: I agree with both arguments, but perhaps the manual way could be a "last resort" for debugging?...