Nick Crews comments

Results: 281 comments of Nick Crews

bug: Date.to_pandas() errors, having trouble repro-ing

@gforsyth ahh, thanks for the explanation of how those prerelease numbers work! Now in the future I can find the exact SHA myself. PS, would it be possible to include...

Parser training sets link is down

The YYYYMMDD and FILE in the given URL are placeholders for a date and extension. You have to replace those with literal values.
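
To make the substitution concrete, here is a minimal sketch. The real URL isn't shown in this thread, so `TEMPLATE` below is a made-up stand-in, and the `fill_placeholders` helper, the date, and the `csv` extension are all hypothetical choices for illustration only.

```python
from datetime import date

# Hypothetical template: the actual URL is not given here, only its placeholders.
TEMPLATE = "https://example.com/training/YYYYMMDD.FILE"


def fill_placeholders(template: str, day: date, extension: str) -> str:
    """Replace the YYYYMMDD and FILE placeholders with literal values."""
    return (
        template
        .replace("YYYYMMDD", day.strftime("%Y%m%d"))  # e.g. 20230115
        .replace("FILE", extension)                   # e.g. csv
    )


# Example values are assumptions; use the date and extension you actually need.
url = fill_placeholders(TEMPLATE, date(2023, 1, 15), "csv")
print(url)
```

Which dates and extensions actually exist on the server is something you would still have to check against the project's docs.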

Add a section on dependency management within Splink

This looks like a great thing to think about and write down. I didn't read it in depth, but I did notice a typo in the filename `depdenency_management.md`.

Tip: Use docker images for pyspark and postgres for testing

This lib looks like it makes this trivial: https://testcontainers-python.readthedocs.io/en/latest/README.html

[FEAT] viz blocking rules using upset chart

lol, I THOUGHT I already found that upset chart, but I found it again and was blown away a second time :) Sounds good, no rush at all, I didn't...

[FEAT] viz blocking rules using upset chart

FYI I have a basic implementation of [this here](https://github.com/NickCrews/mismo/blob/0e233215659b40e6be4baaef5d06f4766ee8d1e2/mismo/block/_upset.py), you can see what this looks like in [this walkthrough](https://nickcrews.github.io/mismo/examples/patent_deduplication/). I would like to in the future refactor the upset plot...

bug: "can't take logarithm of zero"

[FEAT] support "definite_[non]match_{comparison,blocking}_rules" (and "exact match pre-deduping")

[FEAT] convert blocking rules to DNF

FYI: paper using combo of embeddings and comparisons