Nick Crews
@gforsyth ahh, thanks for the explanation of how those prerelease numbers work! Now in the future I can find the exact SHA myself. PS, would it be possible to include...
The YYYYMMDD and FILE in the given URL are placeholders for a date and extension. You have to replace those with literal values.
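A minimal sketch of that substitution, in case it helps. The URL template below is hypothetical (the real URL is not reproduced here), and `fill_placeholders` is just an illustrative helper name:

```python
from datetime import date

# Hypothetical template: stands in for the real URL, which is not shown here.
URL_TEMPLATE = "https://example.com/data/YYYYMMDD.FILE"


def fill_placeholders(template: str, day: date, extension: str) -> str:
    """Replace the literal YYYYMMDD and FILE tokens with concrete values."""
    return (
        template
        .replace("YYYYMMDD", day.strftime("%Y%m%d"))  # e.g. 20240131
        .replace("FILE", extension)                   # e.g. csv
    )


url = fill_placeholders(URL_TEMPLATE, date(2024, 1, 31), "csv")
print(url)
```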
This looks like a great thing to think about and write down. I didn't read in depth, but I did notice a typo in the filename `depdenency_management.md`
This lib looks like it makes this trivial: https://testcontainers-python.readthedocs.io/en/latest/README.html
lol, I THOUGHT I already found that upset chart, but I found it again and was blown away a second time :) Sounds good, no rush at all, I didn't...
FYI I have a basic implementation of [this here](https://github.com/NickCrews/mismo/blob/0e233215659b40e6be4baaef5d06f4766ee8d1e2/mismo/block/_upset.py), you can see what this looks like in [this walkthrough](https://nickcrews.github.io/mismo/examples/patent_deduplication/). In the future I would like to refactor the upset plot...
@RobinL I'm just re-reading your original response, and yes I think we should switch to combining match weights additively, otherwise I'm pretty sure we will run into floating-point errors....
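To illustrate the floating-point concern, here is a rough sketch. It assumes match weights are the log2 of per-comparison Bayes factors (an assumption about the model, not something stated above), and the function names and input values are made up for the demonstration:

```python
import math


def combine_multiplicative(bayes_factors):
    """Multiply the per-comparison factors directly (prone to underflow)."""
    product = 1.0
    for bf in bayes_factors:
        product *= bf
    return product


def combine_additive(bayes_factors):
    """Sum log2 weights instead; the result stays in a representable range."""
    return sum(math.log2(bf) for bf in bayes_factors)


# Deliberately extreme, made-up inputs: 20 comparisons that each strongly
# favor a non-match. The true product is 1e-400, below the smallest double.
factors = [1e-20] * 20

product = combine_multiplicative(factors)  # underflows to exactly 0.0
weight = combine_additive(factors)         # finite, large negative number

print(product, weight)
```

Once the product has underflowed to zero, two very different comparison vectors become indistinguishable, whereas their summed weights can still be ranked against each other.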
Thanks for the thoughts @aalexandersson ! Unfortunately, I don't think varying the threshold accomplishes the same thing. There might be a comparison vector that gets a really good score that...
yeah that is a good thing to think about. But, I'm trying to think of a situation where just optimizing it for users wouldn't be the correct thing to do,...
4. Use the clustering algorithm of one of the cited baselines: take the k-nearest-neighbor graph (so at most k*N elements), materialize it, and feed that through sklearns...