Forest Gregg
i'd welcome a PR that showed substantial improvement. In the exploration in #305, the message passing has to do with the way that multiprocessing typically works in python. some process...
as an aside, it would be very interesting to make a new library specifically to work with databases that subclassed the dedupe library, maybe backed by sqlalchemy but with method that could...
#856 would be a good way around that.
anyway, as a next step, your plan makes sense, Flávio
@fjsj, there are two points of interprocess communication 1. move the records to the child process to block 2. move the block keys to the parent process if the...
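a rough sketch of those two transfers, in case it helps to see them side by side. this is not dedupe's code: the records, `toy_predicate` and `child_block` are all made up for illustration, and no real child process is started. it only simulates the two hops with `pickle`, which is what python's multiprocessing uses to move objects between processes.

```python
import pickle

# hypothetical toy records: record_id -> fields
records = {
    1: {"name": "acme corp", "city": "chicago"},
    2: {"name": "acme corporation", "city": "chicago"},
    3: {"name": "zenith llc", "city": "boston"},
}


def toy_predicate(record):
    # hypothetical blocking rule: first three characters of the name plus the city
    return record["name"][:3] + ":" + record["city"]


def child_block(payload):
    # stands in for the child process: unpickle the records,
    # compute (block_key, record_id) pairs, and pickle them for the trip back
    chunk = pickle.loads(payload)
    keys = [(toy_predicate(rec), rec_id) for rec_id, rec in chunk.items()]
    return pickle.dumps(keys)


# 1. parent -> child: the records are serialized
to_child = pickle.dumps(records)

# 2. child -> parent: the block keys are serialized
to_parent = child_block(to_child)
block_keys = pickle.loads(to_parent)

print(len(to_child), "bytes sent to the child")
print(len(to_parent), "bytes sent back to the parent")
print(block_keys)
```

both payloads grow with the number of records, so comparing their sizes on real data is one way to guess which hop costs more.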
i looked at this in a spike https://github.com/dedupeio/dedupe/commit/592604694b884d62e1b49a319dd526847a748c7c it was still slower than not.
yay! https://github.com/iesl/learned-string-alignments
siamese models: https://medium.com/peak-product/towards-reusable-entity-resolution-eed1c6ee4a14
https://medium.com/@gerrit.anders/accelerate-through-matched-data-42d4a11d6d4d
https://github.com/megagonlabs/ditto https://github.com/anhaidgroup/deepmatcher