Eric Zhu
I had the same problem. The docker process is not killed when you detach the interactive shell. I added a few lines after context.pty.Close() in "goHandleClient" to call "docker kill"...
Thanks for the PR. There is a problem, though. Consider a generator:
```python
from collections.abc import Iterable

def test():
    for i in range(100):
        yield i

isinstance(test(), Iterable)  # True
len(test())...
```
Yes. That's why checking whether the input is an iterator is not enough to prevent the error I mentioned. This library is intended to provide an in-memory solution for similarity search. So...
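To make the point concrete, here is a minimal sketch of why an `Iterable` check alone is too weak: a generator passes it, yet it has no length and is exhausted after one pass. The names `gen` and `is_sized_collection` are hypothetical and not part of the library.

```python
from collections.abc import Iterable, Sized

def gen():
    # A generator: iterable, but single-pass and without a length.
    for i in range(100):
        yield i

def is_sized_collection(obj):
    # Hypothetical helper: requiring Sized as well as Iterable rejects
    # generators, which define __iter__ but not __len__.
    return isinstance(obj, Iterable) and isinstance(obj, Sized)

g = gen()
passes_iterable_check = isinstance(g, Iterable)
first_pass = list(g)   # consumes the generator
second_pass = list(g)  # nothing left to read on a second scan
```

A list or tuple of sets satisfies `is_sized_collection`, while `gen()` does not, even though both pass the plain `Iterable` check.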
Thanks for the pull request. I think maybe a more robust solution is to modify the similarity function to add set sizes as new arguments. So we can use a...
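As a rough illustration of similarity functions that take set sizes as arguments, the sketch below computes Jaccard and containment from an overlap count and the two sizes. The function names and signatures are assumptions for illustration only, not the library's actual API.

```python
def jaccard(overlap, size_x, size_y):
    # |X ∩ Y| / |X ∪ Y|, with the union size derived from the two set sizes.
    union = size_x + size_y - overlap
    return overlap / union if union > 0 else 0.0

def containment(overlap, size_x, size_y):
    # |X ∩ Y| / |X|; size_y is unused but kept so both functions
    # share one signature and can be swapped freely.
    return overlap / size_x if size_x > 0 else 0.0

x = {1, 2, 3}
y = {2, 3, 4, 5}
overlap = len(x & y)
j = jaccard(overlap, len(x), len(y))
c = containment(overlap, len(x), len(y))
```

Because the sizes are passed in explicitly, the caller can supply the original set sizes even when the sets themselves have been transformed or truncated.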
I made the required changes. Can you help me verify if the changes are correct by adding a unit test for your scenario? Thanks!
Thanks for your issue! The exact all-pair search algorithm builds an in-memory data structure (posting lists), and the size can be as big as your original input. Is your input...
You can also try the Go version which is probably more efficient due to the programming language used.
Issue #5 is also relevant to your problem; take a look at my response there.
Interesting. The likely memory bottleneck is in the function [`_frequency_order_transform`](https://github.com/ekzhu/SetSimilaritySearch/blob/30188ca0257b644501f4a359172fddb2f89bf568/SetSimilaritySearch/utils.py#L91). Could you print the length of `counts` and `order`? It is possible that your shingled documents are very dissimilar --...
Because `_frequency_order_transform` must scan the input sets twice, `all_pair()` can't accept an iterator. Of course it is possible to modify it to accept a function that returns an iterator.
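The idea of accepting a function that returns an iterator could look roughly like the sketch below. It is not the library's code: `frequency_order_transform_from` and `make_sets` are hypothetical names, and the transform shown (rank tokens from rarest to most frequent, then rewrite each set as sorted ranks) is a simplified stand-in for whatever `_frequency_order_transform` actually does.

```python
from collections import Counter

def frequency_order_transform_from(make_sets):
    # make_sets: hypothetical zero-argument callable that returns a fresh
    # iterator over the input sets each time it is called.

    # Pass 1: count how many sets each token appears in.
    counts = Counter()
    for s in make_sets():
        counts.update(set(s))

    # Rank tokens from rarest to most frequent (stable for ties).
    ranked = sorted(counts.items(), key=lambda kv: kv[1])
    order = {token: rank for rank, (token, _) in enumerate(ranked)}

    # Pass 2: a second, independent scan rewrites each set as sorted ranks.
    transformed = [sorted(order[t] for t in set(s)) for s in make_sets()]
    return transformed, order

def read_sets():
    # Stand-in for a streaming source, e.g. re-opening a file on each call.
    data = [["a", "b"], ["b", "c"], ["b"]]
    return (set(tokens) for tokens in data)

transformed, order = frequency_order_transform_from(read_sets)
```

Each call to `make_sets()` starts a new scan, so the input never has to be materialized by the caller. The transformed sets and the token order are still held in memory.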