Minimum set length for similarity calculation?
Hello, thanks a lot for predictor; it's a great library. I'd quite like to add a feature, and wanted your thoughts on it.
We have some items in lists, and we calculate similarity based on how often items appear on lists together. Lots of our item pairs only appear together on a single list, and we'd like to remove those from the prediction engine by forcing the distance to 0.
I can see that I could probably add a check in Distance for this, but I wanted to know whether you have an opinion on the design before I open a pull request.
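To make the idea concrete, here is a minimal sketch of the kind of check I mean. It assumes a Jaccard-style score over the sets of lists each item appears on; the method name `thresholded_jaccard` and the `min_overlap` parameter are hypothetical and are not part of predictor's API or its Distance code.

```ruby
require 'set'

# Hypothetical helper, not part of predictor: Jaccard score between the sets
# of lists two items appear on, forced to 0.0 when the items share fewer than
# min_overlap lists.
def thresholded_jaccard(lists_a, lists_b, min_overlap: 2)
  a = lists_a.to_set
  b = lists_b.to_set
  shared = (a & b).size

  # Below the threshold (or nothing shared at all): treat the pair as unrelated.
  return 0.0 if shared < min_overlap || shared.zero?

  shared.to_f / (a | b).size
end

# Two shared lists out of four distinct lists.
puts thresholded_jaccard(%w[l1 l2 l3], %w[l2 l3 l4])
# Only one shared list, so the score is forced to 0.0.
puts thresholded_jaccard(%w[l1 l2], %w[l2 l3])
```

With `min_overlap: 1` this reduces to the plain Jaccard score, so a threshold like this could stay opt-in.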
Hi Tom - this is the kind of thing that would need to be rolled into each of the three implementations for calculating distance. Modifying Distance would work for the Ruby implementation, but there's also Lua and Union to worry about (especially since they're so much faster; I think the Ruby implementation is mainly useful for backwards compatibility). I'm also curious what kind of API you were thinking of implementing for this. I don't have any thoughts about it off the top of my head, but it might warrant some discussion :smile: