Minimum set length for similarity calculation?

Open tomtaylor opened this issue 10 years ago • 1 comments

Hello, thanks a lot for predictor, it's a great library. I'd quite like to add a feature, and wanted your thoughts on it.

We have some items, in lists, and are calculating similarity based on how often items appear on lists with each other. We have lots of items that only appear on one list together, and we'd like to remove those pairs from the prediction engine by forcing the distance to 0.

I can see that I can probably add a check in Distance for this, but I wanted to know whether you have an opinion about the design before I open a pull request.
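To make the idea concrete, here is a rough, standalone sketch of the kind of check I mean. It assumes a Jaccard-style score and does not use predictor's actual code: the method name `thresholded_jaccard`, the `min_shared` option, and the sample data are all hypothetical.

```ruby
require "set"

# Hypothetical helper, not part of predictor: a Jaccard-style score that is
# forced to 0.0 when two items share fewer than `min_shared` lists.
def thresholded_jaccard(lists_a, lists_b, min_shared: 2)
  a = lists_a.to_set
  b = lists_b.to_set
  shared = (a & b).size

  # Below the threshold (or nothing shared at all): treat the pair as unrelated.
  return 0.0 if shared < min_shared || shared.zero?

  shared.to_f / (a | b).size
end

# item => ids of the lists it appears on (made-up data)
lists_for = {
  "apple"  => [1, 2, 3],
  "banana" => [2, 3, 4],
  "cherry" => [3, 9]
}

# Two shared lists out of four distinct lists, so the pair keeps its score.
puts thresholded_jaccard(lists_for["apple"], lists_for["banana"])

# Only one shared list, so the score is forced to 0.0.
puts thresholded_jaccard(lists_for["apple"], lists_for["cherry"])
```

With `min_shared: 1` this falls back to a plain Jaccard score, so the threshold could be opt-in and the default behaviour would stay unchanged.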

Apr 09 '15 14:04 tomtaylor

Hi Tom - this is the kind of thing that would need to be rolled into each of the three implementations for calculating distance. Modifying Distance would work for the Ruby implementation, but there's also Lua and Union to worry about (especially since they're so much faster that I think the Ruby implementation is mainly useful for backwards compatibility). I'm also curious what kind of API you were thinking of implementing for this. I don't have any thoughts about it off the top of my head, but it might warrant some discussion :smile:

Apr 09 '15 14:04 chanks