Distance-Measures
Handling hashes
I was just wondering: is it possible to add support for Hash in the similarity methods? I caught myself re-implementing this before I knew the gem existed. For example, for cosine similarity I picture something like the following, included in Hash:
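module DistanceMeasures
  # Cosine similarity between two hashes, computed over their shared keys.
  def cosine_similarity(other)
    shared_keys = keys & other.keys
    return 0.0 if shared_keys.empty?

    # Align both hashes on the shared keys so the values line up by position.
    self_values = values_at(*shared_keys)
    other_values = other.values_at(*shared_keys)
    # Pass the other vector as a single array, not as splatted arguments.
    dot_product = self_values.dot_product(other_values)
    normalization = self_values.euclidean_normalize * other_values.euclidean_normalize
    handle_nan(dot_product / normalization)
  end
end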
Why metrics for Hash? One common case where hashes are compared comes from information retrieval, where vectors have so many dimensions, most of them zero, that it is considerably better to store them as hashes of the form {"word" => <..weight..>}.
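To make the sparse-vector case concrete, here is a minimal standalone sketch that does not depend on the gem. The name sparse_cosine_similarity and the sample documents are hypothetical, made up for illustration. Unlike the snippet above, it takes the norms over all weights rather than only the shared ones, which is my assumption about the textbook definition being the intended behavior.

```ruby
# Hypothetical helper, not part of the gem: cosine similarity for sparse
# vectors stored as { term => weight } hashes.
def sparse_cosine_similarity(a, b)
  # Only keys present in both hashes contribute to the dot product;
  # a key missing from b counts as weight 0.0.
  dot = a.sum { |key, weight| weight * b.fetch(key, 0.0) }

  # Norms use every weight (assumption: textbook cosine similarity),
  # so terms that are not shared still lower the result.
  norm_a = Math.sqrt(a.values.sum { |w| w * w })
  norm_b = Math.sqrt(b.values.sum { |w| w * w })

  denominator = norm_a * norm_b
  # Avoid dividing by zero when either vector is empty or all zeros.
  denominator.zero? ? 0.0 : dot / denominator
end

# Made-up term weights for two documents.
doc_a = { "ruby" => 1.0, "gem" => 2.0 }
doc_b = { "ruby" => 2.0, "hash" => 1.0 }
similarity = sparse_cosine_similarity(doc_a, doc_b) # roughly 0.4
```

Only the terms actually present are stored and visited, so the cost depends on the number of non-zero weights rather than on the size of the vocabulary.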