
Handling hashes

Open keynmol opened this issue 11 years ago • 0 comments

I was just wondering: would it be possible to add Hash support to the similarity methods? I caught myself re-implementing them before I knew this gem existed. For cosine similarity, for example, I imagine something like this being included in Hash:

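module DistanceMeasures
  # Cosine similarity between two hashes, computed over their shared keys.
  def cosine_similarity(other)
    shared_keys = keys & other.keys
    return 0.0 if shared_keys.empty?

    # Line up the weights of both hashes in the same key order.
    self_values = values_at(*shared_keys)
    other_values = other.values_at(*shared_keys)
    # Pass the array itself (no splat); assumes the gem's Array#dot_product takes one array.
    dot_product = self_values.dot_product(other_values)
    normalization = self_values.euclidean_normalize * other_values.euclidean_normalize
    handle_nan(dot_product / normalization)
  end
end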

Why metrics for Hash? One common case comes from information retrieval tasks, where vectors have so many dimensions that it is considerably better to store them sparsely as hashes of the form {"word" => <..weight..>}.
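To make the sparse-vector case concrete, here is a minimal standalone sketch in plain Ruby that does not depend on the gem. The module name `SparseCosine` and the sample hashes are hypothetical, chosen only for illustration. Note one deliberate difference from the snippet above: the norms here are taken over all weights of each hash (the usual definition of cosine similarity), not only over the shared keys, so treat it as one possible variant, not as the gem's behaviour.

```ruby
# Hypothetical helper for illustration; not part of the distance_measures gem.
module SparseCosine
  module_function

  # Cosine similarity of two sparse vectors stored as { key => weight } hashes.
  # Keys missing from a hash are treated as zero weights.
  def cosine_similarity(a, b)
    shared_keys = a.keys & b.keys
    return 0.0 if shared_keys.empty?

    # Only shared keys contribute to the dot product.
    dot = shared_keys.sum { |key| a[key] * b[key] }

    # Assumption: norms use every weight in each hash, not just the shared ones.
    norm_a = Math.sqrt(a.values.sum { |v| v * v })
    norm_b = Math.sqrt(b.values.sum { |v| v * v })

    denominator = norm_a * norm_b
    denominator.zero? ? 0.0 : dot / denominator
  end
end

# Made-up term weights for two documents.
doc_a = { "ruby" => 1.0, "gem" => 2.0 }
doc_b = { "ruby" => 2.0, "gem" => 4.0, "hash" => 1.0 }
puts SparseCosine.cosine_similarity(doc_a, doc_b)
```

Only the keys present in both hashes are ever looked up for the dot product, so the cost depends on the number of stored weights and not on the size of the vocabulary.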

keynmol, Apr 28 '13 17:04