
WordEmbeddingsDense.has_word always returns True

Open pierremarchal opened this issue 3 years ago • 0 comments

https://github.com/vecto-ai/vecto/blob/7d5d7b8690e2b52cbe32bea661caa21614e9c60d/vecto/embeddings/dense.py#L244

If the word is OOV, then i == 0, so i < 0 is always False and has_word always returns True.

I think it would make more sense for get_id to propagate the KeyError, so that has_word can catch it and return False.

pierremarchal, Oct 26 '22 11:10
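
The behaviour reported above, and the proposed fix, can be sketched as follows. This is a minimal illustration and not the actual vecto code: the class names and the `_ids` attribute are hypothetical, and the buggy version only assumes what the issue states (an OOV lookup yields 0, and has_word tests for a negative id).

```python
class BuggyVocabulary:
    """Hypothetical stand-in that reproduces the reported behaviour."""

    def __init__(self, words):
        # Hypothetical word -> index mapping.
        self._ids = {word: i for i, word in enumerate(words)}

    def get_id(self, word):
        # Assumed from the issue: the KeyError is swallowed and 0 is returned.
        try:
            return self._ids[word]
        except KeyError:
            return 0

    def has_word(self, word):
        # 0 is never negative, so this is True for every word.
        return not self.get_id(word) < 0


class FixedVocabulary(BuggyVocabulary):
    """Sketch of the suggested fix: let get_id raise, catch in has_word."""

    def get_id(self, word):
        # Propagates KeyError for out-of-vocabulary words.
        return self._ids[word]

    def has_word(self, word):
        try:
            self.get_id(word)
        except KeyError:
            return False
        return True


buggy = BuggyVocabulary(["the", "cat"])
fixed = FixedVocabulary(["the", "cat"])
print(buggy.has_word("dog"), fixed.has_word("dog"))
```

With the second variant, index 0 stays a valid id for the first word in the vocabulary, and absence is signalled by the exception rather than by a sentinel value.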

Because of this bug, get_vector returns the vector at index 0 for any OOV word. This is a serious problem when running the similarity benchmark, which is how I found the issue.

pierremarchal, Oct 26 '22 11:10
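
To see why this affects a similarity benchmark, here is a small self-contained sketch using toy data. Nothing in it comes from vecto: the vocabulary, the vectors, and the helper names (`get_vector_buggy`, `get_vector_strict`, `score_pairs`) are all hypothetical. It assumes only what the comment describes, namely that an OOV word silently receives the vector stored at index 0.

```python
import math

# Hypothetical toy vocabulary and embedding matrix.
vocab = {"the": 0, "cat": 1, "dog": 2}
matrix = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]


def get_vector_buggy(word):
    # OOV words fall back to index 0, i.e. the vector of "the".
    return matrix[vocab.get(word, 0)]


def get_vector_strict(word):
    # Raises KeyError for OOV words instead of returning a wrong vector.
    return matrix[vocab[word]]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def score_pairs(pairs, lookup):
    # Score each word pair, skipping pairs where a lookup raises KeyError.
    scores = {}
    for a, b in pairs:
        try:
            scores[(a, b)] = cosine(lookup(a), lookup(b))
        except KeyError:
            continue
    return scores


pairs = [("the", "zebra"), ("cat", "zebra")]
buggy_scores = score_pairs(pairs, get_vector_buggy)
strict_scores = score_pairs(pairs, get_vector_strict)
print(buggy_scores)
print(strict_scores)
```

In the fallback variant, every pair containing "zebra" gets a score computed from the vector of "the", so the benchmark reports numbers for pairs it should have skipped. In the strict variant those pairs are dropped and can be counted as OOV.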