h2o.findSynonyms fails if the 'word' parameter is unknown to the word2vec model

Open dmresearch15 opened this issue 9 months ago • 3 comments

I received the following error when running print(h2o.findSynonyms(w2v_model, "National", count = 5)):

Error in eval(substitute(expr), data, enclos = parent.frame()) : object 'score' not found

I'm curious why 'score' is missing.

In contrast, print(h2o.findSynonyms(w2v_model, "national", count = 5)) returns the scores as expected.

dmresearch15 avatar May 06 '24 03:05 dmresearch15

Hi @dmresearch15. Thanks for reporting this issue.

It looks like there is a bug: for an unseen word, the function raises an error instead of returning a result.

We definitely need to fix it.

maurever avatar May 09 '24 08:05 maurever

I reproduced the error with this code:

library(h2o)
h2o.init()

job_titles <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv", col.names = c("category", "jobtitle"), col.types = c("String", "String"), header = TRUE)

words <- h2o.tokenize(job_titles, " ")
vec <- h2o.word2vec(training_frame = words)

# pass: "teacher" is a word the model has seen
syn <- h2o.findSynonyms(vec, "teacher", count = 20)
print(syn)

# fail: "Tteacher" is an unseen word
syn2 <- h2o.findSynonyms(vec, "Tteacher", count = 20)
print(syn2)
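
Until the error is fixed in the R API, one possible workaround is to catch it on the caller's side. The following is only a minimal sketch: `safe_find_synonyms` and its `finder` argument are hypothetical names, not part of h2o, and the empty result with `synonym` and `score` columns is an assumption about what a "no synonyms" answer could look like.

```r
# Hypothetical helper (not part of h2o): wraps a synonym lookup so that an
# error for an unseen word yields an empty result instead of stopping the script.
# `finder` defaults to h2o.findSynonyms; it is only evaluated when actually used,
# so a stub can be passed in for testing without a running H2O cluster.
safe_find_synonyms <- function(model, word, count = 20,
                               finder = function(...) h2o::h2o.findSynonyms(...)) {
  tryCatch(
    finder(model, word, count = count),
    error = function(e) {
      warning(sprintf("No synonyms returned for '%s': %s",
                      word, conditionMessage(e)))
      # Assumed shape of an empty result: same column names, zero rows.
      data.frame(synonym = character(0), score = numeric(0))
    }
  )
}
```

With the model from the reproduction above, `safe_find_synonyms(vec, "Tteacher")` would be expected to emit a warning and return a zero-row data frame rather than abort.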

maurever avatar May 09 '24 08:05 maurever

I'm currently incorporating this into my project, so it would be helpful to have a timeframe for resolving this issue.

dmresearch15 avatar May 10 '24 10:05 dmresearch15

Hi @dmresearch15, I fixed the bug in the R API here: https://github.com/h2oai/h2o-3/pull/16280. Hopefully, this change will be released in the fix release at the end of the week.

If the model couldn't find synonyms, the call failed with the error you shared. The question remains why your model can find synonyms for "national" but not for "National". You may need to tune your model a little.
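
One possible explanation, which is an assumption and not something confirmed in this thread, is that the lookup is case-sensitive and only the lower-case form appeared in the training tokens. If so, normalizing the query word the same way as the training text may help. A rough sketch, where `normalize_query` is a hypothetical helper and not an h2o function:

```r
# Hypothetical helper: normalize a query word the same way the training
# tokens are assumed to have been normalized (trimmed and lower-cased).
normalize_query <- function(word) {
  tolower(trimws(word))
}

print(normalize_query("National"))

# Possible usage with the model from the original report (needs a running
# H2O cluster, so it is left commented out):
# h2o.findSynonyms(w2v_model, normalize_query("National"), count = 5)
```

If the training text itself has mixed case, lower-casing the tokens before training (for example with `h2o.tolower`) would keep the model's words and the queries consistent.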

maurever avatar May 29 '24 12:05 maurever