concise-concepts
Still unable to pass in a custom Gensim model
I raised an issue earlier about the same problem; @davidberenstein1957 committed a fix and posted this code block as the solution:
import spacy
from spacy import displacy
import concise_concepts
data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato", "garlic", "onion", "beans"],
    "meat": ["beef", "pork", "fish", "lamb", "bacon", "ham", "meatball"],
    "dairy": ["milk", "butter", "eggs", "cheese", "cheddar", "yoghurt", "egg"],
    "herbs": ["rosemary", "salt", "sage", "basil", "cilantro"],
    "carbs": ["bread", "rice", "toast", "tortilla", "noodles", "bagel", "croissant"],
}
text = """
    Heat the oil in a large pan and add the Onion, celery and carrots.
    Then, cook over a medium–low heat for 10 minutes, or until softened.
    Add the courgette, garlic, red peppers and oregano and cook for 2–3 minutes.
    Later, add some oranges and chickens. """
model_path = "word2vec.model"
nlp = spacy.load("en_core_web_md", disable=["ner"])
nlp.add_pipe(
    "concise_concepts",
    config={
        "data": data,
        "model_path": model_path,
        "ent_score": True,
    },
)
doc = nlp(text)
options = {
    "colors": {
        "fruit": "darkorange",
        "vegetable": "limegreen",
        "meat": "salmon",
        "dairy": "lightblue",
        "herbs": "darkgreen",
        "carbs": "lightbrown",
    },
    "ents": ["fruit", "vegetable", "meat", "dairy", "herbs", "carbs"],
}
ents = doc.ents
for ent in ents:
    # ent_score is a custom extension attribute, so it is read through `ent._`
    new_label = f"{ent.label_} ({float(ent._.ent_score):.0%})"
    # label_ is the string label; ent.label is an integer ID and has no .lower()
    options["colors"][new_label] = options["colors"].get(ent.label_.lower(), None)
    options["ents"].append(new_label)
    ent.label_ = new_label
doc.ents = ents
displacy.render(doc, style="ent", options=options)
However, I am still getting the "Word2Vec object is not iterable" error.
Could you please look into it?
Hello,
I believe this is resolved by installing the dependencies required by the package, in particular Gensim >= 4.
Regards, David
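One way to confirm which Gensim version an environment actually provides is to query the installed package metadata. The sketch below is a minimal example that uses only the standard library; the helper names `major_version` and `installed_major` are made up for illustration and are not part of concise-concepts or Gensim.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a dotted version string such as '4.2.0'."""
    return int(version_string.split(".")[0])


def installed_major(package: str):
    """Return the installed major version of `package`, or None if it is absent."""
    try:
        return major_version(version(package))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    gensim_major = installed_major("gensim")
    if gensim_major is None:
        print("gensim is not installed in this environment")
    elif gensim_major < 4:
        print(f"gensim major version {gensim_major} found; >= 4 was suggested")
    else:
        print(f"gensim major version {gensim_major} satisfies >= 4")
```

Running this inside the same interpreter or notebook kernel that raises the error helps rule out a mismatch between environments.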
I am using Gensim 4.2.0 and still getting the error. I tried in multiple different environments and still get the same error.
Could you send me some reproducible code and files you are using?
Sure, here is the code snippet I used:
import pandas as pd
df = pd.read_csv('IMDB Dataset.csv')
from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec
sent = [row.split() for row in df['review']]
phrases = Phrases(sent, min_count=30, progress_per=10000)
bigram = Phraser(phrases)
sentences = bigram[sent]
w2v_model = Word2Vec(min_count=20,
window=2,
vector_size=200,
sample=6e-5,
alpha=0.03,
min_alpha=0.0007,
negative=20,
)
w2v_model.build_vocab(sentences, progress_per=10000)
w2v_model.train(sentences, total_examples=w2v_model.corpus_count, epochs=10, report_delay=1)
w2v_model.save("film.model")
import spacy
from spacy import displacy
import concise_concepts
nlp = spacy.load('en_core_web_md', disable=["ner"])
data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato"],
    "meat": ["beef", "pork", "fish", "lamb"]
}
model_path = "film.model"
nlp.add_pipe("concise_concepts", config={"data": data, "model_path": model_path})
Hi David, I am facing the same error while trying to pass my custom-trained word2vec model. I have tried every scenario you posted earlier. I have even referred to the word2vec model documentation to train my model as prescribed. Even then I get the error, even for this code snippet:
import spacy
from spacy import displacy
import concise_concepts
data = {
"display":["pixel","resolution","touchscreen"],
"performace":['multitask','processor','graphics','ram','hang'],
"storage":["internal","memory","expandable"],
"camera" :["focus","resolution","flash","photos"],
"Battery":["capacity","quick","charging"],
"connectivity":['gps','bluetooth','wifi','sim'],
"sensors":["light","proximity","compass","gyroscope"]
}
text = '''believe me, it's the slowest mobile I saw. Don't go on screen and Battery, it is an extremely slow mobile phone and takes ages to open and navigate. Forget about heavy use, it can't handle normal regular use. I made a huge mistake but pls don't buy this mobile. It's only a few months and I am thinking to change it. Its dam SLOW SLOW SLOW. '''
from gensim.test.utils import common_texts
from gensim.models import Word2Vec
model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")
# the path must match the saved file name exactly on case-sensitive file systems
model_path = "word2vec.model"
nlp = spacy.load("en_core_web_lg", disable=['ner'])
# ent_score for entity confidence scoring
nlp.add_pipe("concise_concepts", config={"data": data, "model_path": model_path})
doc = nlp(text)
Error:
~\anaconda3\lib\site-packages\concise_concepts\conceptualizer\Conceptualizer.py in verify_data(self, verbose)
    107 for key, value in self.data.items():
    108     verified_values = []
--> 109     if key.replace(" ", "_") not in self.kv:
    110         if verbose:
    111             logger.warning(f"key {key} not present in word2vec model")
TypeError: argument of type 'Word2Vec' is not iterable
I'm taking a look this week.
@prakhar251998 I also have this problem. Have you solved it somehow?
Not yet, @GenVr. Waiting for @davidberenstein1957 to update the fix for this part.
Hello, I made some initial progress last week, but I will be able to wrap it up in the coming week. Regards, David
@davidberenstein1957 Thanks.
First
I don't know if this helps, but I have gensim==4.2.0 and had a quick look at Conceptualizer.py. In several functions (verify_data(), expand_concepts(), etc.) the error seems to come from a membership test like:
if key.replace(" ", "_") not in self.kv:
However, self.kv here is not the vocab keys (I don't know if this code expects self.kv to hold the vocab keys).
I tried replacing this test with:
keys_list = list(self.kv.wv.key_to_index.keys())
...
if key.replace(" ", "_") not in keys_list:
...
This happens multiple times in the library.
There are also other errors, such as:
self.kv.most_similar
which needs to be:
self.kv.wv.most_similar
and others like this.
Even after correcting these errors, everything runs, but the model mismatches my words.
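To illustrate why the membership test fails, here is a minimal sketch that does not use Gensim at all. Python's `in` operator needs the right-hand object to define `__contains__`, `__iter__`, or `__getitem__`; an object that only holds its vectors under a `.wv` attribute raises a TypeError instead. `ToyModel`, `ToyKeyedVectors` and `unwrap_vectors` are hypothetical stand-ins written for this example, and nothing here is claimed to be the fix that was applied in concise-concepts.

```python
class ToyKeyedVectors:
    """Hypothetical stand-in for a keyed-vectors object: supports `in`."""

    def __init__(self, words):
        self.key_to_index = {word: i for i, word in enumerate(words)}

    def __contains__(self, word):
        return word in self.key_to_index


class ToyModel:
    """Hypothetical stand-in for a full model.

    It keeps its vectors under `.wv` and defines none of
    __contains__, __iter__ or __getitem__ itself.
    """

    def __init__(self, words):
        self.wv = ToyKeyedVectors(words)


def unwrap_vectors(obj):
    """Return obj.wv when that attribute exists, otherwise obj unchanged."""
    return getattr(obj, "wv", obj)


model = ToyModel(["human", "computer"])

# Testing membership on the wrapper object itself raises TypeError.
try:
    "human" in model
    membership_failed = False
except TypeError:
    membership_failed = True

# Testing membership on the unwrapped vectors works.
kv = unwrap_vectors(model)
print(membership_failed, "human" in kv, "pixel" in kv)
```

Unwrapping once, where the model is loaded, avoids repeating `.wv` at every lookup and also leaves an object that is already a plain vectors object untouched.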
Second
I also have a question, if possible. I'm new to Gensim, and I noticed that each key of the given dictionary must be in the Word2Vec vocab.
Example:
data = {
"word A": ["house", "home", ...],
"word B": ['display', 'smartphone', ...],
}
model = Word2Vec(sentences=common_texts, ...)
...
nlp = spacy.load("en_core_web_lg", disable=['ner'])
nlp.add_pipe("concise_concepts", config={"data": data, "ent_score": True, "model_path": model_path})
So word A and word B need to be in the model vocab; otherwise, I get a key-not-found error. I guess the initial training sentences need to contain these keys?
Thanks
I just resolved this. @GenVr @prakhar251998 @akshaydevml thank you for the input!
@davidberenstein1957 thanks. I have tried this code (with your new changes) but still have the error reported at the end.
import spacy
from spacy import displacy
import concise_concepts
from gensim.test.utils import common_texts
from gensim.models import Word2Vec
data = {
"display":["pixel","resolution","touchscreen"],
"performace":['multitask','processor','graphics','ram','hang'],
"storage":["internal","memory","expandable"],
"camera" :["focus","resolution","flash","photos"],
"Battery":["capacity","quick","charging"],
"connectivity":['gps','bluetooth','wifi','sim'],
"sensors":["light","proximity","compass","gyroscope"]
}
text = '''believe me, it's the slowest mobile I saw. Don't go on screen and Battery, it is an extremely slow mobile phone and takes ages to open and navigate. Forget about heavy use, it can't handle normal regular use. I made a huge mistake but pls don't buy this mobile. It's only a few months and I am thinking to change it. Its dam SLOW SLOW SLOW.
'''
model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")
model_path = "word2vec.model"
nlp = spacy.load("en_core_web_lg", disable=['ner'])
nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})
Error:
WARNING:concise_concepts.conceptualizer.Conceptualizer:key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word pixel from key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word resolution from key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word touchscreen from key display not present in word2vec model
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-5-4778ce6d6aae> in <module>
1 nlp = spacy.load("en_core_web_lg", disable=['ner'])
----> 2 nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})
/usr/local/lib/python3.7/dist-packages/concise_concepts/conceptualizer/Conceptualizer.py in verify_data(self, verbose)
182 verified_values
183 ), f"None of the entries for key {key} are present in the word2vec model"
--> 184 self.data = deepcopy(verified_data)
185 self.original_data = deepcopy(self.data)
186
AssertionError: None of the entries for key display are present in the word2vec model
Hello,
This is actually expected behaviour, since you are trying to match a label and words that are not present in the trained word2vec model.
You initially get warnings about the missing keys and words, but since none of the data is available in the model, it then raises an error.
It did lead me to find another small implementation error in the ngram support, so keep the feedback coming!
Regards, David
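A pre-check along the following lines can show, before the pipe is added, which entries would be missing from a model. This is only a sketch under stated assumptions: `split_by_vocab` is a hypothetical helper, `toy_vocab` is a made-up vocabulary rather than the contents of any real model, and the space-to-underscore replacement simply mirrors the `key.replace(" ", "_")` call visible in the traceback earlier in this thread. With Gensim 4 the real vocabulary could presumably be taken from `model.wv.key_to_index`, as mentioned in an earlier comment.

```python
def split_by_vocab(data, vocab):
    """Split `data` into entries found in `vocab` and entries that are missing.

    Spaces are replaced with underscores before the lookup, so that
    multi-word entries are matched against underscore-joined vocab keys.
    """
    present, missing = {}, {}
    for label, words in data.items():
        found = [w for w in words if w.replace(" ", "_") in vocab]
        absent = [w for w in words if w.replace(" ", "_") not in vocab]
        if found:
            present[label] = found
        if absent:
            missing[label] = absent
    return present, missing


# Toy vocabulary standing in for the keys of a trained model (an assumption).
toy_vocab = {"graph", "trees", "computer", "user_interface"}

data = {
    "display": ["pixel", "resolution", "touchscreen"],
    "computing": ["computer", "user interface", "processor"],
}

present, missing = split_by_vocab(data, toy_vocab)
print("usable:", present)
print("missing:", missing)
```

A label that ends up with no usable words, such as "display" here, corresponds to the situation described above where none of the entries are available in the model.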