
ValueError: Input 0 of layer sequential is incompatible with the layer

Open • domeniconappo opened this issue 4 years ago • 6 comments

Hi, we updated mordecai to 2.1.0 along with its dependencies: tensorflow to 2.3.0, spacy to 2.3.2, and keras to 2.4.3.

Since then, our geocoding has become much slower, and we've started to see lots of errors printed to the console like the following:

ValueError: Input 0 of layer sequential is incompatible with the layer: expected axis -1 of input shape to have value 12 but received input with shape [None, 0]
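The error appears to come from a shape check in the model's first layer: each input row must have 12 features (the width reported in the message), but the row passed in had 0. Below is a minimal numpy-only sketch of that check. It assumes the width of 12 taken from the error text, and input_width_ok is a hypothetical helper, not part of mordecai or Keras:

```python
import numpy as np

# Assumption: 12 is the feature width reported in the error message
# ("expected axis -1 of input shape to have value 12").
EXPECTED_WIDTH = 12


def input_width_ok(matrix, expected_width=EXPECTED_WIDTH):
    """Return True if a 2-D feature matrix has the expected number of columns.

    Hypothetical helper that mirrors the kind of check the model's input
    layer performs on the last axis before predict() runs.
    """
    arr = np.asarray(matrix)  # converts np.matrix to a plain ndarray, keeping the shape
    return arr.ndim == 2 and arr.shape[-1] == expected_width


# np.matrix([]) has shape (1, 0), the same shape reported for the
# empty matrices later in this thread.
empty = np.matrix([])
full = np.matrix(np.zeros(EXPECTED_WIDTH))

print(empty.shape, input_width_ok(empty))
print(full.shape, input_width_ok(full))
```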

It's not clear how this affects the geocoding results, but it is definitely much slower: our queues keep building up with documents waiting to be geoparsed.

Can you help? Could this be a problem with the dependency versions?

Thank you in advance and for your great work!

domeniconappo, Aug 20 '20

Huh, that's frustrating. I really didn't change that much beyond bumping the versions, so I'm not sure where the slowdown is coming from. Do you have a document that produces the ValueError that you can share?

ahalterman, Aug 21 '20

I just started with mordecai last week, but I hit the same problem described by @domeniconappo. After many tests changing versions, trying to use CUDA, etc., nothing changed. Then I tried running it in a Jupyter notebook and, I don't know why, the analysis became much faster. The only library version that differs from @domeniconappo's setup and from my own old script is tensorflow (1.14.0, installed by conda).

marcusvrlopes, Aug 25 '20

Hi @ahalterman, I am getting the same issue while using the package. In my case it occurs because some irrelevant terms are identified as geographic terms. After looking through the code, I found that in geoparse.py at line 731, when we call prediction = self.country_model.predict(i['matrix']).transpose()[0], the matrix generated for the word is empty, with shape (1, 0). Could we filter out entries with an empty matrix right after this call (line 722 of geoparse.py): feat = self.make_country_matrix(loc)?

Examples of the identified geo-terms that cause the issue:

{'labels': [], 'matrix': matrix([], shape=(1, 0), dtype=float64), 'word': 'organomercury'}

{'labels': [], 'matrix': matrix([], shape=(1, 0), dtype=float64), 'word': 'orangeiron'}

{'labels': [], 'matrix': matrix([], shape=(1, 0), dtype=float64), 'word': 'redoxygen'}

{'labels': [], 'matrix': matrix([], shape=(1, 0), dtype=float64), 'word': 'FeC10(HgCl)10'}

[{'text': 'organomercury', 'label': '', 'word': 'organomercury', 'spans': [{'start': 900, 'end': 913}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'Pbca', 'label': '', 'word': 'Pbca', 'spans': [{'start': 4644, 'end': 4648}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': 'POL', 'most_alt': 'CHN', 'most_pop': 'MEX', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'orangeiron', 'label': '', 'word': 'orangeiron', 'spans': [{'start': 6157, 'end': 6167}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'redoxygen', 'label': '', 'word': 'redoxygen', 'spans': [{'start': 6184, 'end': 6193}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'metallocene moiety', 'label': '', 'word': 'metallocene moiety', 'spans': [{'start': 6935, 'end': 6953}], 'features': {'maj_vote': '', 'word_vec': 'GNQ', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': 4.130288124084473, 'class_mention': '', 'code_mention': ''}}, {'text': '3.447(1)Å (Figure1C', 'label': '', 'word': '3.447(1)Å (Figure1C', 'spans': [{'start': 7585, 'end': 7604}], 'features': {'maj_vote': '', 'word_vec': 'TUR', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': 1.3494553565979004, 'class_mention': '', 'code_mention': ''}}, {'text': 'FeC10(HgCl)10', 'label': '', 'word': 'FeC10(HgCl)10', 'spans': [{'start': 12695, 'end': 12708}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'Deutsche Forschungsgemeinschaft', 'label': '', 'word': 'Deutsche Forschungsgemeinschaft', 'spans': [{'start': 13577, 'end': 13608}], 'features': {'maj_vote': '', 'word_vec': 'DEU', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': 10.370280265808105, 'class_mention': '', 'code_mention': ''}}, {'text': 'ZEDAT/FU Berlin', 'label': '', 'word': 'ZEDAT/FU Berlin', 'spans': [{'start': 13713, 'end': 13728}], 'features': {'maj_vote': '', 'word_vec': 'DEU', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': 11.895607948303223, 'class_mention': '', 'code_mention': ''}}]
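Below is a sketch of the filtering idea suggested above: skip words whose feature matrix is empty before they reach the country model. It assumes the entries are dicts shaped like the ones printed above. drop_empty_matrices and the example entries are hypothetical, and this is not the patch attached in this thread:

```python
import numpy as np


def drop_empty_matrices(entries):
    """Split entries into those with usable features and words to skip.

    Hypothetical helper: `entries` is assumed to be a list of dicts like
    {'labels': [...], 'matrix': <np.matrix>, 'word': <str>}. A word with no
    candidate features yields a (1, 0) matrix, which the country model
    rejects, so such entries are set aside instead of passed to predict().
    """
    kept, skipped = [], []
    for entry in entries:
        if np.asarray(entry["matrix"]).size == 0:
            skipped.append(entry["word"])
        else:
            kept.append(entry)
    return kept, skipped


# Illustrative entries only: one empty matrix like those reported above,
# and one placeholder with 12 (all-zero) features.
entries = [
    {"labels": [], "matrix": np.matrix([]), "word": "organomercury"},
    {"labels": [], "matrix": np.matrix(np.zeros(12)), "word": "example_with_features"},
]
kept, skipped = drop_empty_matrices(entries)
print([e["word"] for e in kept])
print(skipped)

# Only the entries in `kept` would then go through a call like:
# prediction = self.country_model.predict(i['matrix']).transpose()[0]
```

Skipped words simply get no country prediction. That seems reasonable for non-place terms like 'organomercury', but it only avoids the exception and does not stop such terms from being tagged as locations in the first place.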

vupadhyaya19, May 27 '21

Hi @ahalterman, I made the changes in geoparse.py and the issue no longer occurs. Let me know if the code changes below can be committed and pushed: geoparse.txt

vupadhyaya19, Jun 01 '21

@vupadhyaya19: can you open a pull request with your changes?

I'm hoping to make v3 public in July and that should resolve the issue because it switches from TF to pytorch, but I'd like to leave this version in a usable form for people who might stick with it.

ahalterman, Jun 12 '21


Hi @ahalterman! First of all, thank you for your work!

It looks like I have the same issue described above, so:

  • Can you please give us an update on v3? Any chance you will share it with the community?

luizavladislavna, Aug 12 '21