python-dandelion-eu
Cannot detect language on EN text
Hello, I am getting this error when I try to process this English text:
I'm getting a bit confused by tech companies' thinking around the future of remote working, and I imagine I'm not the only one.
Months of working from home have made many businesses and their employees question whether the typical 9-5 working model is necessary in an age where work is increasingly done in front of a computer that provides instantaneous connection to anyone, anywhere in the world.
In this special feature, ZDNet examines technology's role in helping business leaders build tomorrow's workforce, and employees keep their skills up to date and grow their careers.
The full article is longer; you can find it here: https://www.zdnet.com/article/is-remote-working-good-or-bad-big-tech-companies-just-cant-seem-to-decide/
The error message:
Traceback (most recent call last):
  File "/home/manentai/mambaforge/envs/flaskenv/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/manentai/mambaforge/envs/flaskenv/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/manentai/mambaforge/envs/flaskenv/lib/python3.8/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/manentai/mambaforge/envs/flaskenv/lib/python3.8/site-packages/flask/views.py", line 84, in view
    return current_app.ensure_sync(self.dispatch_request)(*args, **kwargs)
  File "/home/manentai/mambaforge/envs/flaskenv/lib/python3.8/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/manentai/flaskAI/app.py", line 119, in post
    app.PD.process(document_id)
  File "/home/manentai/flaskAI/process_data.py", line 97, in process
    response = datatxt.nex(sentence)#, include_categories=True, include_types=True)
  File "/home/manentai/mambaforge/envs/flaskenv/lib/python3.8/site-packages/dandelion/datatxt.py", line 14, in nex
    return self.do_request(
  File "/home/manentai/mambaforge/envs/flaskenv/lib/python3.8/site-packages/dandelion/base.py", line 102, in do_request
    raise DandelionException(obj)
dandelion.base.DandelionException: Cannot detect language
What could cause this on an English text? Am I supposed to tell Dandelion that the text is in English?
Hi Simone, I have tested it and it seems to work. Can you please check your code? You should not have any problems. Feel free to post your code snippet here if you want.
Hi, thanks for getting back to me.
Actually, my snippet works on the other texts I am trying, so I guess the problem might be the encoding of this text:
# parse article with SpaCy
doc = nlp(document["text"])
# Get the list of sentences in all the articles
sentences = [i.text for i in doc.sents]
# extract NER and keywords
for sentence in sentences:
    # extract NER with Dandelion.eu
    response = datatxt.nex(sentence)
I am actually at a loss: with some texts it works, and with others it raises the error...
OK, I think I solved it. If I split the sentences like this:
sentences = [i.text for i in doc.sents]
I also get empty sentences, and the API call fails on them. If this is mentioned in the docs, I missed it, sorry...
So this is enough to fix the issue:
sentences = [i.text.strip() for i in doc.sents if i.text.strip() != ""]
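The same filtering can be pulled into a small helper that works on any iterable of strings, so it can be checked without loading a spaCy model. This is a minimal sketch; `clean_sentences` is a hypothetical name, and with the snippet above you would call it as `clean_sentences(s.text for s in doc.sents)`.

```python
from typing import Iterable, List


def clean_sentences(texts: Iterable[str]) -> List[str]:
    """Strip surrounding whitespace and drop sentences that end up empty.

    Hypothetical helper: empty strings should not be sent to datatxt.nex,
    because the API rejects them with "Cannot detect language".
    """
    cleaned = []
    for text in texts:
        stripped = text.strip()
        if stripped:  # skips "", "\n", "  \t" and similar
            cleaned.append(stripped)
    return cleaned


if __name__ == "__main__":
    print(clean_sentences(["Hello there.", "\n\n", "  How are you?  ", ""]))
    # prints ['Hello there.', 'How are you?']
```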
Hi Simone, I am glad you have solved it.
I have checked: for empty texts, the full error message returned by the API is "Cannot detect language:text is empty or null". It seems that the Python exception cuts off the second, meaningful part.
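Since the Python exception drops the "text is empty or null" detail, one option is to check the input before calling the API, so blank sentences are skipped instead of surfacing as a confusing language-detection error. A minimal sketch, assuming `datatxt` is the same client object used in the snippets above; `safe_nex` and `FakeDataTXT` are hypothetical names (not part of python-dandelion-eu), and the fake client only exists so the sketch runs offline without an API token.

```python
from typing import Any, List, Optional, Tuple


def safe_nex(datatxt: Any, text: Optional[str], **params: Any) -> Optional[Any]:
    """Call datatxt.nex only on non-blank text (hypothetical helper).

    Returns None for None, empty, or whitespace-only input instead of letting
    the API answer "Cannot detect language". The text is sent unchanged, so
    any offsets in the response still refer to the original string; extra
    keyword arguments are forwarded to datatxt.nex as-is.
    """
    if text is None or not text.strip():
        return None
    return datatxt.nex(text, **params)


class FakeDataTXT:
    """Offline stand-in for the real client (assumption, for illustration)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, dict]] = []

    def nex(self, text: str, **params: Any) -> dict:
        self.calls.append((text, params))
        return {"text": text}


if __name__ == "__main__":
    client = FakeDataTXT()
    for sentence in ["Hello there.", "\n", "   "]:
        print(safe_nex(client, sentence))
    print(len(client.calls))  # only the non-blank sentence reached the client
```

Returning None keeps the calling loop simple (skip when the result is None); raising a custom exception with an explicit "empty text" message would be an equally reasonable choice.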