gsoc2018-spacy

Sentence splitter not working properly, affecting the part-of-speech tagger

Status: Open. dkatsiros opened this issue 5 years ago; 5 comments.

Problem

I tried to run the sentence splitter submodule (sentence_splitter.py), but it didn't work for Greek text. I tried loading both el_core_news_sm and el_core_news_md, and I also tried passing the text as Unicode encoded in UTF-8. However, it does not recognize separate sentences and treats the whole text as one sentence. This also affects the part-of-speech tagger. Do you have any idea what the problem might be?

Thanks in advance.
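One way to tell whether the input or the model is at fault is to compare the model's sentences against a plain rule-based split of the same text. The sketch below is hypothetical and is not the repository's sentence_splitter.py. It decodes bytes as UTF-8 (a Python 3 str is already Unicode and needs no re-encoding), applies NFC normalization, which maps U+037E GREEK QUESTION MARK to the ASCII semicolon, and splits after '.', '!', ';' or '…' when whitespace follows. The punctuation set is an assumption, and abbreviations such as "π.χ." will be split incorrectly.

```python
import re
import unicodedata
from typing import List, Union

# Assumed sentence-ending punctuation for Greek: full stop, exclamation mark,
# ASCII semicolon (what the Greek question mark becomes after NFC), ellipsis.
_SENT_END = re.compile(r"(?<=[.!;…])\s+")


def to_text(data: Union[str, bytes]) -> str:
    """Return NFC-normalized text; bytes are decoded as UTF-8 first."""
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    # NFC maps U+037E GREEK QUESTION MARK to U+003B SEMICOLON.
    return unicodedata.normalize("NFC", data)


def naive_split(data: Union[str, bytes]) -> List[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    text = to_text(data).strip()
    # Drop the single empty string produced when the input is blank.
    return [s for s in _SENT_END.split(text) if s]


if __name__ == "__main__":
    sample = "Καλημέρα. Τι κάνεις\u037e Είμαι καλά!"
    for sentence in naive_split(sample.encode("utf-8")):
        print(sentence)
```

If this naive split yields several sentences for a text that the model treats as one, the input encoding is probably fine and the issue more likely lies in the model's sentence boundaries.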

Environment

spaCy version: 2.1.4
Location: /home/dimitris/.local/lib/python3.6/site-packages/spacy
Platform: Linux-4.18.0-17-generic-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.7
Models: el, en

dkatsiros, May 17 '19 17:05

Hello, thanks for reporting this! Could you please tell me where you downloaded the models from? Are you using spacy-nightly?

giannisdaras, May 17 '19 17:05

No, I downloaded the models from https://spacy.io/models/el. Should I use spacy-nightly?

dkatsiros, May 17 '19 18:05

Could you try uninstalling spacy, installing spacy-nightly, downloading the models through nightly, and then checking again? Sorry for the trouble; I need to check whether it is a version problem.

giannisdaras, May 17 '19 19:05

I tried, but I ran into the same problem. To install the models through spacy-nightly, I used: python3 -m spacy install el_core_news_md
Is that correct? Do you have any other suggestions about what I may have done wrong?
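For comparison, the spaCy 2.x command-line interface fetches models with python3 -m spacy download el_core_news_md; it has no install command. A minimal sketch, assuming the model should be downloaded by the same interpreter that has spacy-nightly installed; the helper names download_command and download_model are hypothetical:

```python
import subprocess
import sys
from typing import List


def download_command(model: str) -> List[str]:
    """Build the spaCy CLI call that downloads `model` for this interpreter."""
    # sys.executable is the Python running this code, so the model is
    # installed alongside whichever spaCy (stable or nightly) it imports.
    return [sys.executable, "-m", "spacy", "download", model]


def download_model(model: str) -> None:
    """Run the download; raises CalledProcessError if the CLI fails."""
    subprocess.check_call(download_command(model))

# Hypothetical usage: download_model("el_core_news_md")
```

Building the command from sys.executable avoids the case where pip, python3 and spacy resolve to different environments.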

dkatsiros, May 18 '19 19:05

I am facing the same problem after trying both.

PanosAntoniadis, Jun 29 '19 16:06