
morphological features in spacy?

Open randomgambit opened this issue 5 years ago • 5 comments

Hello there and thanks again for this wonderful wrapper!

I noticed that morphological features are available in spacy (VerbForm=Fin, Mood=Ind, Tense=Pres. See https://spacy.io/usage/linguistic-features#rule-based-morphology) but I was not able to extract these with spacy_parse(). Is this something that is currently available in spacyr?

Thanks!

randomgambit avatar Jun 11 '20 16:06 randomgambit

@randomgambit

Thanks for the question. It seems to be more about spaCy than spacyr. There is a StackOverflow question about this, but no one has answered it.

https://stackoverflow.com/questions/59008046/extracting-english-imperative-mood-from-verb-tags-with-spacy

And a question linked from it shows that it's possible.

https://stackoverflow.com/questions/53755559/how-to-extract-tag-attributes-using-spacy

If you are working with English texts, you probably need to ask the developers of spaCy.

amatsuo avatar Jun 11 '20 16:06 amatsuo

Thanks @amatsuo but actually this is a question about spacyr :) I see that this can be retrieved in spacy, but what about spacyr?

Thanks!

randomgambit avatar Jun 11 '20 17:06 randomgambit

Do you have sample Python code where the information is retrieved?

amatsuo avatar Jun 11 '20 17:06 amatsuo

@amatsuo I just think the stackoverflow question gives the easy fix

The nlp.vocab.morphology.tag_map maps from the detailed tag to the dict with simpler tag, so you just need to skip that step and inspect the tag

import spacy
nlp = spacy.load('it')
doc = nlp('Ti è piaciuto il film?')
# inspect the detailed tag of the second token ("è")
print(doc[1].tag_)
# VA__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin

randomgambit avatar Jun 11 '20 17:06 randomgambit

@randomgambit

That's what I meant. You can get detailed morphological information in the tag field for the Italian model but not for the English one. See the output below:

library(spacyr)
spacy_initialize("en_core_web_sm")
#> Found 'spacy_condaenv'. spacyr will use this environment
#> successfully initialized (spaCy Version: 2.2.3, language model: en_core_web_sm)
#> (python options: type = "condaenv", value = "spacy_condaenv")
spacy_parse("The dog barks", tag = T)
#>   doc_id sentence_id token_id token lemma  pos tag entity
#> 1  text1           1        1   The   the  DET  DT       
#> 2  text1           1        2   dog   dog NOUN  NN       
#> 3  text1           1        3 barks  bark NOUN NNS
spacy_finalize()

spacy_initialize("it_core_news_sm")
#> Python space is already attached.  If you want to switch to a different Python, please restart R.
#> successfully initialized (spaCy Version: 2.2.3, language model: it_core_news_sm)
#> (python options: type = "condaenv", value = "spacy_condaenv")
spacy_parse("Ti è piaciuto il film?", tag = T)
#> Warning in spacy_parse.character("Ti è piaciuto il film?", tag = T):
#> lemmatization may not work properly in model 'it_core_news_sm'
#>   doc_id sentence_id token_id    token   lemma   pos
#> 1  text1           1        1       Ti      Ti  PRON
#> 2  text1           1        2        è  essere   AUX
#> 3  text1           1        3 piaciuto piacere  VERB
#> 4  text1           1        4       il      il   DET
#> 5  text1           1        5     film    film  NOUN
#> 6  text1           1        6        ?       ? PUNCT
#>                                                         tag entity
#> 1          PC__Clitic=Yes|Number=Sing|Person=2|PronType=Prs       
#> 2 VA__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin       
#> 3       V__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part       
#> 4     RD__Definite=Def|Gender=Masc|Number=Sing|PronType=Art       
#> 5                                            S__Gender=Masc       
#> 6                                                     FS___

Created on 2020-06-11 by the reprex package (v0.3.0)

amatsuo avatar Jun 11 '20 17:06 amatsuo
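
The Italian tags in the output above pack a coarse tag and a feature list into one string, with `__` between the two parts and `|` between features. Here is a minimal sketch of splitting such a string in plain Python, assuming that layout holds for every tag the model emits. The helper name `parse_morph_tag` is hypothetical and is not part of spaCy or spacyr.

```python
def parse_morph_tag(tag):
    """Split a tag such as 'VA__Mood=Ind|Number=Sing' into (coarse_tag, features).

    Assumption: the coarse tag and the features are separated by '__',
    features are separated by '|', and a lone '_' means "no features".
    """
    coarse, sep, feats = tag.partition("__")
    features = {}
    if sep and feats and feats != "_":
        for pair in feats.split("|"):
            key, _, value = pair.partition("=")
            features[key] = value
    return coarse, features


# Tags copied from the Italian output in this thread
coarse, features = parse_morph_tag(
    "VA__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"
)
print(coarse)            # VA
print(features["Mood"])  # Ind

# An English-style tag carries no feature list, so the dict is empty
print(parse_morph_tag("NNS"))  # ('NNS', {})
```

The same split could be done on the R side on the `tag` column returned by spacy_parse(); the Python version is shown only because the tag format comes from spaCy itself.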