morphological features in spacy?
Hello there and thanks again for this wonderful wrapper!
I noticed that morphological features are available in spacy (e.g. VerbForm=Fin, Mood=Ind, Tense=Pres; see https://spacy.io/usage/linguistic-features#rule-based-morphology), but I was not able to extract these with spacy_parse(). Is this something that is currently available in spacyr?
Thanks!
@randomgambit
Thanks for the question. It seems that the question is more about spaCy than spacyr. There is a StackOverflow question about it, but no one has answered it yet.
https://stackoverflow.com/questions/59008046/extracting-english-imperative-mood-from-verb-tags-with-spacy
A question linked from it shows that it's possible.
https://stackoverflow.com/questions/53755559/how-to-extract-tag-attributes-using-spacy
If you are working with English texts, you probably need to ask the developers of spaCy.
Thanks @amatsuo but actually this is a question about spacyr :)
I see that this can be retrieved in spacy, but what about spacyr?
Thanks!
Do you have a sample python code where the information is retrieved?
@amatsuo I just think the StackOverflow question gives the easy fix:
"The nlp.vocab.morphology.tag_map maps from the detailed tag to the dict with simpler tag, so you just need to skip that step and inspect the tag."
import spacy
nlp = spacy.load('it')
doc = nlp('Ti è piaciuto il film?')
# Inspect the detailed tag of the second token ("è")
print(doc[1].tag_)
# VA__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
@randomgambit
That's what I meant. You can get detailed morphological information in the tag field with the Italian model, but not with the English one. See the output below:
library(spacyr)
spacy_initialize("en_core_web_sm")
#> Found 'spacy_condaenv'. spacyr will use this environment
#> successfully initialized (spaCy Version: 2.2.3, language model: en_core_web_sm)
#> (python options: type = "condaenv", value = "spacy_condaenv")
spacy_parse("The dog barks", tag = T)
#> doc_id sentence_id token_id token lemma pos tag entity
#> 1 text1 1 1 The the DET DT
#> 2 text1 1 2 dog dog NOUN NN
#> 3 text1 1 3 barks bark NOUN NNS
spacy_finalize()
spacy_initialize("it_core_news_sm")
#> Python space is already attached. If you want to switch to a different Python, please restart R.
#> successfully initialized (spaCy Version: 2.2.3, language model: it_core_news_sm)
#> (python options: type = "condaenv", value = "spacy_condaenv")
spacy_parse("Ti è piaciuto il film?", tag = T)
#> Warning in spacy_parse.character("Ti è piaciuto il film?", tag = T):
#> lemmatization may not work properly in model 'it_core_news_sm'
#> doc_id sentence_id token_id token lemma pos
#> 1 text1 1 1 Ti Ti PRON
#> 2 text1 1 2 è essere AUX
#> 3 text1 1 3 piaciuto piacere VERB
#> 4 text1 1 4 il il DET
#> 5 text1 1 5 film film NOUN
#> 6 text1 1 6 ? ? PUNCT
#> tag entity
#> 1 PC__Clitic=Yes|Number=Sing|Person=2|PronType=Prs
#> 2 VA__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
#> 3 V__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part
#> 4 RD__Definite=Def|Gender=Masc|Number=Sing|PronType=Art
#> 5 S__Gender=Masc
#> 6 FS___
Created on 2020-06-11 by the reprex package (v0.3.0)
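For anyone who wants the individual features rather than the raw tag string, here is a minimal sketch of splitting the tags shown above into a fine-grained tag and a feature dictionary. It assumes the `<tag>__<Feature=Value|...>` layout seen in the Italian model's output; `parse_tag` is a hypothetical helper written for this illustration and is not part of spaCy or spacyr.

```python
def parse_tag(tag):
    """Split a tag like 'VA__Mood=Ind|Tense=Pres' into (fine_tag, features).

    Hypothetical helper: assumes the '<tag>__<Feature=Value|...>' layout
    seen in the Italian model output above. Tags without features
    (e.g. 'FS___' or a plain 'NN') yield an empty dict.
    """
    fine_tag, _, feats = tag.partition("__")
    features = {}
    for item in feats.split("|"):
        # Skip placeholders such as '_' that carry no Feature=Value pair
        if "=" in item:
            key, value = item.split("=", 1)
            features[key] = value
    return fine_tag, features


# Tag strings copied from the spacy_parse() output above
tags = [
    "PC__Clitic=Yes|Number=Sing|Person=2|PronType=Prs",
    "VA__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
    "FS___",
]

for tag in tags:
    print(parse_tag(tag))
```

The same splitting could be done on the tag column in R after calling spacy_parse(); the Python version is only meant to show the idea.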