Kenneth Benoit
This is a really good idea that would offer a solution for #180.
Closing but if you are still having trouble, add a note.
Agreed! Perhaps we could add a set of spacyr_options, either via `options()` or something like `quanteda_options()`, that define defaults that can be redefined by the user, or reset prior...
Just experimented with this. A few comments on the branch. Since this is looking up the tokens from the language model using `Token.vector()`, we don't really need to do this...
How about
```r
# works on a spacyr parsed object
wordvectors_get.spacyr_parsed(x, model)
# works on a named list of characters, such as from spacy_tokenize()
wordvectors_get.list(x, model)
```
to return a...
Great idea! Should be pretty straightforward to implement.
We don't plan to provide tools for modifying or training language models, but if a user has custom language models, we agree that spacyr should allow these to be used....
Not a bad idea. @amatsuo maybe add:
```r
spacy_tokenize(x, what = c("word", "sentence"),
               remove_numbers = FALSE, remove_punct = FALSE,
               remove_symbols = FALSE, remove_separators = TRUE,
               remove_twitter = FALSE, remove_hyphens =...
```
@cecilialee No, for training a new language model you would need to do that in Python using the spaCy instructions. We are unlikely to add this facility to **spacyr** in the...
@aourednik that would be
```r
devtools::install_github("quanteda/spacyr", ref = "tokenize-function")
```