text2vec
Data structure returned by itoken is not documented
The documentation for itoken is silent about the data structure it returns. It appears to be an R6 object with a few public functions and variables, but I cannot figure out what they are.
For context, I am trying to create one-hot encoded (long-vector) word embeddings for teaching/demonstration purposes. More specifically, I want to:
- load texts, create vocabulary
- transform words to the corresponding one-hot encoded vectors
- combine nearby words into corresponding word embeddings (using one-hot vectors).
In a sense, this is equivalent to working with a DTM where each document is an individual word. Since such a DTM easily gets large, I am trying to find a way to iterate over individual words instead.
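To make the goal concrete, here is a minimal sketch of the three steps in base R plus the Matrix package. It deliberately does not use itoken or any other text2vec function, because the structure of the returned object is exactly what is unclear. The helper names `one_hot` and `context_vector`, the whitespace tokenization, and the choice to combine neighbours by summing their one-hot vectors are all assumptions made for illustration only.

```r
library(Matrix)  # sparse vectors; Matrix is a recommended package shipped with R

# Toy corpus; splitting on whitespace is a simplification of real tokenization.
texts  <- c("the cat sat", "the dog sat down")
tokens <- strsplit(tolower(texts), "\\s+")

# Step 1: vocabulary = sorted unique words.
vocab <- sort(unique(unlist(tokens)))

# Step 2 (hypothetical helper): sparse one-hot vector for a single word.
# Fails for words outside the vocabulary, since match() then returns NA.
one_hot <- function(word, vocab) {
  sparseVector(x = 1, i = match(word, vocab), length = length(vocab))
}

# Step 3 (hypothetical helper): sum the one-hot vectors of the words within
# `window` positions of `pos`, excluding the word at `pos` itself.
context_vector <- function(doc, pos, vocab, window = 1L) {
  lo  <- max(1L, pos - window)
  hi  <- min(length(doc), pos + window)
  idx <- setdiff(seq(lo, hi), pos)
  counts <- tabulate(match(doc[idx], vocab), nbins = length(vocab))
  nz <- which(counts > 0)
  sparseVector(x = as.numeric(counts[nz]), i = nz, length = length(vocab))
}

# Iterate word by word, so only one sparse vector is held at a time.
for (doc in tokens) {
  for (pos in seq_along(doc)) {
    v <- context_vector(doc, pos, vocab)
    # ... use v for the demonstration ...
  }
}
```

What I am missing is the equivalent of the inner loop when the tokens come from an itoken iterator rather than from a plain list.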