text2vec itoken returned data structure is not documented


Open · otoomet opened this issue · 0 comments

The documentation for itoken is silent about the data structure that is returned. It appears to be an R6 object with a few public functions and variables, but I cannot figure out what they are.

For context, I am trying to create one-hot encoded (long-vector) word embeddings for teaching/demonstration purposes. More specifically, I want to:

1. load texts and create a vocabulary;
2. transform words into the corresponding one-hot encoded vectors;
3. combine nearby words into the corresponding word embeddings (using the one-hot vectors).
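The three steps above can be sketched in a language-neutral way. The following is a minimal Python illustration, not text2vec's API. It assumes lowercase whitespace tokenization and a symmetric context window of hypothetical size `window`. Each word's embedding is built as the sum of the one-hot vectors of its neighbours, which amounts to a co-occurrence count vector. The function names `build_vocab`, `one_hot` and `context_embeddings` are hypothetical.

```python
from collections import Counter

def build_vocab(texts):
    """Step 1: tokenize on whitespace (assumption) and map each word to an index."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Sort the words so the index order is deterministic.
    return {word: i for i, word in enumerate(sorted(counts))}

def one_hot(word, vocab):
    """Step 2: a dense one-hot vector of length len(vocab)."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

def context_embeddings(texts, vocab, window=1):
    """Step 3: for every word, add up the one-hot vectors of the words
    within `window` positions of it (hypothetical window size)."""
    emb = {word: [0] * len(vocab) for word in vocab}
    for text in texts:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    neighbour = one_hot(tokens[j], vocab)
                    emb[word] = [a + b for a, b in zip(emb[word], neighbour)]
    return emb

texts = ["the cat sat", "the dog sat"]
vocab = build_vocab(texts)
emb = context_embeddings(texts, vocab, window=1)
```

In this toy corpus, "cat" and "dog" occur in identical contexts, so they end up with identical vectors. That is the point such a demonstration usually wants to make. Dense vectors are used here only for readability. They grow with the vocabulary size.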

In a sense, this is equivalent to working with a DTM where each document is a single word. Because such a DTM quickly gets large (it has one row per word token), I am trying to find a way to iterate over individual words instead.
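One way to avoid materialising the word-by-vocabulary matrix is to stream words one at a time. Each one-hot vector can then be represented by the index of its single nonzero entry. The following is a minimal Python sketch of that idea, not text2vec's `itoken` interface. It assumes lowercase whitespace tokenization. `iter_word_indices` and `context_counts` are hypothetical names. Context vectors are kept sparse as nested dictionaries of nonzero counts.

```python
from collections import defaultdict, deque

def iter_word_indices(texts, vocab):
    """Yield (document number, vocabulary index) one word at a time.
    The index is the position of the single 1 in the word's one-hot vector."""
    for doc_id, text in enumerate(texts):
        for tok in text.lower().split():
            if tok in vocab:  # skip out-of-vocabulary words
                yield doc_id, vocab[tok]

def context_counts(texts, vocab, window=1):
    """Sparse context vectors: counts[w][c] = times c occurred within `window` of w."""
    counts = defaultdict(lambda: defaultdict(int))
    recent = deque(maxlen=window)  # the last `window` word indices seen
    current_doc = None
    for doc_id, idx in iter_word_indices(texts, vocab):
        if doc_id != current_doc:  # do not let a window span two documents
            recent.clear()
            current_doc = doc_id
        for prev in recent:  # symmetric window: update both directions
            counts[idx][prev] += 1
            counts[prev][idx] += 1
        recent.append(idx)  # deque drops the oldest index automatically
    return counts

texts = ["the cat sat", "the dog sat"]
vocab = {w: i for i, w in enumerate(sorted({t for s in texts for t in s.lower().split()}))}
counts = context_counts(texts, vocab, window=1)
```

Only the current window and the nonzero counts are held in memory. The cost therefore scales with the number of distinct co-occurring pairs rather than with the number of word tokens times the vocabulary size.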

Mar 11 '23 19:03 otoomet
