text2vec

itoken returned data structure is not documented

Open otoomet opened this issue 1 year ago • 0 comments

The documentation for itoken is silent about the data structure that is returned. It appears to be an R6 object with a few public functions and variables, but I cannot figure out what they are.
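
For reference, a minimal sketch of how the returned object can be inspected generically. It assumes text2vec is installed and that the object is an R6 instance (that is, an environment), in which case `ls()` lists its public members. The two toy sentences are made up for illustration.

```r
# Inspect whatever itoken() returns without relying on its documentation.
if (requireNamespace("text2vec", quietly = TRUE)) {
  it <- text2vec::itoken(c("the cat sat", "the dog ran"),
                         tokenizer = text2vec::word_tokenizer,
                         progressbar = FALSE)
  print(class(it))
  # R6 objects are environments, so ls() lists their public members;
  # fall back to names() in case the object is a plain list instead.
  members <- if (is.environment(it)) ls(it) else names(it)
  print(members)
}
```

This only shows which members exist, not what they are meant to do, which is the part the documentation would need to cover.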

For context, I am trying to create one-hot encoded (long-vector) word embeddings for teaching/demonstration purposes. More specifically, I want to:

  1. load texts, create vocabulary
  2. transform words to the corresponding one-hot encoded vectors
  3. combine nearby words into corresponding word embeddings (using one-hot vectors).
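
The three steps above can be sketched in base R without text2vec. This is only an illustration under stated assumptions: the toy corpus is made up, tokens are split on whitespace, "combining nearby words" is taken to mean summing the one-hot vectors of the neighbours within a symmetric window, and the window is allowed to cross document boundaries for simplicity.

```r
# Toy corpus (hypothetical); in practice this would be the loaded texts.
texts <- c("the cat sat on the mat", "the dog sat")

# Step 1: tokenise on whitespace and build the vocabulary.
tokens <- unlist(strsplit(tolower(texts), "\\s+"))
vocab  <- sort(unique(tokens))

# Step 2: one-hot encode each token. Row i has a single 1
# in the column that corresponds to token i.
one_hot <- diag(length(vocab))[match(tokens, vocab), , drop = FALSE]
colnames(one_hot) <- vocab

# Step 3: for each position, sum the one-hot vectors of its neighbours
# within `window` positions on either side (assumed interpretation).
window <- 1L
context <- t(sapply(seq_along(tokens), function(i) {
  idx <- setdiff(max(1L, i - window):min(length(tokens), i + window), i)
  colSums(one_hot[idx, , drop = FALSE])
}))
```

Here `one_hot` is a dense tokens-by-vocabulary matrix, which is fine for a classroom example but is exactly the object that becomes too large for real texts.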

In a sense, this is equivalent to working with a DTM where each document is an individual word. Since such a DTM easily gets large, I am trying to find a way to iterate over individual words.
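
To illustrate the kind of word-by-word iteration I have in mind, here is a minimal base R sketch. `make_one_hot_iterator`, `has_next` and `next_vec` are hypothetical names invented for this example and are not part of text2vec. The iterator produces one one-hot vector at a time, so the full word-level DTM is never held in memory.

```r
# Hypothetical closure-based iterator: yields the one-hot vector
# of one token per call instead of building the whole matrix.
make_one_hot_iterator <- function(tokens, vocab) {
  i <- 0L
  list(
    has_next = function() i < length(tokens),
    next_vec = function() {
      i <<- i + 1L
      v <- integer(length(vocab))
      v[match(tokens[i], vocab)] <- 1L
      v
    }
  )
}

tokens <- c("the", "cat", "sat", "on", "the", "mat")
vocab  <- sort(unique(tokens))

# Example use: accumulate word counts one vector at a time.
it <- make_one_hot_iterator(tokens, vocab)
counts <- integer(length(vocab))
while (it$has_next()) counts <- counts + it$next_vec()
names(counts) <- vocab
```

What I cannot work out is how to get the equivalent stream of individual tokens out of the object that itoken returns.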

otoomet, Mar 11 '23 19:03