torchvision

Why are the word_index values in the `imdb_dataset` vocabulary not unique?

Open · danli349 opened this issue 11 months ago · 1 comment

Hello:

Why are the word_index values in the imdb_dataset vocabulary not unique? Thanks.

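# Assumption: imdb_dataset() comes from {torchdatasets}, as the reply below suggests
library(torchdatasets)

max_features <- 10000
imdb_train <- imdb_dataset(
  root = ".",
  download = TRUE,
  split = "train",
  num_words = max_features
)
word_index <- imdb_train$vocabulary
# Count how many vocabulary entries share each index value
head(table(word_index))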

word_index
 31  32  33  34  35  36
130 216 196 197 165 160
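
The `table()` output shows how many entries share each index value, but not which words they are. One way to dig further is sketched below, assuming `imdb_train$vocabulary` is a named integer vector (words as names, indices as values). That structure is an assumption, and the toy `word_index` here is hypothetical stand-in data, not taken from the real dataset:

```r
# Hypothetical toy vocabulary standing in for imdb_train$vocabulary
# (assumed shape: names are words, values are their indices).
word_index <- c(the = 1L, film = 2L, movie = 2L, great = 3L, good = 3L, bad = 4L)

# Position of the first repeated value, or 0 if every value is unique.
first_dup <- anyDuplicated(word_index)

# Group the words by index, then keep only indices shared by several words.
words_by_index <- split(names(word_index), word_index)
shared <- words_by_index[lengths(words_by_index) > 1]

print(first_dup)
print(shared)
```

Replacing the toy vector with the real `imdb_train$vocabulary` would list the actual words behind indices such as 31 or 32, which should make the cause of the duplicates easier to spot.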

danli349, Jan 03 '25 16:01

Hello @danli349,

I guess your issue is related to {torchdatasets}, not to this project.

Would you be so kind as to open the issue in torchdatasets/issues, and then close the issue here?

I would also encourage you to use reprex::reprex() to include a reproducible example in your issue; everyone would then have spotted the mismatch immediately.

Thanks

cregouby, Feb 12 '25 15:02