superml
CountVectorizer split argument doesn't do anything
I have the following example
# should be a vector of texts
sents <- c('i, am, going, home, and, home',
'where, are, you , going.? //// ',
'how, does, it, work')
cfv <- CountVectorizer$new(max_features = 10, remove_stopwords = FALSE, split = ", ")
# generate the matrix
cf_mat <- cfv$fit_transform(sents)
head(cf_mat, 3)
As you can see after running it, the vectorizer doesn't split on the comma; it still splits on whitespace.
Is this a bug? Would a Pull Request be welcome? Thanks in advance!
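For reference, here is a minimal base-R sketch of the tokens that splitting on ", " would be expected to produce. It assumes that "splitting on the comma" means each comma-delimited chunk becomes one token, with surrounding whitespace trimmed. It uses only `strsplit`, `trimws` and `table`, and is not a description of how superml tokenizes internally.

```r
# Same input as in the report above
sents <- c('i, am, going, home, and, home',
           'where, are, you , going.? //// ',
           'how, does, it, work')

# Split on the literal string ", " (fixed = TRUE disables regex matching)
tokens <- strsplit(sents, ", ", fixed = TRUE)

# Trim leftover whitespace around each token (an assumption about the desired behaviour)
tokens <- lapply(tokens, trimws)

# Build a vocabulary and count each token per sentence
vocab <- sort(unique(unlist(tokens)))
counts <- t(sapply(tokens, function(tok) table(factor(tok, levels = vocab))))

print(tokens[[2]])
print(counts)
```

With this approach the last chunk of the second sentence stays together as the single token "going.? ////", which is what distinguishes a comma split from a whitespace split on this input.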
@nshahpazov it works correctly for me. What is the issue here?
head(cf_mat, 3)
     home going you work where it i how does are
[1,]    2     1   0    0     0  0 1   0    0   0
[2,]    0     1   1    0     1  0 0   0    0   1
[3,]    0     0   0    1     0  1 0   1    1   0