Emil Hvitfeldt
Part of https://github.com/tidymodels/planning/issues/29 VERY WIP
Expanding on https://github.com/tidymodels/textrecipes/pull/265. Some of the functions, especially the unique variants, are still more memory intensive than I would have liked. This could be fixed by writing them in C.
Right now we are doing `log(1 + (N / n_j))`, but Wikipedia has `log(N / (1 + n_j)) + 1`, and scikit-learn does > If smooth_idf=True (the default), the...
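To make the difference between the first two formulas concrete, here is a rough sketch that evaluates both side by side. It is written in plain Python rather than taken from the package, and the function names `idf_current` and `idf_wikipedia_variant` are made up for illustration. `N` is assumed to be the number of documents and `n_j` the number of documents containing term `j`.

```python
import math


def idf_current(n_docs, doc_freq):
    # Hypothetical helper: log(1 + (N / n_j)), the formula quoted as "right now".
    return math.log(1 + n_docs / doc_freq)


def idf_wikipedia_variant(n_docs, doc_freq):
    # Hypothetical helper: log(N / (1 + n_j)) + 1, the formula quoted from Wikipedia.
    return math.log(n_docs / (1 + doc_freq)) + 1


if __name__ == "__main__":
    n_docs = 100  # assumed corpus size, for illustration only
    for doc_freq in (1, 10, 50, 100):
        a = idf_current(n_docs, doc_freq)
        b = idf_wikipedia_variant(n_docs, doc_freq)
        print(f"n_j={doc_freq:>3}  current={a:.4f}  wikipedia={b:.4f}")
```

Both versions decrease as `n_j` grows, but they are not the same weighting: for a term that appears in every document the first gives `log(2)`, while the second gives a value slightly below 1.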
examples here: https://textrecipes.tidymodels.org/reference/step_tfidf.html
This worked but it is hardly minimal 😓 https://github.com/tidymodels/textrecipes/pull/251/files#diff-7a4d6e75d2d9b8a28afc680e8d25135692f1aba17f9fff9cc737128ce795aff2
Need to figure out how to apply a trained stm model to new data.
Overview of the different tokenization options present in the package and their influence.