quanteda.textmodels
Add a crossval() function that allows for k-fold cross-validation of a machine learning model.
The idea would be to create a function crossval(x, ...) that takes a machine learning model as an input and allows users to evaluate the model's performance across k splits of an evaluation data set.
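To make the proposal concrete, here is a minimal base R sketch of what such a function might do. The name `crossval_sketch`, its arguments, and the toy majority-class "model" are all hypothetical and chosen for illustration; this is not the implementation in quanteda.classifiers, and it reports accuracy only.

```r
# Hypothetical helper: NOT part of quanteda.textmodels or quanteda.classifiers.
# fit_fun(x, y) must return a fitted model; predict_fun(model, newdata) must
# return one predicted label per row of newdata.
crossval_sketch <- function(x, y, fit_fun, predict_fun, k = 5) {
  stopifnot(nrow(x) == length(y), k >= 2, k <= length(y))
  # Randomly assign each observation to one of k folds of near-equal size.
  fold_id <- sample(rep(seq_len(k), length.out = length(y)))
  accuracy <- vapply(seq_len(k), function(i) {
    test <- fold_id == i
    model <- fit_fun(x[!test, , drop = FALSE], y[!test])   # train on k - 1 folds
    pred <- predict_fun(model, x[test, , drop = FALSE])    # predict held-out fold
    mean(as.character(pred) == as.character(y[test]))
  }, numeric(1))
  data.frame(fold = seq_len(k), accuracy = accuracy)
}

# Toy usage: a majority-class baseline, so the example needs only base R.
set.seed(42)
x <- matrix(rnorm(40), nrow = 20)
y <- rep(c("a", "b"), times = c(14, 6))
fit_majority <- function(x, y) names(which.max(table(y)))
predict_majority <- function(model, newdata) rep(model, nrow(newdata))

res <- crossval_sketch(x, y, fit_majority, predict_majority, k = 5)
print(res)
mean(res$accuracy)
```

Passing the fitting and prediction steps as functions keeps the sketch independent of any particular model class; a packaged version would more likely dispatch on the fitted textmodel object instead.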
The quanteda.classifiers package contains the functions crossval() and performance() which allow for straightforward k-fold cross-validation of textmodel_nb() and textmodel_svm(). While these models are included in quanteda.textmodels, functions for cross-validation are missing in the package.
Would it make sense to add these functions to quanteda.textmodels? This would allow users to validate their models without having to install the development package quanteda.classifiers (which also imports keras).
This is a good suggestion. So long as Ken is on board, I'll be happy to port them over.
I wonder if this is the best approach, or whether an integration into the new(er) tidymodels framework would be the better way to proceed.
Certainly, integrating into the tidymodels framework could help us to reach a larger audience. Just to clarify, the idea would be to make quanteda.textmodels functions compatible with the tidymodels cross-validation workflow (like this example) rather than duplicating our efforts, correct?
Yes, exactly, but in a way that extends quanteda.textmodels rather than requiring any new package.
Got it. I've cloned tidymodels to better understand how their functions work relative to quanteda. The functions that seem to be good starting points for improving compatibility are fit and fit_resamples, as they are crucial to the tidymodels k-fold cross-validation workflow.
Hey 👋 just wanted to chime in to say that I'm here to help/answer questions related to any tidymodels effort :)
@EmilHvitfeldt Hey! I'm glad to hear you are interested in our project. The objective would be to make quanteda.textmodels functions compatible with the tidymodels model validation functions. For instance, we'd like something like the following to work:
folds <- vfold_cv(data, v = 10)
nb_mod <- textmodel_nb()
nb_val <- nb_mod %>%
  fit_resamples(folds)
collect_metrics(nb_val)
I'm examining both packages to see what changes would be needed to make this happen. Any suggestions would be welcome!
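For comparison, the resampling loop that such a workflow would automate can be written by hand with the existing `textmodel_nb()` and `predict()` functions. This is only a sketch under stated assumptions: the toy texts, labels, fold count, and the use of plain accuracy as the metric are made up for illustration, and none of it reflects how tidymodels or quanteda.classifiers implement cross-validation.

```r
library(quanteda)
library(quanteda.textmodels)

# Toy labelled texts, invented for illustration only.
txt <- c(
  "great film loved it", "wonderful acting great plot",
  "loved the story", "excellent and moving film",
  "great fun wonderful cast", "moving story excellent acting",
  "terrible film hated it", "boring plot awful acting",
  "hated the story", "awful and boring film",
  "terrible waste dull cast", "dull story awful acting"
)
labels <- rep(c("pos", "neg"), each = 6)

dfmat <- dfm(tokens(txt))

set.seed(123)
k <- 3
# Randomly assign each document to one of k folds.
fold_id <- sample(rep(seq_len(k), length.out = ndoc(dfmat)))

acc <- vapply(seq_len(k), function(i) {
  test <- fold_id == i
  # Train on k - 1 folds. Both subsets come from the same dfm, so the
  # training and test feature sets are identical.
  nb <- textmodel_nb(dfmat[!test, ], labels[!test])
  pred <- predict(nb, newdata = dfmat[test, ])
  mean(as.character(pred) == labels[test])
}, numeric(1))

acc        # accuracy per fold
mean(acc)  # cross-validated accuracy
```

If the training and test sets were built as separate dfm objects, their features would first need to be aligned, for example with `dfm_match()`.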
Hi all,
Just wondering whether anything came of this idea? I am teaching with quanteda.textmodels this term and had been wondering whether there was a native quanteda cross-validation function for, e.g., textmodel_nb. Cheers!