quanteda.textmodels Add crossval function that allows for k-fold cross validation of a machine learning model.

The idea would be to create a function crossval(x, ...) that takes a machine learning model as an input and allows users to evaluate the model's performance across k splits of an evaluation data set.
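As a rough illustration of the idea, here is a minimal sketch of manual k-fold cross-validation for textmodel_nb(). The helper name crossval_nb and its arguments (x, y, k, seed) are hypothetical and exist in no package; this is not the quanteda.classifiers implementation, only one way the splitting and scoring could be done.

```r
library(quanteda)
library(quanteda.textmodels)

# Hypothetical helper (not part of quanteda.textmodels): k-fold CV accuracy
# for a Naive Bayes text model. x is a dfm, y holds one class label per document.
crossval_nb <- function(x, y, k = 5, seed = 42) {
  stopifnot(is.dfm(x), length(y) == ndoc(x))
  set.seed(seed)
  # Randomly assign each document to one of k folds of near-equal size
  fold <- sample(rep(seq_len(k), length.out = ndoc(x)))
  accuracy <- vapply(seq_len(k), function(i) {
    held_out <- fold == i
    # Train on the other k - 1 folds
    fit <- textmodel_nb(x[!held_out, ], y[!held_out])
    # force = TRUE makes the held-out features conform to the training features
    pred <- predict(fit, newdata = x[held_out, ], force = TRUE)
    mean(as.character(pred) == as.character(y[held_out]))
  }, numeric(1))
  data.frame(fold = seq_len(k), accuracy = accuracy)
}

# Usage with the movie review corpus that ships with quanteda.textmodels
dfmat <- dfm(tokens(data_corpus_moviereviews))
res <- crossval_nb(dfmat, docvars(dfmat, "sentiment"), k = 5)
```

The sketch reports accuracy only; a fuller version would presumably also return precision, recall, and F1 per fold.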

Apr 30 '20 09:04 pchest

The quanteda.classifiers package contains the functions crossval() and performance() which allow for straightforward k-fold cross-validation of textmodel_nb() and textmodel_svm(). While these models are included in quanteda.textmodels, functions for cross-validation are missing in the package.

Would it make sense to add these functions to quanteda.textmodels? This would allow users to validate their models without having to install the development package quanteda.classifiers (which also imports keras).

Jul 13 '22 12:07 stefan-mueller

This is a good suggestion. So long as Ken is on board, I'll be happy to port them over.

Jul 13 '22 21:07 pchest

I wonder if this is the best approach, or whether an integration into the new(er) tidymodels framework would be the better way to proceed.

Jul 14 '22 16:07 kbenoit

Certainly, integrating into the tidymodels framework could help us to reach a larger audience. Just to clarify, the idea would be to make quanteda.textmodels functions compatible with the tidymodels cross-validation workflow (like this example) rather than duplicating our efforts, correct?

Jul 14 '22 21:07 pchest

yes exactly - but in a way that extends quanteda.textmodels rather than requiring any new package.

Jul 15 '22 14:07 kbenoit

Got it. I've cloned tidymodels to better understand how their functions work relative to quanteda. The functions that seem to be good starting points for improving compatibility are fit and fit_resamples, as they are crucial to the tidymodels k-fold cross-validation workflow.
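For reference, this is roughly what that workflow looks like with a model that tidymodels already supports. It is only a sketch of the existing pattern: the logistic regression spec and the two_class_dat example data (from the modeldata package) are stand-ins chosen for illustration and have nothing to do with quanteda.

```r
library(tidymodels)

# two_class_dat ships with modeldata: predictors A and B, factor outcome Class
data(two_class_dat, package = "modeldata")

set.seed(123)
folds <- vfold_cv(two_class_dat, v = 10)

# A parsnip model specification describes the model without fitting it
glm_spec <- logistic_reg() %>% set_engine("glm")

# fit_resamples() fits the spec on each analysis set and scores the
# matching assessment set
glm_val <- fit_resamples(glm_spec, Class ~ A + B, resamples = folds)

# Metrics averaged across the 10 folds
cv_metrics <- collect_metrics(glm_val)
```

The key point for compatibility is that fit_resamples() expects an unfitted model specification, whereas textmodel_nb() fits immediately when called.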

Jul 17 '22 05:07 pchest

Hey 👋 just wanted to chime in to say that I'm here to help/answer questions related to any tidymodels effort :)

Jul 17 '22 07:07 EmilHvitfeldt

@EmilHvitfeldt Hey! I'm glad to hear you are interested in our project. The objective would be to make quanteda.textmodels functions compatible with the tidymodels model validation functions. For instance, we'd like something like the following to work:


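# Desired workflow (does not run today: textmodel_nb() is not a tidymodels model spec)
library(tidymodels)
library(quanteda.textmodels)

folds <- vfold_cv(data, v = 10)  # 10-fold resampling split

nb_mod <- textmodel_nb()
nb_val <- nb_mod %>%
    fit_resamples(folds)

collect_metrics(nb_val)  # performance metrics averaged across folds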

I'm examining both packages to see what changes would be needed to make this happen. Any suggestions would be welcome!

Jul 18 '22 05:07 pchest

Hi all,

Just wondering whether anything came of this idea? I am teaching with quanteda.textmodels this term and had been wondering whether there was a native quanteda cross-validation function for, e.g., textmodel_nb. Cheers!

Jan 20 '23 22:01 jblumenau
