quanteda.textmodels

Add crossval function that allows for k-fold cross-validation of a machine learning model

Open · pchest opened this issue 5 years ago · 9 comments

The idea would be to create a function crossval(x, ...) that takes a machine learning model as input and lets users evaluate the model's performance across k splits (folds) of an evaluation data set.
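For illustration, here is a minimal sketch of the k-fold logic such a function could wrap, written only against the existing textmodel_nb() and predict() methods. The helper name crossval_sketch(), its arguments, and the use of plain accuracy as the metric are assumptions for this example, not a proposed API. The example data is data_corpus_moviereviews, which ships with quanteda.textmodels.

```r
library(quanteda)
library(quanteda.textmodels)

# Hypothetical helper (not part of quanteda.textmodels): k-fold cross-validation
# for a textmodel fitted on a dfm `x` with labels `y`; returns per-fold accuracy.
crossval_sketch <- function(x, y, k = 5, fit_fun = textmodel_nb, seed = 42) {
  set.seed(seed)
  # Randomly assign each document to one of k roughly equal-sized folds
  fold_id <- sample(rep(seq_len(k), length.out = ndoc(x)))
  accuracy <- numeric(k)
  for (i in seq_len(k)) {
    held_out <- fold_id == i
    # Train on the other k - 1 folds
    model <- fit_fun(x[!held_out, ], y[!held_out])
    # Predict the held-out fold; features match because both subsets come from one dfm
    pred <- predict(model, newdata = x[held_out, ])
    accuracy[i] <- mean(as.character(pred) == as.character(y[held_out]))
  }
  accuracy
}

# Usage on the movie review corpus bundled with quanteda.textmodels
dfmat <- dfm(tokens(data_corpus_moviereviews))
acc <- crossval_sketch(dfmat, docvars(dfmat, "sentiment"), k = 5)
mean(acc)
```

Stratifying the folds by class or reporting metrics other than accuracy would be natural extensions of this sketch.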

pchest, Apr 30 '20

The quanteda.classifiers package contains the functions crossval() and performance(), which allow for straightforward k-fold cross-validation of textmodel_nb() and textmodel_svm(). While these models are included in quanteda.textmodels, the cross-validation functions are missing from the package.

Would it make sense to add these functions to quanteda.textmodels? This would allow users to validate their models without having to install the development package quanteda.classifiers (which also imports keras).

stefan-mueller, Jul 13 '22

This is a good suggestion. So long as Ken is on board, I'll be happy to port them over.

pchest, Jul 13 '22

I wonder if this is the best approach, or whether an integration into the new(er) tidymodels framework would be the better way to proceed.

kbenoit, Jul 14 '22

Certainly, integrating into the tidymodels framework could help us reach a larger audience. Just to clarify: the idea would be to make quanteda.textmodels functions compatible with the tidymodels cross-validation workflow (like this example) rather than duplicating our efforts, correct?

pchest, Jul 14 '22

Yes, exactly, but in a way that extends quanteda.textmodels rather than requiring any new package.

kbenoit, Jul 15 '22

Got it. I've cloned tidymodels to better understand how its functions work relative to quanteda. The functions that seem to be good starting points for improving compatibility are fit() and fit_resamples(), as they are central to the tidymodels k-fold cross-validation workflow.

pchest, Jul 17 '22

Hey 👋 just wanted to chime in to say that I'm here to help/answer questions related to any tidymodels effort :)

EmilHvitfeldt, Jul 17 '22

@EmilHvitfeldt Hey! I'm glad to hear you are interested in our project. The objective would be to make quanteda.textmodels functions compatible with tidymodels model validation functions. For instance, we'd like something like the following to work:


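```r
library(rsample)   # vfold_cv()
library(tune)      # fit_resamples(), collect_metrics()
library(magrittr)  # %>%
library(quanteda.textmodels)
# Desired interface only; this does not yet work with quanteda.textmodels.
# `data` is a placeholder for a data frame of labelled documents.
folds <- vfold_cv(data, v = 10)
nb_mod <- textmodel_nb()
nb_val <- nb_mod %>% fit_resamples(folds)
collect_metrics(nb_val)
```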

I'm examining both packages to see what changes would be needed to make this happen. Any suggestions would be welcome!

pchest, Jul 18 '22

Hi all,

Just wondering whether anything came of this idea? I am teaching with quanteda.textmodels this term and was wondering whether there is a native quanteda cross-validation function for, e.g., textmodel_nb(). Cheers!

jblumenau, Jan 20 '23