tune Variable Fold Weights

I'm currently working on a project with a time dependency, so I'm using expanding-window folds (e.g. fold 1 trains on 2018-2021 with 2022 as the test set, fold 2 trains on 2018-2022 with 2023 as the test set, etc.). When running a hyperparameter optimization, to my understanding the code averages the evaluation metric across folds, weighting each fold equally even though they contain different numbers of records. In this case, it would probably make more sense to weight each fold's contribution to the evaluation metric in proportion to the number of instances in that fold.

An example might be:

| Fold | N   | Metric |
|------|-----|--------|
| 1    | 100 | 0.80   |
| 2    | 200 | 0.85   |
| 3    | 250 | 0.90   |
| 4    | 300 | 0.95   |

We should weight fold 4 more heavily than fold 1 when choosing the hyperparameters. Equal weighting gives 0.875, whereas weighting proportional to N gives 0.894.
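
As a quick check of that arithmetic, here is a minimal base R sketch using the values from the table above. The variable names are just for illustration.

```r
# Fold sizes and per-fold metric values from the example table.
n      <- c(100, 200, 250, 300)
metric <- c(0.80, 0.85, 0.90, 0.95)

# Equal weighting: plain mean over folds.
equal_weighted <- mean(metric)

# Size-proportional weighting: sum(n * metric) / sum(n).
size_weighted <- weighted.mean(metric, w = n)

round(equal_weighted, 3)  # 0.875
round(size_weighted, 3)   # 0.894
```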

I'm happy to work on this myself and make a PR, just wanted to confirm it would be a supported feature before I invested time into it.

As for implementation, I think the straightforward way would be to pass a vector of per-fold weights to the various tune functions (tune_bayes or tune_grid), and possibly add a helper function that builds that vector from the fold populations, which seems like the most obvious use case. I could also see an argument for putting the weights into the rset object, but that might be a bit over-engineered.
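
To make the helper idea concrete, here is a rough base R sketch, not a proposal for the final API. It assumes expanding-window folds defined by a `year` column, and `fold_weights()` and its `by` argument are hypothetical names that do not exist in tune or rsample. Whether "fold population" should mean the training or the test records is left open, so the sketch supports both.

```r
# Toy data: one row per record, tagged with its year (made-up counts).
records <- data.frame(
  year = rep(2018:2024, times = c(20, 25, 25, 30, 40, 50, 60))
)

# Expanding-window folds: train on everything before test_year,
# test on test_year itself.
test_years <- 2022:2024

# Hypothetical helper: per-fold weights proportional to fold size,
# normalized to sum to 1.
fold_weights <- function(data, test_years, by = c("train", "test")) {
  by <- match.arg(by)
  n <- vapply(test_years, function(y) {
    if (by == "train") sum(data$year < y) else sum(data$year == y)
  }, integer(1))
  n / sum(n)
}

w_train <- fold_weights(records, test_years, by = "train")
w_test  <- fold_weights(records, test_years, by = "test")
```

A vector like `w_train` is what would then be handed to the tuning function, or stored alongside the splits, depending on which design is chosen.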

Mar 10 '25 13:03 tjburch

That’s a good idea. I think that estimate_tune_results() is what would be affected.

We should think more about how the implementation would work (in terms of the API). You could propose something (maybe in a fork or draft PR). One idea would be to add a specialized column to the rset (maybe .metric_weights) that the system can consume when it needs to.

We are on a pretty tight/intense development schedule until about August so I don’t think that anything would be fully implemented until then (but we’ll be happy to discuss and advise).

Mar 10 '25 16:03 topepo

Sounds good. Not a major rush so if it takes O(months) to get folded in, that's not a problem.

I'll get working on it, and hopefully get as complete of a PR as possible for you all so it doesn't disrupt your schedule.

Mar 10 '25 16:03 tjburch

@hfrick and I were just discussing this and had a few thoughts.

First, a very similar (but not the same) solution would be to use case weights. We have importance weights that could be used to down-weight older data. Those get used in the model fit but not the model assessment. You could make a custom case weight class to do both.

That's not quite what you are asking for, but it is at least adjacent, so I thought I'd mention it.

Second, my earlier thought about having a column in the rset might be a good solution. One thing that we were discussing is whether we would make the "special column" a documented convention or formalize it.

The latter would probably mean that it would be an argument to the rset object. By doing that, it is more easily identified and documented and the data would be more formally validated when the weights are passed to the rset.

Either way, a lot of the changes happen in estimate_tune_results().

In the short term, it might be prudent to look at what is in estimate_tune_results() and make a PR as long as you are ok with it lingering for a while or needing a lot of revision due to how we structure things.

If you need this feature in the short term, you could write a small function using collect_metrics() to get the individual resamples out and weight them accordingly. That doesn't help for things like racing or Bayesian optimization, but it would be a good solution for a basic grid search.
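
As a rough illustration of that workaround, the sketch below re-aggregates per-resample metrics with fold weights in base R. It is only a sketch under some assumptions: `per_fold` is a hand-made stand-in for what `collect_metrics(res, summarize = FALSE)` returns (only the `id`, `.metric`, `.estimate`, and `.config` columns are mimicked, with made-up values and labels), `fold_n` holds assumed fold sizes, and `weighted_metrics()` is a hypothetical helper, not a tune function.

```r
# Stand-in for collect_metrics(res, summarize = FALSE):
# one row per resample and candidate configuration (made-up values).
per_fold <- data.frame(
  id        = rep(c("Slice1", "Slice2", "Slice3", "Slice4"), times = 2),
  .metric   = "accuracy",
  .estimate = c(0.80, 0.85, 0.90, 0.95,  0.95, 0.90, 0.88, 0.85),
  .config   = rep(c("cfg_1", "cfg_2"), each = 4)
)

# Assumed fold sizes, keyed by resample id.
fold_n <- c(Slice1 = 100, Slice2 = 200, Slice3 = 250, Slice4 = 300)

# Hypothetical helper: unweighted and weighted mean per config and metric.
weighted_metrics <- function(per_fold, fold_n) {
  per_fold$.weight <- unname(fold_n[as.character(per_fold$id)])
  groups <- split(
    per_fold,
    list(per_fold$.config, per_fold$.metric),
    drop = TRUE
  )
  out <- lapply(groups, function(g) {
    data.frame(
      .config       = g$.config[1],
      .metric       = g$.metric[1],
      mean          = mean(g$.estimate),
      weighted_mean = weighted.mean(g$.estimate, g$.weight),
      n             = nrow(g)
    )
  })
  out <- do.call(rbind, out)
  rownames(out) <- NULL
  out
}

summary_tbl <- weighted_metrics(per_fold, fold_n)
summary_tbl
```

With these made-up numbers the two summaries disagree: the plain mean favors `cfg_2`, while the size-weighted mean favors `cfg_1`, which is the kind of difference the weighting is meant to surface.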

Mar 11 '25 15:03 topepo
