parsnip icon indicating copy to clipboard operation
parsnip copied to clipboard

Internals bug? `will_make_matrix()` returns `False` when given a matrix

Open joeycouse opened this issue 3 years ago • 2 comments

Hello, I'm working on a PR to address #765. While doing so I ran into an issue with will_make_matrix(). Shouldn't the return value be True if y is a matrix or a vector?

When y is a numeric matrix it fails this check and converts it to a vector. This is stripping the colname of y whenever it is passed to function later down the line such as xgb_train()

https://github.com/tidymodels/parsnip/blob/f0810910d0b69141af12c66ae210ec0880f95323/R/convert_data.R#L326-L336

joeycouse avatar Aug 14 '22 23:08 joeycouse

I believe will_make_matrix may mean "is this a thing that's not a matrix that will be converted to one," and that first if (is.matrix) return(FALSE) line makes me think that may be the case.

Is that interpretation correct, @topepo?

simonpcouch avatar Aug 15 '22 17:08 simonpcouch

Hmm, based on this comment I would assume the intention is to return y as a matrix if it is possible. Right now there doesn't seem to be a case when y is return as a matrix, from the call to .convert_form_to_xy_fit()

https://github.com/tidymodels/parsnip/blob/f0810910d0b69141af12c66ae210ec0880f95323/R/convert_data.R#L120-L124

joeycouse avatar Aug 15 '22 18:08 joeycouse

@simonpcouch is right about the intent.

It was also designed for data frames so that's why there's no logic to handle vectors. We could do this but would have to know how to name that column (so it feels out of scope for this function).

@joeycouse can you give some context on where this has come up?

topepo avatar Aug 30 '22 13:08 topepo

@topepo this is comes up when there is a call to .covert_form_to_xy_fit() i.e. where the underlying model only has a x_y interface. The return value of y is always a vector/factor and doesn't carry a name. So when passed to the underlying modeling function there is no way to determine what the original column name of y was.


> reg_fit <-
+   boost_tree(trees = 10, mode = "regression") %>%
+   set_engine("xgboost", 
+              eval_metric = "mae", 
+              verbose = 1) %>%
+   fit(mpg ~ ., data = mtcars)

This is an issue because the call to xgb_train() has no name for y which makes it very difficult to extended the validation arg to accept a data frame (#771, #765, #760, #471) because there is no good way to determine what the outcome column is.

https://github.com/tidymodels/parsnip/blob/e1eb30a6f6704bd0a3cf61736ea1d26e1d8eb081/R/fit_helpers.R#L137-L155

joeycouse avatar Aug 30 '22 15:08 joeycouse

Given that we've confirmed that the function, as it's currently scoped, works as intended, and we've closed the PR that prompted the proposed extension in scope, I'm going to go ahead and close this issue.

simonpcouch avatar Aug 30 '22 18:08 simonpcouch

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions[bot] avatar Sep 14 '22 01:09 github-actions[bot]