“All models failed” with spatial_block_cv() + tune_grid() — “arguments imply differing number of rows: 0, 1” error
Brief description of the problem
I am experiencing an error when using spatial_block_cv() from {spatialsample} together with {tidymodels}' tune_grid() to perform spatial cross-validation on my dataset. The same dataset and modeling approach works fine with a standard vfold_cv(), but fails in all folds with an error message:
Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
Error in estimate_tune_results(): ! All models failed. Run show_notes(.Last.tune.result) for more information.
I have verified the following:
- No empty folds: I checked analysis() and assessment() sets in each fold; they all have >0 rows and include both classes (0 and 1).
- No recipe issues: I removed all recipe steps (including step_corr() and step_zv()), or even removed the recipe entirely, and the error persists.
- Simple model: I tested a non-tunable rand_forest model with fit_resamples() (i.e., no hyperparameter grid) and still see the same failure.
- vfold_cv() works: If I switch from spatial_block_cv() to vfold_cv(), the model + data run successfully through tune_grid() or fit_resamples() with no errors.
- Indices look correct: The number of rows in analysis() and assessment() is consistent across folds, each includes humedal == 0 and humedal == 1, so no single-class or 0-row subsets.
- Tried reducing folds / radius: For instance, v=3 or radius=50 instead of v=5 and radius=100. The same error arises.
- Tried removing geometry: I used a typical approach of reassigning splits with make_splits() to remove geometry from each fold’s analysis/assessment sets, and forced class(...) <- class(folds_spatial). The error persists.
Repro steps and partial code
Below is a simplified version of my workflow:
library(tidymodels)
library(spatialsample)
library(sf)
library(terra)
library(dplyr)
library(purrr)
# Example: I have ~827 data points (presence/absence)
# plus 4 raster-based predictors: ndvi, mndwi, pendiente, ti
# My dataset is an sf object with geometry.
# 1) I create folds:
set.seed(1996)
folds_spatial <- spatial_block_cv(
  data = my_data_sf, # ~827 points
  v = 5,
  radius = 100
)
# 2) Drop geometry in each split:
my_data_nogeo <- st_drop_geometry(my_data_sf)
folds_spatial_nogeo <- folds_spatial %>%
  mutate(
    splits = map(splits, function(s) {
      i_ana <- s$in_id
      i_ass <- s$out_id
      rsample::make_splits(
        x = list(analysis = i_ana, assessment = i_ass),
        data = my_data_nogeo,
        class = "spatial_block_split"
      )
    })
  )
# Restore class
class(folds_spatial_nogeo) <- class(folds_spatial)
# 3) Model specification
rf_spec <- rand_forest(trees = 500) %>%
  set_mode("classification") %>%
  set_engine("ranger", probability = TRUE)
my_wf <- workflow() %>%
  add_model(rf_spec)
# (Sometimes I add a recipe, or none.)
set.seed(1996)
res <- fit_resamples(
  my_wf,
  resamples = folds_spatial_nogeo
)
# -> Fails with:
# Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1
# All models failed.
I also tried tune_grid() with a small grid of mtry and min_n and got the same result (All models failed).
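Step 2 above reads the row indices straight from the s$in_id and s$out_id fields. As a point of comparison only, rsample also provides as.integer() on a split, which returns the analysis or assessment row numbers without touching the internal fields. The sketch below is a minimal illustration on mtcars with vfold_cv(), not on the spatial data from this issue, and it is not claimed to resolve the error; the object names (idx_analysis, idx_assessment, rebuilt) are made up for the example.

```r
library(rsample)

set.seed(1996)
folds <- vfold_cv(mtcars, v = 4)
s <- folds$splits[[1]]

# Row numbers of each partition, taken through the accessor
# rather than from s$in_id / s$out_id
idx_analysis <- as.integer(s, data = "analysis")
idx_assessment <- as.integer(s, data = "assessment")

# Rebuild the split against the same data from those indices
rebuilt <- make_splits(
  list(analysis = idx_analysis, assessment = idx_assessment),
  data = mtcars
)

# The rebuilt split should hand back the same rows as the original
identical(analysis(rebuilt), analysis(s))
identical(assessment(rebuilt), assessment(s))
```

Comparing the rebuilt and original partitions in this way is one option for confirming that a manual re-splitting step did not change which rows end up in each set.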
Observations / Diagnostics
- If I switch to vfold_cv(my_data_nogeo, v=5, strata=humedal), everything works.
- The dataset is not huge, but I do have enough rows in each fold (I double-checked with a for loop printing nrow(analysis(...)), nrow(assessment(...)) and the distribution of humedal).
- Reducing to v=2 or v=3, or changing radius from 100 to 50, did not help.
- Removing any recipe steps or hyperparameter tuning also did not help.
- My sessionInfo() is below.
Session Info
# Please see below:
sessionInfo()
# or sessioninfo::session_info()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Santiago
tzcode source: internal
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] mapview_2.11.2 spatialsample_0.6.0 vip_0.4.1 DALEX_2.4.3 GGally_2.2.1
[6] corrplot_0.95 doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2 ranger_0.17.0
[11] yardstick_1.3.1 workflowsets_1.1.0 workflows_1.1.4 tune_1.2.1 rsample_1.2.1
[16] recipes_1.1.0 parsnip_1.2.1 modeldata_1.4.0 infer_1.0.7 dials_1.3.0
[21] scales_1.3.0 broom_1.0.7 tidymodels_1.2.0 janitor_2.2.1 here_1.0.1
[26] terra_1.8-5 sf_1.0-19 lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
[31] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[36] ggplot2_3.5.1 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] DBI_1.2.3 rlang_1.1.4 magrittr_2.0.3 snakecase_0.11.1 furrr_0.3.1
[6] e1071_1.7-16 compiler_4.4.1 png_0.1-8 vctrs_0.6.5 lhs_1.2.0
[11] fastmap_1.2.0 pkgconfig_2.0.3 backports_1.5.0 leafem_0.2.3 utf8_1.2.4
[16] prodlim_2024.06.25 tzdb_0.4.0 satellite_1.0.5 xfun_0.49 R6_2.5.1
[21] stringi_1.8.4 RColorBrewer_1.1-3 parallelly_1.41.0 rpart_4.1.23 Rcpp_1.0.13-1
[26] knitr_1.49 future.apply_1.11.3 base64enc_0.1-3 Matrix_1.7-0 splines_4.4.1
[31] nnet_7.3-19 timechange_0.3.0 tidyselect_1.2.1 rstudioapi_0.17.1 timeDate_4041.110
[36] codetools_0.2-20 listenv_0.9.1 lattice_0.22-6 plyr_1.8.9 withr_3.0.2
[41] evaluate_1.0.1 future_1.34.0 survival_3.6-4 ggstats_0.7.0 units_0.8-5
[46] proxy_0.4-27 pillar_1.10.0 rsconnect_1.3.3 KernSmooth_2.23-24 stats4_4.4.1
[51] generics_0.1.3 sp_2.1-4 rprojroot_2.0.4 hms_1.1.3 munsell_0.5.1
[56] globals_0.16.3 class_7.3-22 glue_1.8.0 tools_4.4.1 data.table_1.16.4
[61] gower_1.0.2 grid_4.4.1 crosstalk_1.2.1 ipred_0.9-15 colorspace_2.1-1
[66] raster_3.6-30 cli_3.6.3 DiceDesign_1.10 lava_1.8.0 gtable_0.3.6
[71] GPfit_1.0-8 digest_0.6.37 classInt_0.4-10 farver_2.1.2 htmlwidgets_1.6.4
[76] htmltools_0.5.8.1 leaflet_2.2.2 lifecycle_1.0.4 hardhat_1.4.0 MASS_7.3-60.2
Any guidance would be greatly appreciated! I suspect either:
- A subtle bug or mismatch in how spatial_block_cv() interacts with analysis()/assessment() inside tidymodels, or
- Some unknown configuration in my environment that leads to arguments imply differing number of rows: 0, 1.
Thank you for looking into this!
Thank you for the issue! I don't have access to your my_data_sf object. Are you able to reproduce this issue with a reprex (reproducible example), using publicly available or simulated data? A reprex will help me troubleshoot and fix your issue more quickly. 🙂
That said, this looks like it may live more cozily in spatialsample rather than rules, so I will transfer this issue to that repository. The issues you're seeing may be due to the manual transformations you've labeled step 2).
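A reprex along the lines requested could start from simulated points. The sketch below is only a starting point under stated assumptions: the point count, coordinate ranges, the EPSG:32719 CRS, and the simulated ndvi and mndwi values are all invented for illustration and are not taken from the original data. It builds the folds without radius and repeats the per-fold checks described in the issue (row counts and number of humedal classes); the model-fitting call from the issue would then be run against folds_sim.

```r
library(sf)
library(spatialsample)
library(rsample)
library(dplyr)
library(purrr)

set.seed(1996)
n <- 200 # hypothetical sample size, smaller than the ~827 points in the issue

# Simulated presence/absence points with two made-up predictors
sim_sf <- tibble::tibble(
  x = runif(n, 0, 1000),
  y = runif(n, 0, 1000),
  ndvi = rnorm(n),
  mndwi = rnorm(n),
  humedal = factor(sample(c(0, 1), n, replace = TRUE))
) %>%
  st_as_sf(coords = c("x", "y"), crs = 32719) # projected CRS, assumed

folds_sim <- spatial_block_cv(sim_sf, v = 5)

# One row per fold: partition sizes and number of outcome classes present
fold_summary <- map(folds_sim$splits, function(s) {
  ana <- st_drop_geometry(analysis(s))
  ass <- st_drop_geometry(assessment(s))
  tibble::tibble(
    n_analysis = nrow(ana),
    n_assessment = nrow(ass),
    classes_analysis = n_distinct(ana$humedal),
    classes_assessment = n_distinct(ass$humedal)
  )
}) %>%
  list_rbind()

fold_summary
```

If the failure does not appear with simulated data like this, adding the original settings back one at a time (for example radius, then the manual step 2) may help narrow down which piece triggers it.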