homlr
Figure 20.8 not working
The following code:
set.seed(123)
fviz_nbclust(
  ames_1hot_scaled,
  kmeans,
  method = "wss",
  k.max = 25,
  verbose = FALSE
)
Returns:
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
My environment is: R version 4.0.5 (2021-03-31), factoextra 1.0.7, AmesHousing 0.0.4, caret 6.0.86, dplyr 1.0.5
I think the reason is that the scale step produced NA values (I don't know why it does this).
Before running this code, run the following:
ames_1hot_scaled[,"Neighborhood.Hayden_Lake"] <- 0
Then it runs without the error.
Apparently the ames data set was updated from v0.0.3 to v0.0.4 and the Neighborhood variable now contains a "Hayden_Lake" factor level but there are no observations for that neighborhood when using AmesHousing::make_ames()
(see last bullet in this NEWS.md file).
# Hayden_Lake shows up as a level
levels(ames_full[["Neighborhood"]])
## [1] "North_Ames" "College_Creek"
## [3] "Old_Town" "Edwards"
## [5] "Somerset" "Northridge_Heights"
## [7] "Gilbert" "Sawyer"
## [9] "Northwest_Ames" "Sawyer_West"
## [11] "Mitchell" "Brookside"
## [13] "Crawford" "Iowa_DOT_and_Rail_Road"
## [15] "Timberland" "Northridge"
## [17] "Stone_Brook" "South_and_West_of_Iowa_State_University"
## [19] "Clear_Creek" "Meadow_Village"
## [21] "Briardale" "Bloomington_Heights"
## [23] "Veenker" "Northpark_Villa"
## [25] "Blueste" "Greens"
## [27] "Green_Hills" "Landmark"
## [29] "Hayden_Lake"
# But there are no observations for that level
as_tibble(ames_1hot) %>%
  select(Neighborhood.Hayden_Lake) %>%
  distinct()
## # A tibble: 1 × 1
## Neighborhood.Hayden_Lake
## <dbl>
## 1 0
Consequently, when you one-hot encode that column, Neighborhood.Hayden_Lake ends up filled with zeros. Scaling a constant column divides by a standard deviation of zero, so you get NaNs:
as_tibble(ames_1hot_scaled) %>% select(Neighborhood.Hayden_Lake)
## # A tibble: 2,930 × 1
## Neighborhood.Hayden_Lake
## <dbl>
## 1 NaN
## 2 NaN
## 3 NaN
## 4 NaN
## 5 NaN
## 6 NaN
## 7 NaN
## 8 NaN
## 9 NaN
## 10 NaN
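The same behaviour can be reproduced on a toy matrix, which may make the cause easier to see. This is only a sketch in base R: the matrix and its column names (Lot_Area, Neighborhood.Empty) are made up for illustration and are not taken from the Ames data. It also shows one generic workaround, dropping zero-variance columns before scaling, which is an assumption on my part and not what the book does.

```r
# Toy matrix: one informative column and one all-zero column.
# Column names are illustrative only, not from the Ames data.
toy <- cbind(
  Lot_Area = c(1, 2, 3, 4),
  Neighborhood.Empty = c(0, 0, 0, 0)
)

# scale() centers each column and divides by its standard deviation.
toy_scaled <- scale(toy)

# The all-zero column has sd = 0, so (0 - 0) / 0 gives NaN in every row.
all_nan <- all(is.nan(toy_scaled[, "Neighborhood.Empty"]))

# Possible workaround: keep only columns with non-zero standard deviation.
keep <- apply(toy, 2, sd) > 0
toy_clean <- scale(toy[, keep, drop = FALSE])

# After dropping the constant column, no missing values remain.
any_bad <- anyNA(toy_clean)
```

Applied to the real data, the same filter would go between predict() and scale(), at the cost of silently removing any constant feature.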
If we coerce the factor columns to character prior to one-hot encoding, the empty Hayden_Lake level is dropped and it works as illustrated in the book:
ames_full <- AmesHousing::make_ames() %>%
  mutate_if(str_detect(names(.), 'Qual|Cond|QC|Qu'), as.numeric) %>%
  mutate_if(is.factor, as.character)
full_rank <- caret::dummyVars(Sale_Price ~ ., data = ames_full, fullRank = TRUE)
ames_1hot <- predict(full_rank, ames_full)
# Re-run the scale step so ames_1hot_scaled reflects the new encoding
ames_1hot_scaled <- scale(ames_1hot)
dim(ames_1hot_scaled)
## [1] 2930 240
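Another option, if you would rather keep the columns as factors, might be to drop the unused level instead of coercing to character. The sketch below uses base R's droplevels() on a small hypothetical factor; the values are invented for illustration, and I have not verified that this route yields the same 240 columns on the full data.

```r
# Hypothetical factor with a declared level that has no observations,
# mimicking the empty "Hayden_Lake" level.
nb <- factor(
  c("North_Ames", "Gilbert", "North_Ames"),
  levels = c("North_Ames", "Gilbert", "Hayden_Lake")
)

# The unused level is still reported, with a count of zero.
counts_before <- table(nb)

# droplevels() removes levels that never occur in the data.
nb_dropped <- droplevels(nb)
levels_after <- levels(nb_dropped)
```

In the pipeline above this would presumably be mutate_if(is.factor, droplevels) in place of the as.character step.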