factoextra
Sampling of the rows of data is not uniform
In this function, `k` is a vector of row indexes that identifies the sampled rows of the data. Currently it is computed as:
```r
k <- round(runif(n, 1, nrow(data)))
```
However, this does NOT give every row an equal probability of being sampled. For example, with 10 rows:
```r
table(round(runif(10000, 1, 10)))
#    1    2    3    4    5    6    7    8    9   10
#  532 1083 1138 1087 1116 1109 1111 1133 1132  559
```
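The first and last rows are sampled only about half as often as the other rows. This is because `round()` maps only the half-width intervals [1, 1.5) and [9.5, 10] of the continuous draw onto 1 and 10, while every interior value i receives a full interval of width 1 around it.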
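The size of the bias can be computed exactly rather than only simulated. A minimal sketch for 10 rows, assuming the draws from `runif(..., 1, N)` are uniform on [1, N]; the tie points at x.5 have probability zero, so R's rounding rule for ties does not affect the result:

```r
N <- 10
# Integer i collects every draw in [i - 0.5, i + 0.5), clipped to the range [1, N]
lower   <- pmax(1:N - 0.5, 1)
upper   <- pmin(1:N + 0.5, N)
# Probability of each index = width of its interval / total width (N - 1)
p_round <- (upper - lower) / (N - 1)
round(p_round, 4)
```

The endpoints get probability 1/18 (about 0.056) and each interior index gets 1/9 (about 0.111), i.e. expected counts of roughly 556 and 1111 per 10000 draws, in line with the table above.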
The proposed fix uses `sample()`, which draws all rows with equal probability:
```r
table(sample(1:10, 10000, replace = TRUE))
#    1    2    3    4    5    6    7    8    9   10
# 1032  975 1020 1021  962 1009 1064  949  962 1006
```
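Applied to the function itself, the change might look like the sketch below. It assumes `n` and `data` mean the same as in the original line, and that sampling with replacement is intended: the `runif()` version can already return duplicate indexes, so `replace = TRUE` keeps that behavior. The `data` and `n` values here are hypothetical stand-ins used only to make the snippet runnable.

```r
set.seed(1)                         # only to make this illustration reproducible
data <- data.frame(x = rnorm(10))   # hypothetical stand-in for the function's `data`
n    <- 10000                       # hypothetical stand-in for the sample size

# Before: rounding a continuous draw under-samples the first and last rows
# k <- round(runif(n, 1, nrow(data)))

# After: each index in 1..nrow(data) is drawn with probability 1 / nrow(data)
k <- sample.int(nrow(data), n, replace = TRUE)

table(k)
```

`sample.int(m, size)` draws integers from 1 to `m` directly, so it is equivalent to `sample(1:nrow(data), n, replace = TRUE)` here without building the `1:nrow(data)` vector first.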