inefficient in findBestK=TRUE

Open epurdom opened this issue 8 years ago • 0 comments

If the user chooses findBestK=c(TRUE,FALSE) with a range of values, e.g. ks=4:15, we are extremely inefficient: we run each k in 4-15 with findBestK=FALSE, and then for findBestK=TRUE we RERUN all of k=4-15 to find the best k. This is because everything is run in parallel without cross-talk.

Similarly, if findBestK=TRUE, we throw away the results for k=4-15 and save only the best, which seems like a waste given that we just calculated them...

Perhaps we should change findBestK so that it calculates and saves k=4-15, then post-processes those results to get the best k. I.e., clusterMany would internally set findBestK=FALSE as well, then do the findBestK step last, as just a silhouette processing of the saved results.

We could also add a slot to save the silhouette values, so they could easily be plotted later.
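A rough sketch of the idea, to make the proposal concrete. This is not clusterMany's code: `cluster_all_then_pick` is a hypothetical helper, hierarchical clustering via `hclust`/`cutree` is only a stand-in for whatever clustering function is actually used, and the `avgSilhouette` element just illustrates what a saved silhouette slot might hold. The point is that each k is clustered once, every result is kept, and the best k comes from post-processing the saved results with `cluster::silhouette`.

```r
library(cluster)  # provides silhouette(); a recommended package shipped with R

# Hypothetical helper (an assumption, not part of clusterExperiment):
# cluster once per k, keep every result, then pick the best k afterwards.
cluster_all_then_pick <- function(x, ks) {
  d <- dist(x)
  tree <- hclust(d, method = "average")  # stand-in for the real clustering step

  # One labelling per k, computed exactly once and all retained.
  clusterings <- lapply(ks, function(k) cutree(tree, k = k))
  names(clusterings) <- paste0("k=", ks)

  # Post-processing only: average silhouette width of each saved clustering.
  avg_sil <- vapply(
    clusterings,
    function(cl) mean(silhouette(cl, d)[, "sil_width"]),
    numeric(1)
  )

  list(
    clusterings   = clusterings,           # what findBestK=FALSE would return
    avgSilhouette = avg_sil,               # could be saved for plotting later
    bestK         = ks[which.max(avg_sil)] # what findBestK=TRUE would return
  )
}

# Toy data: three well-separated groups of 10 points each.
set.seed(1)
x <- rbind(
  matrix(rnorm(20, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2)
)
res <- cluster_all_then_pick(x, ks = 2:6)
res$bestK
res$avgSilhouette
```

With this shape, asking for findBestK=c(TRUE,FALSE) would need no extra clustering runs: the FALSE results are `clusterings`, and the TRUE result is just the entry of `clusterings` selected by `bestK`.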

Sep 29 '17 08:09 epurdom