D2ArmorPicker

Dynamically build clusters

Open robojumper opened this issue 3 years ago • 2 comments

Instead of hardcoding the cluster count and centroids, this uses k-means clustering to build clusters from the selected items only.
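For readers unfamiliar with the technique, here is a minimal sketch of k-means (Lloyd's algorithm) over stat vectors. It is not the code from this change: the names `Vec`, `kMeans`, `nearest` and `dist2` are made up for illustration, and it assumes each selected item has already been reduced to a plain array of stat values.

```typescript
// Hypothetical illustration only; not the implementation from this change.
type Vec = number[];

// Squared Euclidean distance between two stat vectors of equal length.
function dist2(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the centroid closest to p (ties go to the lower index).
function nearest(p: Vec, centroids: Vec[]): number {
  let best = 0;
  for (let c = 1; c < centroids.length; c++) {
    if (dist2(p, centroids[c]) < dist2(p, centroids[best])) best = c;
  }
  return best;
}

function kMeans(
  points: Vec[],
  k: number,
  rand: () => number = Math.random,
  maxIter = 100
): { centroids: Vec[]; assignment: number[] } {
  if (k < 1 || k > points.length) throw new Error("k must be in 1..points.length");

  // Initialisation: choose k distinct items as starting centroids
  // using a partial Fisher-Yates shuffle of the item indices.
  const idx = points.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (idx.length - i));
    const tmp = idx[i];
    idx[i] = idx[j];
    idx[j] = tmp;
  }
  let centroids: Vec[] = idx.slice(0, k).map((i) => points[i].slice());
  let assignment: number[] = points.map(() => -1);

  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: each item joins its nearest centroid.
    const next = points.map((p) => nearest(p, centroids));
    const changed = next.some((c, i) => c !== assignment[i]);
    assignment = next;
    if (!changed) break; // converged

    // Update step: move each centroid to the mean of its members.
    // A centroid that lost all members stays where it was.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => assignment[i] === c);
      if (members.length === 0) return old;
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { centroids, assignment };
}
```

Because the starting centroids are drawn from the selected items themselves, every cluster begins with at least one member, which is the main practical difference from a fixed, precomputed centroid list.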

This does not include the necessary lockfile (package-lock.json) update, because my npm installation insisted on writing lockfile version 2.


Please consider adding a license to this project.

robojumper avatar Nov 20 '21 15:11 robojumper

Hi, did you test the accuracy, spread and significance of the generated clusters? I pre-calculated the clusters using over 5000 armor pieces to gain the most accurate results (I could do the same now with 2 million armor pieces), hence the hard-coded clusters.

Mijago avatar Nov 20 '21 15:11 Mijago

Do you have a documented way to obtain these stats to compare? I suspect we have different goals for the clustering feature and the pre-generated clusters simply do something different from what I want to achieve. I'm fine if this doesn't end up getting merged and would love to know more details about what you want the clustering feature to be.

In any case, what prompted this change is that the items in my vault are heavily biased towards particular stats (and stat totals! The pre-generated clusters have mean totals of 57-63, mine have 63-65), so a large number of centroids either don't contribute at all or create clusters with very few items. E.g. of the 72 legendary armor pieces on my warlock, the existing centroids create 10 clusters with 0 items, 7 clusters with 1 item, and 2 clusters with 2 items. This means 85% of my armor (61/72) is assigned to 24% of the buckets (6/25). The result tells me that 11 items are close to clusters of potentially existing armor that I categorically tend not to keep (but I kept them for a reason!), while it removes a lot of detail in the 6 crowded buckets and makes it difficult to find pieces that are similar enough that keeping both isn't worthwhile.
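The occupancy figures above follow from simple counting. As a sketch, using only the numbers quoted in this comment (the variable names are made up for illustration):

```typescript
// Figures quoted in the comment above; nothing here is measured independently.
const totalItems = 72;    // legendary armor pieces on the warlock
const totalClusters = 25; // pre-generated centroids

// Sparse clusters, grouped by how many items each one received.
const sparse = [
  { itemsPerCluster: 0, clusters: 10 },
  { itemsPerCluster: 1, clusters: 7 },
  { itemsPerCluster: 2, clusters: 2 },
];

const sparseClusters = sparse.reduce((s, g) => s + g.clusters, 0);                  // 19
const sparseItems = sparse.reduce((s, g) => s + g.itemsPerCluster * g.clusters, 0); // 11

// Whatever is left over lands in the remaining, crowded clusters.
const denseClusters = totalClusters - sparseClusters; // 6
const denseItems = totalItems - sparseItems;          // 61

const itemShare = denseItems / totalItems;          // about 0.847
const clusterShare = denseClusters / totalClusters; // 0.24

console.log(
  `${(itemShare * 100).toFixed(1)}% of items sit in ${(clusterShare * 100).toFixed(0)}% of clusters`
);
```

So roughly 85% of the items share about a quarter of the buckets, while 19 of the 25 centroids hold 11 items between them.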

A weakness of this k-means clustering approach is the randomization -- the clusters can differ considerably between runs. I don't know which algorithm the cluster pre-generation used or how you chose to deal with the randomness, if any.
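One common way to tame this kind of run-to-run variation is to seed the random number generator that picks the initial centroids, so the same item set always produces the same starting points. The sketch below is an assumption about how that could look, not something this change or the pre-generation does; `seededRandom` and `pickInitial` are hypothetical names, and the generator is a plain linear congruential generator that is adequate for illustration only.

```typescript
// Hypothetical sketch: a seeded replacement for Math.random.
// Linear congruential generator with the Numerical Recipes constants;
// all intermediate values stay below 2^53, so the arithmetic is exact.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0; // force an unsigned 32-bit starting state
  return () => {
    state = (1664525 * state + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

// Pick k distinct indices out of n with a partial Fisher-Yates shuffle.
// These would serve as the initial centroids for k-means.
function pickInitial(n: number, k: number, rand: () => number): number[] {
  const idx: number[] = [];
  for (let i = 0; i < n; i++) idx.push(i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (n - i));
    const tmp = idx[i];
    idx[i] = idx[j];
    idx[j] = tmp;
  }
  return idx.slice(0, k);
}

// Same seed, same starting centroids, therefore the same clusters.
const runA = pickInitial(72, 5, seededRandom(42));
const runB = pickInitial(72, 5, seededRandom(42));
console.log(runA.join(",") === runB.join(",")); // true
```

Seeding only makes the result reproducible; it does not make it better. Running several seeds and keeping the clustering with the lowest total within-cluster distance is the usual complement.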

robojumper avatar Nov 20 '21 16:11 robojumper