python-bcubed
Bcubed problem
As the number of clusters increases, the B-cubed score decreases on the same data, while the purity and Rand-index scores increase. What is the cause? Does the B-cubed measure have a decreasing trend as the number of clusters grows?
I was also playing with the library's ground-truth and output example, and I noticed that if the output contains just one item, precision, recall, and F-score are all equal to 1. For example:
```python
# ground-truth data (also called gold-standard data)
ldict = {
    "item1": set(["gray", "black"]),
    "item2": set(["gray", "black"]),
    "item3": set(["gray"]),
    "item4": set(["black"]),
    "item5": set(["black"]),
    "item6": set(["dashed"]),
    "item7": set(["dashed"]),
    "item8": set(["fk"]),
    "item9": set(["dk"]),
    "item10": set(["dk"]),
    "item11": set(["dk"]),
}
```
and
```python
cdict = {
    "item1": set(["A", "B"]),
    # "item2": set(["A", "B"]),
    # "item3": set(["A"]),
    # "item4": set(["B"]),
    # "item5": set(["B"]),
    # "item6": set(["C"]),
    # "item7": set(["C"]),
}
```
I read the paper, but I didn't understand the reason for this behaviour. I expected the measure to take the number of items in the ground truth into account.
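For what it's worth, the behaviour seems to follow from how extended B-cubed (Amigó et al., the metric this library implements) is defined: both precision and recall are averages over pairs of items *that appear in the clustering output*, so items present only in the ground truth are never visited. Below is a minimal, self-contained sketch of that definition (function names are my own, not the library's); with a one-item `cdict`, the only pair is the item with itself, so both averages are trivially 1:

```python
def mult_precision(e1, e2, cdict, ldict):
    """Multiplicity precision for a pair of items sharing >= 1 output cluster."""
    shared_clusters = len(cdict[e1] & cdict[e2])
    shared_labels = len(ldict[e1] & ldict[e2])
    return min(shared_clusters, shared_labels) / shared_clusters

def mult_recall(e1, e2, cdict, ldict):
    """Multiplicity recall for a pair of items sharing >= 1 gold label."""
    shared_clusters = len(cdict[e1] & cdict[e2])
    shared_labels = len(ldict[e1] & ldict[e2])
    return min(shared_clusters, shared_labels) / shared_labels

def precision(cdict, ldict):
    # Average, over items in the output, of the average pairwise
    # multiplicity precision against items sharing an output cluster.
    total = 0.0
    for e1 in cdict:
        peers = [e2 for e2 in cdict if cdict[e1] & cdict[e2]]
        total += sum(mult_precision(e1, e2, cdict, ldict) for e2 in peers) / len(peers)
    return total / len(cdict)

def recall(cdict, ldict):
    # Note: this also iterates only over items in cdict, so ground-truth
    # items missing from the output never contribute.
    total = 0.0
    for e1 in cdict:
        peers = [e2 for e2 in cdict if ldict[e1] & ldict[e2]]
        total += sum(mult_recall(e1, e2, cdict, ldict) for e2 in peers) / len(peers)
    return total / len(cdict)

# Ground truth has several items, but the output covers only item1:
# the single pair (item1, item1) yields min(2, 2) / 2 = 1 for both scores.
ldict = {"item1": set(["gray", "black"]),
         "item2": set(["gray", "black"]),
         "item3": set(["gray"])}
cdict = {"item1": set(["A", "B"])}
print(precision(cdict, ldict), recall(cdict, ldict))  # -> 1.0 1.0
```

If this reading is right, the Pr = Re = 1 result is not about the number of clusters at all, but about recall being computed only over the items the output actually contains.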