python-bcubed

Bcubed problem

Open MinaHeidari opened this issue 6 years ago • 1 comments

As the number of clusters increases, the B-cubed measure decreases, while the purity and Rand index measures increase on the same data. What is the cause? Does the B-cubed measure tend to decrease as the number of clusters grows?
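The trend described above can be reproduced on toy data. The following is a minimal sketch, not the python-bcubed implementation: it assumes a hard (non-overlapping) clustering, and the helper names `bcubed` and `purity` as well as the eight-item data set are made up for illustration. It suggests one possible cause: splitting a true category across more clusters leaves purity and B-cubed precision untouched, but lowers B-cubed recall, which pulls the F-score down.

```python
from collections import Counter


def bcubed(clusters, labels):
    """Return (precision, recall) for a hard clustering given as parallel lists."""
    n = len(labels)
    precision = 0.0
    recall = 0.0
    for i in range(n):
        same_cluster = [j for j in range(n) if clusters[j] == clusters[i]]
        same_label = [j for j in range(n) if labels[j] == labels[i]]
        # Items that share both the cluster and the true category with item i.
        correct = sum(1 for j in same_cluster if labels[j] == labels[i])
        precision += correct / len(same_cluster)
        recall += correct / len(same_label)
    return precision / n, recall / n


def purity(clusters, labels):
    """Fraction of items that belong to the majority category of their cluster."""
    n = len(labels)
    total = 0
    for c in set(clusters):
        members = [labels[j] for j in range(n) if clusters[j] == c]
        total += Counter(members).most_common(1)[0][1]
    return total / n


# Hypothetical data: two true categories with four items each.
labels = ["x"] * 4 + ["y"] * 4

clusterings = {
    1: [0] * 8,                    # everything in one cluster
    2: [0] * 4 + [1] * 4,          # matches the true categories
    4: [0, 0, 1, 1, 2, 2, 3, 3],   # each category split in two
    8: list(range(8)),             # one cluster per item
}

for k, clusters in clusterings.items():
    p, r = bcubed(clusters, labels)
    f = 2 * p * r / (p + r)
    print(k, "purity:", purity(clusters, labels), "P:", p, "R:", r, "F:", round(f, 3))
```

On this toy data, purity stays at 1.0 from two clusters onward, while B-cubed recall falls from 1.0 to 0.5 to 0.25. Whether the same mechanism explains a given real data set would need to be checked on that data.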

MinaHeidari, Apr 27 '19 19:04

I was experimenting with their ground-truth and output example, and I noticed that if the output contains just one item, precision, recall, and F-score are all equal to 1. For example:

    # ground-truth data (also called gold-standard data)
    ldict = {
        "item1": set(["gray", "black"]),
        "item2": set(["gray", "black"]),
        "item3": set(["gray"]),
        "item4": set(["black"]),
        "item5": set(["black"]),
        "item6": set(["dashed"]),
        "item7": set(["dashed"]),
        "item8": set(["fk"]),
        "item9": set(["dk"]),
        "item10": set(["dk"]),
        "item11": set(["dk"]),
    }

and

    cdict = {
        "item1": set(["A", "B"]),
        # "item2": set(["A", "B"]),
        # "item3": set(["A"]),
        # "item4": set(["B"]),
        # "item5": set(["B"]),
        # "item6": set(["C"]),
        # "item7": set(["C"]),
    }

I read the paper, but I did not understand the reason for this behavior. I expected the measure to take into account the number of items in the ground truth.
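The result reported above can be checked by hand with a small re-implementation of the extended B-cubed formulas for overlapping clusters. This is a sketch under assumptions, not the library's code: the function names are hypothetical, and it assumes that both averages run only over the items present in `cdict`, so ground-truth items missing from the output never enter the computation. Under that assumption, a single-item output is compared only with itself and scores 1.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def mult_precision(e1, e2, cdict, ldict):
    """Multiplicity precision of a pair that shares at least one cluster."""
    shared_clusters = len(cdict[e1] & cdict[e2])
    shared_labels = len(ldict[e1] & ldict[e2])
    return min(shared_clusters, shared_labels) / shared_clusters


def mult_recall(e1, e2, cdict, ldict):
    """Multiplicity recall of a pair that shares at least one category."""
    shared_clusters = len(cdict[e1] & cdict[e2])
    shared_labels = len(ldict[e1] & ldict[e2])
    return min(shared_clusters, shared_labels) / shared_labels


def bcubed_precision(cdict, ldict):
    # Assumption: only items that appear in the output (cdict) are averaged.
    return mean(
        mean(mult_precision(e1, e2, cdict, ldict)
             for e2 in cdict if cdict[e1] & cdict[e2])
        for e1 in cdict
    )


def bcubed_recall(cdict, ldict):
    # Assumption: items found only in ldict are ignored here as well.
    return mean(
        mean(mult_recall(e1, e2, cdict, ldict)
             for e2 in cdict if ldict[e1] & ldict[e2])
        for e1 in cdict
    )


def fscore(p, r, beta=1.0):
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Ground truth from the example above.
ldict = {
    "item1": {"gray", "black"}, "item2": {"gray", "black"},
    "item3": {"gray"}, "item4": {"black"}, "item5": {"black"},
    "item6": {"dashed"}, "item7": {"dashed"}, "item8": {"fk"},
    "item9": {"dk"}, "item10": {"dk"}, "item11": {"dk"},
}
# Output containing a single item.
cdict = {"item1": {"A", "B"}}

p = bcubed_precision(cdict, ldict)
r = bcubed_recall(cdict, ldict)
f = fscore(p, r)
print(p, r, f)  # the only pair is (item1, item1): min(2, 2) / 2 = 1.0 for both
```

In this sketch the ten ground-truth items that are absent from `cdict` have no effect on the score, which would be consistent with the numbers reported above. Penalizing missing items would require a different averaging choice, for example treating each missing item as its own cluster before scoring.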

Yuri-Nassar, Sep 20 '20 20:09