cesi

Why is the cluster representative chosen using the ent2freq dict and NOT a sub2freq dict?

Open sm354 opened this issue 2 years ago • 0 comments

I noticed that the subject embeddings and relation embeddings are passed for clustering at this line, and that the cluster representative is then found using a (possibly) wrong ent2freq dictionary here. The subject embeddings dict contains 11878 subjects, whereas the ent2freq dict contains 23219 entities. The ent2freq dict maps each entity, not each subject, to its frequency, i.e. there is a mismatch between entity IDs and subject IDs. Could you please clarify this? I am happy to elaborate on my concern if needed.
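To make the concern concrete, here is a minimal sketch with toy data. All names, IDs, and the `pick_representative` helper are hypothetical and are not taken from the repository's code; the sketch only assumes that subjects and entities are numbered in separate ID spaces and that the representative is the most frequent cluster member.

```python
# Toy illustration only: the IDs, names, and helper below are hypothetical,
# not taken from the actual code base.

def pick_representative(cluster, freq):
    """Return the member of `cluster` with the highest frequency in `freq`."""
    return max(cluster, key=lambda i: freq.get(i, 0))

# Assumption: subjects and entities live in separate ID spaces.
sub2name = {0: "Barack Obama", 1: "Obama"}
ent2name = {0: "Hawaii", 1: "Barack Obama", 2: "Obama"}

# Frequencies keyed by subject ID vs. by entity ID.
sub2freq = {0: 5, 1: 2}
ent2freq = {0: 1, 1: 5, 2: 2}

cluster = [0, 1]  # a cluster of *subject* IDs

rep_with_sub2freq = pick_representative(cluster, sub2freq)
rep_with_ent2freq = pick_representative(cluster, ent2freq)

print(sub2name[rep_with_sub2freq])  # representative using subject frequencies
print(sub2name[rep_with_ent2freq])  # representative using entity frequencies
```

In this toy setup the two lookups select different representatives for the same cluster, because subject ID 0 and entity ID 0 refer to different items. If the real IDs are misaligned in the same way, the chosen representative may not be the most frequent subject.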

sm354 · Mar 19 '22 07:03