rainette
Select clusters containing more than a certain number of segments
Hi @juba,
After having done a Rainette clustering, I often execute a Correspondence Analysis with lexicon and clusters.
In that case, very small clusters tend to pull the plot to the extremes, making it difficult to read.
So I’m looking for a way to select and isolate the clusters that contain a very small number of segments.
In other words, I need to build a vector with the names of the clusters that contain fewer than a certain number of segments.
I have looked at the available documentation but have no idea how to do it.
Can you help me and point me in the right direction?
Thanks a lot for your help!
Gabriel
This is not directly related to rainette. You have to compute the size of the clusters and filter out the smaller ones. Something like:
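# Number of segments in each cluster
tab <- table(clusters)
# Names of the clusters with more than min_size segments (min_size is a threshold you choose)
names(tab)[tab > min_size]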
Thanks a lot for your help and sorry to ask a question not directly related to rainette… 😰
Just a question: is there an object “clusters” that I can use to compute the size of each cluster? I can’t find it in the docs.
I had an idea of computing the size of each cluster with something like this:
clusters <- clusters_by_doc_table(dtm_for_analysis, clust_var = "Cluster")
sum(clusters$clust_1)
…
But then I would need a loop to do it for each cluster in the clustering… and I think there may be something simpler?
Sorry if it’s an obvious question…
If you're looking for the size of each cluster in terms of number of segments, then doing the following should be enough:
clusters <- cutree(res, k = 5)
table(clusters)
Or in your example:
table(dtm_for_analysis$Cluster)
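Putting the two steps together, a minimal sketch in base R might look like the following. The `clusters` vector and the `min_size` value are made-up toy data, standing in for something like `dtm_for_analysis$Cluster` and whatever threshold suits your analysis. The `NA` entry is there on the assumption that some segments may not be assigned to any cluster.

```r
# Toy cluster assignments (hypothetical data standing in for dtm_for_analysis$Cluster)
clusters <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 4, NA)

# Hypothetical threshold: clusters with fewer segments than this count as "small"
min_size <- 3

# Number of segments per cluster; table() ignores NA values by default
tab <- table(clusters)

# Names of the clusters that contain fewer than min_size segments
small_clusters <- names(tab)[tab < min_size]

# Logical index of the segments to keep: assigned to a cluster that is not small
keep <- !is.na(clusters) & !(clusters %in% small_clusters)
```

Here `small_clusters` is the vector of cluster names asked for at the top of the thread, and `keep` could be used to subset the segments before running the correspondence analysis.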
Much easier like this 😬. Thanks a lot for helping, this is exactly what I needed!