datatrove
Exploring documents in different clusters
I see that there is an option to save the cluster ids. However, when I read the *.clusters file that corresponds to the original documents file, the number of entries does not match the number of documents: the *.clusters file contains fewer. I want a cluster id for every document so that I can view the documents that fall into the same cluster.
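
To make the goal concrete, here is a rough sketch of the output I am after. The `group_by_cluster` helper and the `cluster_of` mapping are hypothetical names used only for illustration. I don't know how the *.clusters file actually maps its entries to documents, and treating documents without an entry as their own singleton cluster is just a guess on my part.

```python
from collections import defaultdict


def group_by_cluster(num_docs, cluster_of):
    """Group document indices by cluster id.

    cluster_of is a hypothetical dict {doc_index: cluster_id} covering only
    the documents that have an entry. How to build it from the *.clusters
    file is exactly what I am unsure about.
    """
    groups = defaultdict(list)
    for doc_index in range(num_docs):
        # Assumption: a document with no entry is alone in its own cluster.
        cluster_id = cluster_of.get(doc_index, ("singleton", doc_index))
        groups[cluster_id].append(doc_index)
    return dict(groups)


# Five documents, of which only three have a cluster entry.
example = group_by_cluster(5, {0: 7, 2: 7, 3: 9})
for cluster_id, doc_indices in example.items():
    print(cluster_id, doc_indices)
```

With a mapping like this I could look up the original documents by index and read the members of each cluster side by side.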