dolos

Perform hierarchical clustering to improve performance

Open rien opened this issue 2 years ago • 0 comments

We currently perform single-linkage clustering based on an automatically determined similarity threshold. However, changing this threshold forces a lot of recomputation (especially on the graph page). By performing hierarchical clustering once, we can calculate the clusters at different similarity thresholds in advance.
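To make the idea concrete, here is a minimal sketch of how this could work. It is not taken from the Dolos code base: the names `Edge`, `Merge`, `buildDendrogram` and `clustersAt` are hypothetical, and files are assumed to be identified by indices `0..n-1`. It relies on the fact that single-linkage clusters at a threshold are the connected components of the graph restricted to edges with at least that similarity, so the merge order can be computed once with a union-find pass over the edges sorted by decreasing similarity.

```typescript
// Hypothetical types: a pair of files (by index) with a similarity in [0, 1].
interface Edge { a: number; b: number; similarity: number; }
// One step of the dendrogram: the clusters containing a and b join at this similarity.
interface Merge { a: number; b: number; similarity: number; }

class UnionFind {
  private parent: number[];
  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
  }
  find(x: number): number {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }
  // Returns true when x and y were in different sets and are now joined.
  union(x: number, y: number): boolean {
    const rx = this.find(x);
    const ry = this.find(y);
    if (rx === ry) return false;
    this.parent[rx] = ry;
    return true;
  }
}

// Computed once: the merges of single-linkage clustering, sorted by
// decreasing similarity (a maximum spanning forest, as in Kruskal's algorithm).
function buildDendrogram(n: number, edges: Edge[]): Merge[] {
  const sorted = [...edges].sort((x, y) => y.similarity - x.similarity);
  const uf = new UnionFind(n);
  const merges: Merge[] = [];
  for (const e of sorted) {
    if (uf.union(e.a, e.b)) merges.push({ a: e.a, b: e.b, similarity: e.similarity });
  }
  return merges;
}

// Cheap per threshold: replay only the merges at or above the threshold.
function clustersAt(n: number, merges: Merge[], threshold: number): number[][] {
  const uf = new UnionFind(n);
  for (const m of merges) {
    if (m.similarity < threshold) break; // merges are sorted, so we can stop here
    uf.union(m.a, m.b);
  }
  const groups = new Map<number, number[]>();
  for (let i = 0; i < n; i++) {
    const root = uf.find(i);
    const group = groups.get(root);
    if (group) group.push(i);
    else groups.set(root, [i]);
  }
  return Array.from(groups.values());
}
```

With this split, moving the threshold slider would only need to call `clustersAt` on the stored merges (at most `n - 1` of them) instead of revisiting every pair of files.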

In addition, we can use this hierarchical clustering to position the nodes of the plagiarism graph automatically. This would remove the need for the force simulations, which are currently resource-heavy.
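The issue does not say which layout to use, so the following is only one possible interpretation: order the nodes by the dendrogram so that files that merge early end up next to each other, then place them on a circle. The names `MergeStep`, `leafOrder` and `circularLayout` are hypothetical, and the merge list is assumed to be sorted by decreasing similarity with files identified by indices `0..n-1`.

```typescript
// Hypothetical type: the clusters containing a and b join at this similarity.
interface MergeStep { a: number; b: number; similarity: number; }
interface Point { x: number; y: number; }

// Returns the node indices in an order where every cluster of the
// dendrogram occupies a contiguous run.
function leafOrder(n: number, merges: MergeStep[]): number[] {
  // clusterOf[i] is the (shared) member list of the cluster containing node i.
  const clusterOf: number[][] = Array.from({ length: n }, (_, i) => [i]);
  for (const m of merges) {
    const left = clusterOf[m.a];
    const right = clusterOf[m.b];
    if (left === right) continue; // already in the same cluster
    const joined = left.concat(right);
    for (const node of joined) clusterOf[node] = joined;
  }
  // Concatenate the remaining top-level clusters in order of first appearance.
  const seen = new Set<number[]>();
  const order: number[] = [];
  for (let i = 0; i < n; i++) {
    const cluster = clusterOf[i];
    if (!seen.has(cluster)) {
      seen.add(cluster);
      order.push(...cluster);
    }
  }
  return order;
}

// Places the nodes at equal angles on a circle, following the given order.
// The result is indexed by node, so positions[i] is the position of node i.
function circularLayout(order: number[], radius: number): Point[] {
  const positions: Point[] = new Array(order.length);
  order.forEach((node, k) => {
    const angle = (2 * Math.PI * k) / order.length;
    positions[node] = { x: radius * Math.cos(angle), y: radius * Math.sin(angle) };
  });
  return positions;
}
```

A layout like this is deterministic and runs in a single pass, which is the property the issue is after; whether a circle, a tree or nested groups looks best on the graph page is left open here.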

rien • Nov 09 '22 12:11