
clustering optimization iteration

corneliusjustin opened this issue 11 months ago • 16 comments

Hi, I want to ask: did you really use 140*100 = 14,000 iterations in the clustering optimization step? When I applied a similar approach to my own dataset, which comprises over 30,000 texts, the iterations did not converge to below a 0.1% n_change_assignment. Consequently, the entire algorithm took approximately 3 hours to complete, which seems quite long. Could you provide some insight or clarification?

Thanks.

corneliusjustin avatar Mar 15 '24 18:03 corneliusjustin

Oh, and in the paper you mention that the minimum n_change_assignment to stop the iteration is 0.1%, but in the code you use 0.5%. Which one is correct?

corneliusjustin avatar Mar 15 '24 18:03 corneliusjustin

> Did you really use 140*100 = 14,000 iterations in the clustering optimization step?

No, in all of my experiments, I halt at line 123 of DEKM.py. The default setting of 140*100 = 14,000 iterations is borrowed from DEC.

Optimal threshold

Both 0.5% and 0.1% yield comparable clustering performance. The optimal threshold may vary across datasets. I updated this threshold, but still employ the same value for all datasets.
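For readers following along, the stopping rule being discussed can be sketched roughly as below. This is a hypothetical sketch, not the actual DEKM.py code: `assign_step` stands in for whatever recomputes the hard cluster assignments each pass, and the 140*100 cap and 0.5% tolerance come from the discussion above.

```python
import numpy as np

def run_clustering_optimization(assign_step, n_samples, max_iter=140 * 100, tol=0.005):
    """Iterate until the fraction of samples changing cluster drops below tol.

    tol=0.005 is the 0.5% threshold used in the code; the paper quotes 0.1%.
    """
    prev = assign_step()
    for it in range(max_iter):
        curr = assign_step()
        n_change_assignment = np.sum(curr != prev)
        if n_change_assignment / n_samples < tol:
            return curr, it          # converged, usually long before max_iter
        prev = curr
    return prev, max_iter            # fell back to the DEC-style iteration cap
```

In practice the cap is just a safety net; the `tol` test is what actually ends the loop.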

spdj2271 avatar Mar 15 '24 18:03 spdj2271

ahh I see, thanks!

corneliusjustin avatar Mar 15 '24 18:03 corneliusjustin

By the way, do you have a formula for the gradient of $L_4$ with respect to $h$? I'm a mathematics student and I want to use your DEKM method for my undergraduate thesis.

Also, I want to confirm: does $y - y'$ result in a vector with the same dimension as $y$, but with zero values in every dimension except the last? Since $y'$ is a copy of $y$, except that its last dimension comes from $m_i$, right?

Thanks!

corneliusjustin avatar Apr 04 '24 13:04 corneliusjustin

> Does $y - y'$ result in a vector with the same dimension as $y$, but with zero values in every dimension except the last?

Yes, it is.
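A small numpy sketch of that construction (the values and the dimension $e=5$ are made up for illustration; $m_i$ is the last coordinate of cluster $i$'s centroid, as in the paper):

```python
import numpy as np

y = np.array([0.3, -1.2, 0.7, 2.0, 0.9])   # transformed embedding, e = 5
m_i = 0.5                                   # last coordinate of cluster i's centroid

y_prime = y.copy()
y_prime[-1] = m_i      # y' copies y, replacing only the last dimension

diff = y - y_prime
# diff is zero everywhere except the last dimension (here 0.9 - 0.5 = 0.4)
print(diff)
```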

spdj2271 avatar Apr 04 '24 13:04 spdj2271

I mean this $L_4$: [image]

corneliusjustin avatar Apr 04 '24 14:04 corneliusjustin

Because $\mathbf{y}^\prime$ is a scalar, thus we have $$\frac{\partial{L_4} }{\partial{\mathbf{h}} } =\frac{\partial{L_4} }{\partial{\mathbf{y}} }\frac{\partial{\mathbf{y}} }{\partial{\mathbf{h}} }=\sum_{i=1}^k \sum_{\mathbf{y} \in \mathcal{C}_i}2\mathbf{V}(\mathbf{y}-\mathbf{y}^\prime)$$
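That gradient is easy to sanity-check by finite differences. The sketch below assumes $\mathbf{y} = V\mathbf{h}$ with $V$ the orthonormal transformation matrix and checks the column-vector (denominator-layout) form $2V^\top(\mathbf{y}-\mathbf{y}^\prime)$, which matches the expression above up to layout/transposition convention, for a single sample:

```python
import numpy as np

rng = np.random.default_rng(0)
e = 4
# Orthonormal V (stand-in for the row-eigenvector transformation), via QR
V = np.linalg.qr(rng.standard_normal((e, e)))[0]
h = rng.standard_normal(e)

# y' copies y = V h except in the last dimension, treated as a fixed target
y_prime = V @ h
y_prime[-1] = 0.5

def L4(h):
    y = V @ h
    return np.sum((y - y_prime) ** 2)

# analytic gradient in column-vector layout: 2 V^T (y - y')
grad = 2 * V.T @ (V @ h - y_prime)

# central finite differences
eps = 1e-6
fd = np.array([(L4(h + eps * np.eye(e)[j]) - L4(h - eps * np.eye(e)[j])) / (2 * eps)
               for j in range(e)])
print(np.max(np.abs(grad - fd)))   # difference should be ~0
```

Because $y - y'$ is nonzero only in the last dimension, the gradient reduces to a scaled copy of the last eigenvector.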

spdj2271 avatar Apr 04 '24 14:04 spdj2271

Sorry, but I don't get why $y'$ is a scalar. Shouldn't it be $\frac{\partial{L_4}}{\partial{y}}\frac{\partial{y}}{\partial{h}} + \frac{\partial{L_4}}{\partial{y'}}\frac{\partial{y'}}{\partial{h}}$, because $y'$ is a vector that depends on $h$ just like $y$?

corneliusjustin avatar Apr 04 '24 14:04 corneliusjustin

In the DEKM context, we interpret $L_4=\sum_i \sum_{y \in \mathcal{C}_i} \|y-y'\|^2$ as a regression task, with $y'$ representing a predetermined constant target. The objective of the regression is to adjust $y$ to approximate $y'$.

spdj2271 avatar Apr 04 '24 14:04 spdj2271

Ah, I see, thanks. By the way, can you explain step by step how the equation at the top becomes the equation at the bottom? Thanks! [image]

corneliusjustin avatar Apr 28 '24 08:04 corneliusjustin

This derivation uses two properties of the trace function: (1) the cyclic property, $Tr(ABCD)=Tr(BCDA)$, and (2) additivity, $\sum_i Tr(A_i)=Tr(\sum_i A_i)$.
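Both properties are easy to verify numerically; here is a generic numpy check (random matrices, not DEKM-specific quantities):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))

# (1) cyclic property: Tr(ABCD) = Tr(BCDA)
assert np.isclose(np.trace(A @ B @ C @ D), np.trace(B @ C @ D @ A))

# (2) additivity: sum_i Tr(A_i) = Tr(sum_i A_i)
mats = [rng.standard_normal((3, 3)) for _ in range(5)]
assert np.isclose(sum(np.trace(M) for M in mats), np.trace(sum(mats)))
```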

spdj2271 avatar Apr 28 '24 08:04 spdj2271

Sorry if I'm asking too much, but regarding the version of the Rayleigh–Ritz theorem that you mentioned in the paper: is it the version in the image below? The most closely related theorem I found in the Handbook of Matrices is this one, because $X$ and $A$ in the context of DEKM are $e \times e$ matrices, so $Tr(X^T A X) = \lambda_1 + \dots + \lambda_e$, where $X$ is real and $X^T X=I$. Hence, $X=[v_1,\dots,v_e]$. [image]

corneliusjustin avatar Apr 28 '24 18:04 corneliusjustin

> Is it the version in the image?

Yes, it is. Any questions are welcome.

spdj2271 avatar Apr 28 '24 18:04 spdj2271

I just realized: doesn't the theorem say that each eigenvector is a column vector of $X$ (since it uses the notation $X=[v_1,\dots,v_e]$), not a row vector? Because you stacked the eigenvectors as row vectors to form the orthonormal transformation matrix $V$.

corneliusjustin avatar Apr 28 '24 18:04 corneliusjustin

In DEKM, we use $Tr(VS_wV^T)$, not $Tr(V^TS_wV)$. Thus, $V$ consists of the row eigenvectors.
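A numpy illustration of the two conventions. Note that `np.linalg.eigh` returns eigenvectors as the *columns* of its second output, so the row-eigenvector matrix used in DEKM is its transpose; `Sw` here is just a random symmetric PSD matrix standing in for the within-cluster scatter matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 4))
Sw = X.T @ X                     # symmetric PSD stand-in for within-cluster scatter

w, U = np.linalg.eigh(Sw)        # columns of U are eigenvectors
V = U.T                          # rows of V are eigenvectors, as in DEKM

# With row eigenvectors, Tr(V Sw V^T) recovers the sum of eigenvalues,
# matching Tr(U^T Sw U) under the column convention.
assert np.isclose(np.trace(V @ Sw @ V.T), w.sum())
assert np.isclose(np.trace(U.T @ Sw @ U), w.sum())
```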

spdj2271 avatar Apr 28 '24 18:04 spdj2271

Ohh, I see, I missed that part 😄. Thank you so much!

corneliusjustin avatar Apr 28 '24 18:04 corneliusjustin