DEKM
clustering optimization iteration
Hi, I want to ask: did you really use 140*100 = 14,000 iterations in the clustering optimization step? When applying a similar approach to my own dataset, which comprises over 30,000 texts, the iteration process did not converge to below a 0.1% n_change_assignment. Consequently, the entire algorithm required approximately 3 hours to complete, which, in my opinion, is quite long. Would you be able to provide some insight or clarification?
Thanks.
Oh, and in the paper you mentioned that the minimum n_change_assignment to stop the iteration is 0.1%, but in the code you use 0.5%. So which one is valid?
Did you really use 140*100 = 14,000 iterations in the clustering optimization step?
No, in all of my experiments, I halt at line 123 of DEKM.py. The default setting of '140*100 = 14000 iterations' is borrowed from DEC.
Optimal threshold
Both 0.5% and 0.1% result in comparable clustering performance. The optimal threshold may vary across datasets. I have updated this threshold, but I still employ the same one for all datasets.
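For anyone else reading: below is a minimal, runnable sketch of the stopping rule being discussed, which halts once fewer than `tol` of the samples change cluster assignment between iterations. The random data, the k-means call, and the noisy "update" step are stand-ins for illustration, not code from DEKM.py.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of the early-stopping rule: stop the clustering optimization once
# fewer than `tol` of the samples change their cluster assignment between
# iterations. The data and the noisy "update" step below are placeholders
# for the real embeddings and network update.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))       # stand-in for the learned embeddings h
tol = 0.005                           # 0.5%; the paper mentions 0.1% (tol = 0.001)
max_iters = 140 * 100                 # upper bound borrowed from DEC

y_pred_last = None
for it in range(max_iters):
    y_pred = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
    if y_pred_last is not None:
        n_change_assignment = np.sum(y_pred != y_pred_last)
        if n_change_assignment / X.shape[0] < tol:
            print(f"stopped after {it} iterations")
            break
    y_pred_last = y_pred
    X = X + 0.01 * rng.normal(size=X.shape)   # placeholder for the network update
```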
ahh I see, thanks!
By the way, do you have a formula for the gradient of $L_4$ with respect to $h$? I'm asking because I'm a mathematics student and I want to use your DEKM method for my undergraduate thesis.
Also, I want to make sure: does y - y' result in a vector with the same dimension as y, but with zero values in all dimensions except the last one? Since y' is a replica of y, except that its last dimension comes from m_i, right?
Thanks!
does y - y' result in a vector with the same dimension as y, but with zero values in all dimensions except the last one?
Yes, it is.
I mean this $L_4$:
Because $\mathbf{y}^\prime$ is a constant, we have $$\frac{\partial L_4}{\partial \mathbf{h}}=\frac{\partial L_4}{\partial \mathbf{y}}\frac{\partial \mathbf{y}}{\partial \mathbf{h}}=\sum_{i=1}^k \sum_{\mathbf{y} \in \mathcal{C}_i}2\mathbf{V}(\mathbf{y}-\mathbf{y}^\prime)$$
Sorry, but I don't get why $y'$ is a constant. Isn't it supposed to be $\frac{\partial L_4}{\partial y}\frac{\partial y}{\partial h} + \frac{\partial L_4}{\partial y'}\frac{\partial y'}{\partial h}$, because $y'$ is a vector that depends on $h$ just like $y$?
In the DEKM context, we interpret $L_4=\sum_i \sum_y ||y-y'||^2$ as a regression task, with $y'$ representing a predetermined constant target value. The objective of regression here is to adjust $y$ to approximate $y'$.
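To make the constant-target interpretation concrete, here is a small numpy illustration; the numbers are made up, and `m_i` stands for the last component of the sample's cluster centroid, as discussed above.

```python
import numpy as np

# Illustration of the constant target y': it is a copy of y whose last
# dimension is replaced by the last component m_i of the sample's cluster
# centroid, so y - y' is zero everywhere except the last dimension.
# The values are made up for illustration.
y = np.array([0.7, -1.2, 0.4])   # transformed embedding of one sample, y = V h
m_i = 0.1                        # last component of this sample's cluster centroid

y_prime = y.copy()
y_prime[-1] = m_i                # treated as a fixed regression target

print(y - y_prime)                 # approx. [0. 0. 0.3]: nonzero only in the last dim
print(np.sum((y - y_prime) ** 2))  # this sample's contribution to L_4 (approx. 0.09)
```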
Ah, I see, thanks. By the way, can you explain step by step how the equation at the top becomes the equation at the bottom? Thanks!
This derivation uses two properties of the trace function: (1) Cyclic property $Tr(ABCD)=Tr(BCDA)$, and (2) trace additivity $\sum_i Tr(A_i)=Tr(\sum_i A_i)$.
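Since the two equations are in an image that isn't reproduced here, the following is only a sketch of the kind of manipulation those two properties enable, assuming the top equation is the within-cluster sum of squares of the transformed embeddings $\mathbf{y}=\mathbf{V}\mathbf{h}$ and the bottom equation is $Tr(\mathbf{V}\mathbf{S}_w\mathbf{V}^T)$, with $\mathbf{S}_w$ the within-cluster scatter matrix:

$$
\begin{aligned}
\sum_{i=1}^{k}\sum_{\mathbf{h}\in\mathcal{C}_i}\left\|\mathbf{V}\mathbf{h}-\mathbf{V}\boldsymbol{\mu}_i\right\|^2
&=\sum_{i=1}^{k}\sum_{\mathbf{h}\in\mathcal{C}_i}Tr\left(\mathbf{V}(\mathbf{h}-\boldsymbol{\mu}_i)(\mathbf{h}-\boldsymbol{\mu}_i)^{T}\mathbf{V}^{T}\right)
&&\text{since }\|\mathbf{a}\|^2=Tr(\mathbf{a}\mathbf{a}^{T})\text{, plus the cyclic property}\\
&=Tr\left(\mathbf{V}\Big(\sum_{i=1}^{k}\sum_{\mathbf{h}\in\mathcal{C}_i}(\mathbf{h}-\boldsymbol{\mu}_i)(\mathbf{h}-\boldsymbol{\mu}_i)^{T}\Big)\mathbf{V}^{T}\right)
&&\text{trace additivity}\\
&=Tr\left(\mathbf{V}\mathbf{S}_w\mathbf{V}^{T}\right),
&&\text{where }\mathbf{S}_w=\sum_{i=1}^{k}\sum_{\mathbf{h}\in\mathcal{C}_i}(\mathbf{h}-\boldsymbol{\mu}_i)(\mathbf{h}-\boldsymbol{\mu}_i)^{T}
\end{aligned}
$$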
Sorry if I'm asking too much, but regarding the Rayleigh-Ritz theorem version that you mentioned in the paper, is it the version in the image? The most closely related theorem I found in the Handbook of Matrices is the one in this image: since $X$ and $A$ in the context of DEKM are $e \times e$ matrices, $Tr(X^T A X) = \lambda_1 + \dots + \lambda_e$, where $X$ is real and $X^T X=I$. Hence, $X=[v_1,\dots,v_e]$.
is it the version in the image?
Yes, it is. Any question is welcome.
I just realized, doesn't the theorem say that each eigenvector is a column vector of $X$ (since it uses the notation $X=[v_1,\dots,v_e]$), not a row vector? Because you stacked the eigenvectors as row vectors for the orthonormal transformation matrix $V$.
In DEKM, we use $Tr(VS_wV^T)$, but not $Tr(V^TS_wV)$. Thus, $V$ consists of the row eigenvectors.
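If it helps, here is a tiny numpy check of that convention; the `S_w` below is just a random symmetric matrix, not one produced by DEKM.

```python
import numpy as np

# Numerical check of the row-vs-column convention: stacking the eigenvectors
# as columns of X gives Tr(X^T S_w X), stacking them as rows of V gives
# Tr(V S_w V^T), and both equal the sum of the eigenvalues.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
S_w = A @ A.T                           # symmetric PSD stand-in for the scatter matrix

eigvals, eigvecs = np.linalg.eigh(S_w)  # eigh returns eigenvectors as columns
X = eigvecs                             # columns are eigenvectors: X = [v_1, ..., v_e]
V = eigvecs.T                           # rows are eigenvectors, as in DEKM's V

print(np.trace(X.T @ S_w @ X))          # column convention
print(np.trace(V @ S_w @ V.T))          # row convention, same value
print(eigvals.sum())                    # sum of the eigenvalues; all three agree
```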
Ohh I see, missed that part😄. Thank you so much!