Some questions about your code.

Open kwonmha opened this issue 7 years ago • 3 comments

Hello, I'm M.H. Kwon, a graduate student from South Korea.

I read your paper 'autoencoder based data clustering' and found your GitHub repository. I have a few questions.

  1. The clustering term in the objective function of your paper has a minus sign. I think this term should be added to the reconstruction error term.

  2. In your MATLAB code, CG_Cluster.m, the objective function has no squaring (^2). Instead, you wrote: " f0 = -R_cluster/N*sum(sum( Hcen.*log(targetout))) ; f1 = -R_data/N*sum(sum( XX(:,1:end-1).*log(XXout) + (1-XX(:,1:end-1)).*log(1-XXout))); " Could you explain these functions? Did you use cross-entropy for f1? I worked out the meaning of each variable, such as R_cluster and Hcen, but I can't understand why you used log, why there is no squaring (^2), or where f1 comes from.

  3. I also can't figure out this code: "Ix4 = Ix4(:,1:end-1)+R_cluster/N*Hcen.*(targetout-1);" I can understand this part: "Ix4 = Ix4(:,1:end-1)+R_cluster/N" but not this part: "Hcen.*(targetout-1)". How did you get that term as a derivative of the clustering objective function?

  4. I think this code "IO = R_data/N*(XXout-XX(:,1:end-1));"
    should be changed to "IO = R_data/N*(XXout-XX(:,1:end-1)).*XXout.*(1-XXout);" because XXout is the output of a sigmoid that takes "-w7probs*w8" as input. What do you think?

I look forward to your answer.

Thank you.

kwonmha, May 16 '18 13:05

Hi Kwon, we are glad that you are interested in our work. Here are our answers to your questions.

  1. You are right! "The clustering term in the objective function of your paper has a minus sign. I think this term should be added to the reconstruction error term." Indeed, we also found this mistake and have corrected it in the journal version.

  2. When preparing this code for release, we found that the log form works better than the L2 form and makes training easier. Here, f0 is the clustering constraint, which forces each data point closer to its cluster center. f1 is the original data reconstruction function from the basic autoencoder work [Hinton, 2006]; you can refer to that work for a deeper understanding.

  3. "Ix4 = Ix4(:,1:end-1)+R_cluster/N*Hcen.*(targetout-1);" computes the gradients. The first term is "borrowed" from the original autoencoder, whereas the second comes from the gradient of f0. I can discuss this further by email if you are still puzzled.

  4. For this one, you can also refer to the original autoencoder for details.

I will keep in touch with you via GitHub or email.

developfeng, May 21 '18 03:05
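Editorial note: the reply above points to the original autoencoder for questions 3 and 4 without spelling out the algebra. The following is a short sketch of the standard derivation, not taken from the repository. It assumes that XXout and targetout are sigmoid outputs of some pre-activation z, which the thread only states explicitly for XXout. The symbols z, y, x, and h are introduced here for illustration: y = σ(z) is the output, x is the reconstruction target, and h is the matching entry of Hcen.

```latex
% Assumptions (not confirmed by the thread): y = \sigma(z) is a sigmoid output,
% x \in [0,1] is the reconstruction target, h is the matching entry of Hcen.
\begin{align*}
\sigma'(z) &= \sigma(z)\bigl(1-\sigma(z)\bigr) = y(1-y) \\[4pt]
\frac{\partial}{\partial z}\Bigl[-x\log y-(1-x)\log(1-y)\Bigr]
  &= \Bigl(-\frac{x}{y}+\frac{1-x}{1-y}\Bigr)\,y(1-y)
   = -x(1-y)+(1-x)\,y = y-x \\[4pt]
\frac{\partial}{\partial z}\bigl[-h\log y\bigr]
  &= -\frac{h}{y}\,y(1-y) = h\,(y-1)
\end{align*}
```

Under these assumptions, the sigmoid factor y(1-y) cancels against the derivative of the log, so a cross-entropy f1 would give a gradient proportional to XXout - XX(:,1:end-1) with no extra XXout.*(1-XXout) factor, and a log-based f0 would give a term of the form Hcen.*(targetout-1). With a squared-error loss instead, the y(1-y) factor would remain, as question 4 suggests.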

Thank you for answering in detail, both by email and here. It looks like I should read Hinton's autoencoder paper first before discussing this with you.

I briefly skimmed the paper, and the key seems to be "cross-entropy". Thank you.

I'm applying your method to cluster data in the range (-1, 1), which requires a tanh activation function instead of sigmoid. So it looks better to use the original f1 and f0 in my case.

kwonmha, May 21 '18 07:05
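Editorial note: the two cases discussed in this thread can be checked numerically with finite differences. The sketch below is in plain Python rather than the repository's MATLAB, and none of its function names come from the repository. It assumes that "original f1 and f0" refers to the squared-error form, and it treats a single scalar unit with pre-activation z. It compares the gradient with respect to z for sigmoid plus cross-entropy, for the log-based clustering term, and for tanh plus squared error.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(x, z):
    # Reconstruction loss for one sigmoid unit (assumed form of f1).
    y = sigmoid(z)
    return -(x * math.log(y) + (1.0 - x) * math.log(1.0 - y))


def cluster_log_term(h, z):
    # Log-based clustering term for one sigmoid unit (assumed form of f0).
    return -h * math.log(sigmoid(z))


def squared_error_tanh(x, z):
    # Squared-error loss for one tanh unit, for targets in (-1, 1).
    return 0.5 * (math.tanh(z) - x) ** 2


def numeric_grad(f, z, eps=1e-6):
    # Central finite difference of f at z.
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)


z, x, h = 0.3, 0.8, 0.6   # arbitrary illustrative values
y = sigmoid(z)

# Sigmoid + cross-entropy: the y*(1-y) factor cancels.
ce_numeric = numeric_grad(lambda t: cross_entropy(x, t), z)
ce_analytic = y - x

# Log-based clustering term: gradient is h*(y-1).
cl_numeric = numeric_grad(lambda t: cluster_log_term(h, t), z)
cl_analytic = h * (y - 1.0)

# tanh + squared error: the activation derivative (1 - y^2) remains.
x_t = -0.4                # target in (-1, 1)
y_t = math.tanh(z)
se_numeric = numeric_grad(lambda t: squared_error_tanh(x_t, t), z)
se_analytic = (y_t - x_t) * (1.0 - y_t ** 2)

print(abs(ce_numeric - ce_analytic) < 1e-6)
print(abs(cl_numeric - cl_analytic) < 1e-6)
print(abs(se_numeric - se_analytic) < 1e-6)
```

If the squared-error form is used with tanh, the backpropagated error keeps the activation derivative 1 - y^2, so the gradient lines in CG_Cluster.m would need that factor added rather than reused unchanged.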

"You are right! "The clustering term in the objective function of your paper has a minus sign. I think this term should be added to the reconstruction error term." Indeed, we also found this mistake and have corrected it in the journal version."

Which minus sign is he referring to?

ml-and-ml, Jun 20 '18 16:06