Two implementation questions
TL;DR
- Why is the Frobenius norm implemented as `reduce_mean(square(...))` in the `proj_error` method?
- RSRAE vs. RSRAE+ in the code implementation
Thanks for providing the paper and code. I have a few questions about your implementation.
In the paper, the loss function for RSRAE is (taking p = 1 in the reconstruction error and q = 1 in the RSR error):
This part is implemented in your code like this:
https://github.com/dmzou/RSRAE/blob/d1f667c892ca42c987a3e503a2fc3f487c29e117/RSRAE/model.py#L219-L222
proj_error is implemented in the code like this:
https://github.com/dmzou/RSRAE/blob/d1f667c892ca42c987a3e503a2fc3f487c29e117/RSRAE/model.py#L173-L175
In the paper, `proj_error` corresponds to the term described as "... and ||·||_F denotes the Frobenius norm ...".
My question is: why is the Frobenius norm implemented as `reduce_mean(square(...))`? I'm wondering whether it would be correct to implement it with the code below.
```python
def proj_error(self):
    inner_term = tf.matmul(tf.transpose(self.A), self.A) - tf.eye(self.intrinsic_size)
    return tf.norm(inner_term, ord=2)
```
- `reduce_mean` vs. Frobenius norm
```python
D = 10  # latent dimension
d = 3   # intrinsic dimension

A_tf = tf.random_normal((D, d))
inner_term = tf.matmul(tf.transpose(A_tf), A_tf) - tf.eye(d)

# your implementation
print(tf.reduce_mean(tf.square(inner_term)).numpy())
# Frobenius norm, i.e. tf.sqrt(tf.reduce_sum(inner_term ** 2))
print(tf.norm(inner_term, ord=2).numpy())
```

```
21.11717
13.786027
```
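For what it's worth, the two quantities differ only by a fixed factor: the mean of squares equals the squared Frobenius norm divided by the number of matrix entries (here d·d = 9, and indeed 13.786² / 9 ≈ 21.117). A NumPy sketch (shapes chosen to match the snippet above, not the repo's actual code):

```python
import numpy as np

# Assumed dimensions matching the TF snippet above.
D, d = 10, 3
rng = np.random.default_rng(0)
A = rng.normal(size=(D, d))

inner = A.T @ A - np.eye(d)

mean_sq = np.mean(inner ** 2)           # what the repo's proj_error computes
fro = np.linalg.norm(inner, ord='fro')  # the Frobenius norm from the paper

# mean of squares = sum of squares / number of entries = ||M||_F^2 / (d * d)
print(np.isclose(mean_sq, fro ** 2 / inner.size))  # True
```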
If I'm wrong, what is the difference between the term on the left-hand side and the term on the right-hand side of Equation 4?
My second question is about alternating minimization. Is my understanding below correct?
- If the `all_alt` parameter is `True`, run the RSRAE algorithm (sequential optimization).
- Otherwise, run the RSRAE+ algorithm (all-in-one optimization of `L_ae + L_pca + L_proj`).
In Figure 2, RSRAE's score was better than RSRAE+'s. Why does optimizing the terms sequentially improve performance? And does RSRAE+ have advantages in memory usage and speed instead?
- I would expect using three optimizers to noticeably increase memory usage and slow down training/inference.
Thanks for reading.
Thank you for the questions.
I believe `print(tf.norm(inner_term, ord=2).numpy())` should be `print(tf.norm(inner_term, ord='fro').numpy())`.
Also, applying `reduce_mean` only changes the result by a constant scalar multiplier, which is absorbed by `lambda_2`. The two should therefore be equivalent.
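Concretely, rescaling the multiplier makes the two penalties identical (a sketch with a hypothetical `lambda_2` value, not the repo's setting):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
M = rng.normal(size=(d, d))
lam2 = 0.1  # hypothetical lambda_2, for illustration only

# Penalty as implemented (mean of squares) vs. squared Frobenius norm
# with the multiplier rescaled by the number of entries: same value.
penalty_mean = lam2 * np.mean(M ** 2)
penalty_fro = (lam2 / M.size) * np.linalg.norm(M, ord='fro') ** 2
print(np.isclose(penalty_mean, penalty_fro))  # True
```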
It is possible to improve the performance of RSRAE+ by tuning `lambda_1` and `lambda_2`, but that can be difficult in practice. We therefore recommended alternating minimization, which showed better performance in our experiments.