opl
About the model
Great work! However, I noticed that the model you use is a vanilla ResNet. The final block of ResNet uses ReLU as its activation function, so all output features are non-negative. As a result, d is non-negative as well, which means that none of the features can be orthogonal. Can you explain why this model is used? Thank you so much!
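To make the concern concrete, here is a minimal sketch. It uses NumPy with random vectors standing in for the features; the shapes, the `pairwise_cosine` helper, and the variable names are all assumptions for illustration, not code from this repository. It compares pairwise cosine similarities before and after a ReLU:

```python
import numpy as np

def pairwise_cosine(features):
    """Cosine similarity between every pair of rows (hypothetical helper)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

rng = np.random.default_rng(0)

# Assumed stand-in for the features: 8 samples, 512 dimensions.
pre_activation = rng.standard_normal((8, 512))
# A block that ends in ReLU can only emit non-negative values.
post_relu = np.maximum(pre_activation, 0.0)

sim_pre = pairwise_cosine(pre_activation)
sim_post = pairwise_cosine(post_relu)

print("min cosine before ReLU:", sim_pre.min())
print("min cosine after ReLU: ", sim_post.min())
```

With non-negative features every pairwise dot product is a sum of non-negative terms, so the cosine can never drop below zero. It reaches exactly zero only when two vectors share no active dimension.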