MultiObjectiveOptimization
Connection between fmp and task branches
May I ask what the correct dimension of grads should be? Suppose the output feature map size is 100 and the batch size is 512; should rep and grads be of size [512, 100]? And how do you handle the batch dimension in the subsequent calculation? Do you average over the batch into a single result? In my experiments, when I compute _min_norm_2d, the input vecs looks like [[tensor1], [tensor2], [tensor3]], where each tensor is of size 512x100. As a result, the line dps[(i,j)] += torch.dot(vecs[i][k], vecs[j][k]).data[0] fails with RuntimeError: 1D tensors expected, got 2D
I've solved the above issue with the updated #25, thanks.
Now I have a new question. The assumption in the paper is that all task branches use the same feature map from the backbone model. What if each task uses only part of the feature map? Suppose we have 2 tasks and the feature map is 1x100: task 1 takes [0:50] and task 2 takes [50:100], instead of the entire feature map. Is your method still applicable in this case?
Hi, when running the code under my setting, I also got "RuntimeError: 1D tensors expected, got 2D". I solved it by flattening the original gradient tensors so that they fit into torch.dot.
If what I am doing is correct, we are simply computing an element-wise multiplication (torch.mul) followed by sum().
I also wrote the above comments in https://github.com/intel-isl/MultiObjectiveOptimization/pull/25#issue-420495980
Hope someone can clarify this.
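The flatten fix described above can be sketched as follows. The shapes are the hypothetical ones from this thread (batch size 512, feature size 100); this is an illustration of the torch.dot identity, not code from the repository:

```python
import torch

# Hypothetical per-task gradient tensors of shape [batch_size, feature_dim],
# matching the [512, 100] shapes mentioned in the thread.
grad_i = torch.randn(512, 100)
grad_j = torch.randn(512, 100)

# torch.dot only accepts 1-D tensors, so flatten before taking the dot product.
dp_flat = torch.dot(grad_i.flatten(), grad_j.flatten())

# Equivalent formulation: element-wise multiply, then sum over all entries.
dp_mul = torch.mul(grad_i, grad_j).sum()

assert torch.allclose(dp_flat, dp_mul)
```

Both forms give the same scalar, so accumulating either into dps[(i, j)] avoids the 2-D error.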
@milliema I am not sure I understood the question correctly, but you can still use it.
I think your model corresponds to the same thing implicitly: you can view the layers before the features as the shared encoder, and the feature decomposition as part of the task-specific layers.
Even if my understanding is not correct, you can still use the method, just the plain MGDA version rather than MGDA-UB. You can check the condition on approximate_norm_solution.
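To illustrate why plain MGDA still applies with a split feature map, here is a minimal sketch under assumed shapes and names (none of this is taken from the repository's code): each task loss is differentiated with respect to the shared encoder parameters, so the gradients live in a common space and the solver's pairwise dot products remain well defined even though the heads read disjoint feature slices.

```python
import torch

# Toy version of the two-task split described above; all shapes are assumptions.
torch.manual_seed(0)
encoder = torch.nn.Linear(10, 100)  # shared backbone producing a 1x100 feature map
head1 = torch.nn.Linear(50, 1)      # task 1 reads features [0:50]
head2 = torch.nn.Linear(50, 1)      # task 2 reads features [50:100]

x = torch.randn(4, 10)
z = encoder(x)
loss1 = head1(z[:, :50]).mean()
loss2 = head2(z[:, 50:]).mean()

# MGDA: gradient of each task loss w.r.t. the *shared* encoder parameters.
# retain_graph keeps the shared forward graph alive for the second grad call.
g1 = torch.autograd.grad(loss1, encoder.parameters(), retain_graph=True)
g2 = torch.autograd.grad(loss2, encoder.parameters())

# Flatten into 1-D vectors for the pairwise dot products used by the
# min-norm (Frank-Wolfe) solver.
v1 = torch.cat([g.flatten() for g in g1])
v2 = torch.cat([g.flatten() for g in g2])
dp = torch.dot(v1, v2)
```

Note that task 1's gradient is exactly zero on the encoder weight rows feeding features [50:100], but both gradient vectors still have the full shared-parameter dimension, which is what the solver needs.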
@allenjack I saw the PR; I need to look into it. I did not have time due to the NeurIPS deadline.