Multimodal-Learning-with-Alternating-Unimodal-Adaptation
Problem with calculating the entropy
Thanks for your exciting work on the problem of multimodal imbalance. However, I ran into some trouble when running the code.
In main.py, the code for calculating entropy is shown below:
def calculate_entropy(output):
    probabilities = F.softmax(output, dim=0)
    # probabilities = F.softmax(output, dim=1)
    log_probabilities = torch.log(probabilities)
    entropy = -torch.sum(probabilities * log_probabilities)
    return entropy
The size of output is [B, N] (B is the batch size and N is the number of categories). According to Equation (8) in the paper, the correct code to compute the entropy should be:
def calculate_entropy(output):
    # probabilities = F.softmax(output, dim=0)
    probabilities = F.softmax(output, dim=1)  # softmax over categories
    log_probabilities = torch.log(probabilities)
    # Without "dim=-1", the entropies of the whole batch are summed into a scalar
    entropy = -torch.sum(probabilities * log_probabilities, dim=-1)
    return entropy
The difference is that the entropy should be calculated for each sample, not summed up over the whole batch.
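As a quick sanity check (the shapes and values here are illustrative, not from the repo):

import torch
import torch.nn.functional as F

output = torch.randn(4, 6)  # hypothetical logits: batch of 4, 6 categories
probabilities = F.softmax(output, dim=1)
log_probabilities = torch.log(probabilities)
entropy = -torch.sum(probabilities * log_probabilities, dim=-1)
print(entropy.shape)  # torch.Size([4]): one entropy per sample, as expected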
Oh! By the way, I think Equation (8) looks a little odd, as 'm' shows up on the left-hand side of the equation and also serves as the index of the 'max' operation.
I think you're right. When I run --dynamic, the results are even worse than with the fixed fusion method.
Hi @hubaak, yeah, it seems we may have a mistake in the entropy calculation; we will check it carefully and make the necessary correction.
In addition, thank you for your kind suggestion regarding Eq. 8. In this equation we want to use "m" to represent the modality; however, I think you are right that it looks a bit ambiguous. I will discuss this with my professor and make any essential correction in the camera-ready paper.
Hi @LittlePoolSpirit, I think decreasing the eval batch size may have some effect (e.g. --batch_size 1). For example, using the released checkpoint with fixed weights:
python main.py --ckpt_path best_model_of_dataset_CREMAD_Normal_alpha_0.3_optimizer_sgd_modulate_starts_0_ends_50_epoch_91_acc_0.7768817204301075.pth --gs_flag --dataset CREMAD --batch_size 1 --lorb base --av_alpha 0.55
We have: Accuracy: 0.7768817204301075, accuracy_a: 0.5981182795698925, accuracy_v: 0.668010752688172
--dynamic (with the modification by @hubaak):
python main.py --ckpt_path best_model_of_dataset_CREMAD_Normal_alpha_0.3_optimizer_sgd_modulate_starts_0_ends_50_epoch_91_acc_0.7768817204301075.pth --gs_flag --dataset CREMAD --batch_size 1 --lorb base --dynamic
We have: Accuracy: 0.7876344086021505, accuracy_a: 0.5981182795698925, accuracy_v: 0.668010752688172
Good idea. I ran into the same problem, too.
I just found that Eq. 8 can be simplified: it seems to be just a softmax over the negative entropies, as sketched below.
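If I am reading it correctly (notation mine; H_a and H_v denote the per-sample entropies of the audio and visual outputs), the fusion weight reduces to:

\lambda_a = \frac{e^{-H_a}}{e^{-H_a} + e^{-H_v}} = \mathrm{softmax}(-H)_a, \qquad \lambda_v = 1 - \lambda_a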
Yeah, they simply weight the outputs of the models by the softmax of their negative entropies.
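In code, that dynamic fusion would look roughly like this (the function and variable names are mine, not the repo's):

import torch
import torch.nn.functional as F

def dynamic_fusion(out_a, out_v):
    """Weight two modality outputs by the softmax of their negative entropies."""
    def per_sample_entropy(logits):
        p = F.softmax(logits, dim=1)
        return -torch.sum(p * torch.log(p), dim=-1)  # shape [B]

    h = torch.stack([per_sample_entropy(out_a), per_sample_entropy(out_v)], dim=1)  # [B, 2]
    w = F.softmax(-h, dim=1)  # [B, 2]; the lower-entropy (more confident) modality gets the larger weight
    return w[:, 0:1] * out_a + w[:, 1:2] * out_v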