Multimodal-Learning-with-Alternating-Unimodal-Adaptation

Problem with calculating the entropy

hubaak opened this issue 11 months ago · 4 comments

Thanks for your exciting work on the problem of multimodal imbalance. However, I ran into some trouble when running the code. In main.py, the entropy is calculated as follows:

def calculate_entropy(output):
    probabilities = F.softmax(output, dim=0)
    # probabilities = F.softmax(output, dim=1)
    log_probabilities = torch.log(probabilities)
    entropy = -torch.sum(probabilities * log_probabilities)
    return entropy

The size of the output is [B, N] (B for the batch size and N for the number of categories). According to Equation (8) in the paper, the correct code to get the entropy should be:

def calculate_entropy(output):
    # probabilities = F.softmax(output, dim=0)
    probabilities = F.softmax(output, dim=1)  # softmax over the categories
    log_probabilities = torch.log(probabilities)
    entropy = -torch.sum(probabilities * log_probabilities, dim=-1)  # without dim=-1 the entropy is summed over the whole batch
    return entropy

What makes the difference here is that the entropy should be calculated per sample, not summed over the whole batch.
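For anyone else hitting this, here is a quick self-contained check of the difference between the two versions (the shapes and values below are just illustrative, not taken from the repo's data):

import torch
import torch.nn.functional as F

def entropy_over_batch(output):
    # original version: softmax over the batch dimension, entropy summed to a single scalar
    probabilities = F.softmax(output, dim=0)
    return -torch.sum(probabilities * torch.log(probabilities))

def entropy_per_sample(output):
    # fixed version: softmax over the categories, one entropy value per sample
    probabilities = F.softmax(output, dim=1)
    return -torch.sum(probabilities * torch.log(probabilities), dim=-1)

output = torch.randn(4, 6)           # [B, N] = [4, 6]
print(entropy_over_batch(output))    # a single scalar that mixes samples across the batch
print(entropy_per_sample(output))    # tensor of shape [4], one entropy per sample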

Oh! By the way, I think Equation (8) looks a little odd, since 'm' shows up on the left-hand side of the equation and also serves as the index of the 'max' operation.

Equation 8 in the paper

hubaak · Mar 20 '24 07:03

I think you're right. When I run with --dynamic, the result is even worse than with the fixed fusion method.

LittlePoolSpirit · Mar 20 '24 13:03

Hi @hubaak, yeah, it seems we may have made a mistake in the entropy calculation; we will check it carefully and make a correction.

In addition, thank you for your kind suggestion about Eq. (8). In this equation, we use "m" to represent the modality; however, I think you are right that it looks a bit ambiguous. I will discuss this with my Prof and make any necessary correction in the camera-ready paper.

Cecile-hi · Mar 20 '24 13:03

Hi @LittlePoolSpirit, I think decreasing the eval batch size may help (e.g. --batch_size 1). For example, using the released ckpt with the fixed weight:

python main.py --ckpt_path best_model_of_dataset_CREMAD_Normal_alpha_0.3_optimizer_sgd_modulate_starts_0_ends_50_epoch_91_acc_0.7768817204301075.pth --gs_flag --dataset CREMAD --batch_size 1 --lorb base --av_alpha 0.55

We have: Accuracy: 0.7768817204301075, accuracy_a: 0.5981182795698925, accuracy_v: 0.668010752688172

--dynamic (with the modification by @hubaak):

python main.py --ckpt_path best_model_of_dataset_CREMAD_Normal_alpha_0.3_optimizer_sgd_modulate_starts_0_ends_50_epoch_91_acc_0.7768817204301075.pth --gs_flag --dataset CREMAD --batch_size 1 --lorb base --dynamic

We have: Accuracy: 0.7876344086021505, accuracy_a: 0.5981182795698925, accuracy_v: 0.668010752688172

Cecile-hi · Mar 20 '24 13:03

Good idea, I ran into the same problem too.

thinking024 · Apr 23 '24 12:04

I just found that Eq. (8) can be simplified as follows: [image]. It may then just be a softmax, as follows: [image].

ggamaz · Oct 18 '24 03:10

I just found that Eq. (8) can be simplified as follows: [image]. It may then just be a softmax, as follows: [image].

Yeah, they simply weight the models' outputs by the softmax of their negative entropies.
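If it helps, here is a rough sketch of that weighting, assuming two modality outputs out_a and out_v of shape [B, N] (the names are hypothetical, not the authors' code) and the per-sample calculate_entropy fix from above:

import torch
import torch.nn.functional as F

def fuse_by_entropy(out_a, out_v):
    # per-sample entropies of each modality's prediction, shape [B] each
    e_a = calculate_entropy(out_a)
    e_v = calculate_entropy(out_v)
    # softmax over the negative entropies gives per-sample fusion weights, shape [B, 2]
    weights = F.softmax(torch.stack([-e_a, -e_v], dim=1), dim=1)
    w_a, w_v = weights[:, 0:1], weights[:, 1:2]  # shape [B, 1] each
    # the lower-entropy (more confident) modality gets the larger weight
    return w_a * out_a + w_v * out_v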

hubaak · Oct 21 '24 12:10