Differences between models
Hello, I noticed that there are two options for arch in Step3_WSI_classification_ACMIL.py: ga and mha. However, the reference uses ga, so I would like to ask what the difference between the two is. Isn't mha the framework described in the paper? Thank you.
GA uses the gated attention that comes from ABMIL, while MHA comes from Section 3.2 of this paper. That is my understanding.
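For readers unfamiliar with the ABMIL-style gate, here is a minimal pure-Python sketch of gated attention pooling, assuming the standard formulation a_k = softmax_k(w^T (tanh(V h_k) * sigmoid(U h_k))). The function and variable names (gated_attention_pool, V, U, w) and the toy dimensions are illustrative assumptions, not taken from the repository's code.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_pool(H, V, U, w):
    """ABMIL-style gated attention pooling (illustrative sketch).

    H: list of K instance embeddings, each of length M
    V, U: L x M projections for the tanh branch and the sigmoid gate
    w: attention vector of length L
    Returns (attention weights over instances, bag embedding).
    """
    scores = []
    for h in H:
        tanh_branch = [math.tanh(v) for v in matvec(V, h)]
        gate = [1.0 / (1.0 + math.exp(-u)) for u in matvec(U, h)]
        gated = [t * g for t, g in zip(tanh_branch, gate)]  # element-wise gating
        scores.append(sum(wi * gi for wi, gi in zip(w, gated)))
    a = softmax(scores)  # a single attention map over the bag
    # Bag embedding: attention-weighted sum of the instance embeddings
    z = [sum(a_k * h[j] for a_k, h in zip(a, H)) for j in range(len(H[0]))]
    return a, z


random.seed(0)
K, M, L = 5, 4, 3  # toy sizes: instances per bag, embedding dim, hidden dim
H = [[random.gauss(0, 1) for _ in range(M)] for _ in range(K)]
V = [[random.gauss(0, 1) for _ in range(M)] for _ in range(L)]
U = [[random.gauss(0, 1) for _ in range(M)] for _ in range(L)]
w = [random.gauss(0, 1) for _ in range(L)]

a, z = gated_attention_pool(H, V, U, w)
print([round(x, 3) for x in a])
```

The key point is that the gate produces exactly one attention weight per instance, so the bag is summarized by a single attention map.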
But the reference uses ga, so I am still a little confused. I wonder whether it is just a different way of computing attention.
Gated Attention (GA) and Multi-Head Attention (MHA) represent two distinct attention formulations. Although MHA is not discussed in our main manuscript, we validated our ACMIL approach using MHA as well, with results available in Table 4 of the supplementary materials.
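To make the contrast concrete, here is a minimal pure-Python sketch of a generic multi-head attention pooling, assuming one learned query vector per head and scaled dot-product scores against per-head key projections. This is a textbook formulation chosen for illustration; the exact MHA module in ACMIL may differ, and the names (multi_head_attention_pool, queries, Wk, Wv) are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_pool(H, queries, Wk, Wv):
    """Generic multi-head attention pooling (illustrative, not ACMIL's code).

    H: list of instance embeddings, each of length M
    queries[h]: learned query vector of length d for head h (assumption)
    Wk[h], Wv[h]: d x M key/value projections for head h
    Returns (one attention map per head, concatenated bag embedding).
    """
    d = len(queries[0])
    all_weights, bag = [], []
    for q, Wk_h, Wv_h in zip(queries, Wk, Wv):
        keys = [matvec(Wk_h, h) for h in H]
        values = [matvec(Wv_h, h) for h in H]
        # Scaled dot-product score between this head's query and each instance key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        a = softmax(scores)
        all_weights.append(a)
        # Each head pools its own values; the per-head outputs are concatenated
        bag.extend(sum(a_i * v[j] for a_i, v in zip(a, values)) for j in range(d))
    return all_weights, bag


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
K, M, d, n_heads = 5, 4, 2, 3  # toy sizes
H = rand_matrix(K, M)
queries = rand_matrix(n_heads, d)
Wk = [rand_matrix(d, M) for _ in range(n_heads)]
Wv = [rand_matrix(d, M) for _ in range(n_heads)]

weights, bag = multi_head_attention_pool(H, queries, Wk, Wv)
for head, a in enumerate(weights):
    print(head, [round(x, 3) for x in a])
```

Unlike the gated variant, this produces several attention maps over the same bag (one per head), which is the kind of multi-branch structure where attention-diversity constraints can be applied.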