dsmil-wsi
How to deal with multi-label problem?
Some cancers may have different regions within one slide because of tumor heterogeneity. Does this code handle the multi-label problem? If not, how can the multi-label problem be handled with MIL?
The code works with multi-class labels. The labels need to be presented as binary-encoded vectors. For example, [0, 0, 1], [0, 1, 0], and [1, 0, 0] each encode one of the three classes. The max-pooling branch pools the instances separately along each digit of the class vector, the attention is computed separately for each class, and the resulting bag representation has a number of entries equal to the number of classes. This bag representation is then projected by a 1D convolution. Please check the example for the TCGA lung cancer dataset.
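The per-class pooling and attention described above can be illustrated with a minimal sketch in plain Python. This is not the repository's implementation: the function name `per_class_bag_representation` is hypothetical, the raw features stand in for any learned query/value projections, and the final 1D convolution is omitted. It only shows that each class gets its own top-scoring instance and its own attention weights, so the bag representation has one row per class.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def per_class_bag_representation(features, scores):
    """Hypothetical helper, for illustration only.

    features: N x D list of instance embeddings.
    scores:   N x C list of per-instance class scores.
    Returns (critical, bag): the index of the top instance per class,
    and a C x D bag representation (one row per class).
    """
    n_classes = len(scores[0])
    dim = len(features[0])
    critical, bag = [], []
    for c in range(n_classes):
        # Max-pooling branch: the highest-scoring instance for class c.
        column = [row[c] for row in scores]
        top = max(range(len(column)), key=column.__getitem__)
        critical.append(top)
        # Attention for class c: scaled similarity of every instance to
        # that top instance (assumption: raw features are used directly).
        sims = [
            sum(a * b for a, b in zip(f, features[top])) / math.sqrt(dim)
            for f in features
        ]
        weights = softmax(sims)
        # Class-c row of the bag representation: attention-weighted sum.
        bag.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return critical, bag


# Toy bag: 3 instances, 2-dimensional features, 3 classes.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.3, 0.4, 0.7]]
critical, bag = per_class_bag_representation(features, scores)
print(critical)   # index of the top-scoring instance for each class
print(len(bag))   # one row per class
```

Because every class is pooled and attended to independently, nothing in this scheme forces exactly one class to be active per slide.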
Thanks for your answer. I still have some questions. There are different types of patches in a slide, and we choose the highest-rank type as the slide-level label. How does the code (as you say [0,0,1], [0,1,0], [1,0,0]) work? I still don't know how it works. Could you explain in detail? Thanks.
For an example of three subtypes of cancer, the labels should be prepared as:
[1, 0, 0] -- if the slide contains subtype 1
[0, 1, 0] -- if the slide contains subtype 2
[0, 0, 1] -- if the slide contains subtype 3
[1, 1, 0] -- if the slide contains both subtype 1 and subtype 2
...
[0, 0, 0] -- healthy slide
It might still work if the slide is labeled only according to the highest-rank type. For example, if subtype 1 ranks higher than subtype 2, a slide that contains both subtype 1 and subtype 2 is labeled [1, 0, 0] (not [1, 1, 0]).
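As a small sketch of the two labeling schemes above, the helpers below build the label vector for a slide from the set of subtypes it contains. The function names are hypothetical and not part of the repository, and the second helper assumes that a lower subtype number means a higher rank, as in the example where subtype 1 outranks subtype 2.

```python
def encode_multi_label(subtypes_present, n_subtypes):
    """Multi-label vector: one entry per subtype, 1 if it occurs in the slide."""
    label = [0] * n_subtypes
    for subtype in subtypes_present:
        label[subtype - 1] = 1  # subtypes are numbered from 1
    return label


def encode_highest_rank(subtypes_present, n_subtypes):
    """Single-label vector: only the highest-rank subtype is marked.

    Assumption: a lower subtype number means a higher rank.
    """
    label = [0] * n_subtypes
    if subtypes_present:
        label[min(subtypes_present) - 1] = 1
    return label


print(encode_multi_label({1, 2}, 3))    # [1, 1, 0]
print(encode_highest_rank({1, 2}, 3))   # [1, 0, 0]
print(encode_multi_label(set(), 3))     # [0, 0, 0], healthy slide
```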
Thanks a lot. : )