Medical-Transformer
Multi-label Segmentation
Hi, I have a ground truth with 3 classes (including background) with values 0, 127, 255. As mentioned in #43, I changed `num_classes=3` in `axialnet.py`. In `utils.py`, https://github.com/jeya-maria-jose/Medical-Transformer/blob/703a080d66d16673be1b8770bca143956c9f0e8a/utils.py#L156-L157
these lines convert the ground truth to values 0 and 1, but I should have 0, 1 and 2 for my case (with 3 classes).
I tried doing this:

```python
mask[mask < 127] = 0
mask[mask == 127] = 1
mask[mask > 127] = 2
```
But I got this error. Could you please help me with this?
You should remove those lines with `mask` if you are converting it to a multi-class problem. Your ground truth should just contain pixel values 0, 1, 2, 3 if you are working on a 3-class classification problem.
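For anyone reading along, here is a minimal sketch (not part of the repo; `remap_mask` is a hypothetical helper) of converting a {0, 127, 255} grayscale ground truth into class indices as a preprocessing step, instead of the removed thresholding lines:

```python
import numpy as np

def remap_mask(mask: np.ndarray) -> np.ndarray:
    """Remap gray values {0, 127, 255} to class indices {0, 1, 2}."""
    remapped = np.zeros_like(mask, dtype=np.int64)  # background stays 0
    remapped[mask == 127] = 1                       # first foreground class
    remapped[mask == 255] = 2                       # second foreground class
    return remapped

mask = np.array([[0, 127], [255, 0]], dtype=np.uint8)
print(remap_mask(mask))  # -> [[0 1] [2 0]]
```

Applying this once when preparing the dataset (or in the dataset's `__getitem__`) means the labels fed to the loss are already 0, 1, 2.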
Hi, thanks for your response. I did remove those 2 lines from my code. I have 3 classes/labels in total (including background), which have the values 0, 1, 2 in my ground truth respectively. I also changed `num_classes=3` in `axialnet.py`, but when I run the code, I get this error. Does it have to do with the loss function? Do I need to change anything else? Could you please help me with this error?
Could you please explain what these lines (189-192) do in `train.py`?

```python
tmp[tmp >= 0.5] = 1
tmp[tmp < 0.5] = 0
tmp2[tmp2 > 0] = 1
tmp2[tmp2 <= 0] = 0
```

And also, why do you do this (205-206)?

```python
yHaT[yHaT == 1] = 255
yval[yval == 1] = 255
```
I have to remove these lines for my case, right?
May I ask which dataset you used for this?
@shanpriya3, the code in lines 189-192 applies a hard threshold - i.e. it translates all predictions into binary format (either 0 or 1) so that the mask can be stored in the format described in the repository's README (value 255 corresponds to the object, 0 to the background).
Lines 205-206 are needed for the mask saving format described in the repository:
- Based on the image, the model builds a response map: `y_out = model(X_batch)` on line 184;
- The prediction is converted to numpy format, and it is assumed that the output of the model contains a probability map of whether each pixel belongs to the objects, i.e. `[batch_size, channels, width, height]` is translated into `[batch_size, num_classes, width, height]` (in this case, `num_classes = 3`), and each position of the result holds a number from 0 to 1 such that summing over the class dimension gives a map of ones: `[batch_size, num_classes, width, height].sum(dim=1) == 1 * [batch_size, width, height]` (the description is formal, just to add interpretability). BUT:
- `criterion = LogNLLLoss()` is used as the criterion (line 111); however, this criterion is defined in the `metrics.py` file on line 9 and implements not `LogNLLLoss` but `CrossEntropy`. That is, `softmax` is applied to the model prediction `model(input)` inside the `criterion` object, and it is not applied inside the model itself (there is no softmax in the `_forward_impl` or `forward` methods).
- Therefore, in the validation part, before calling `tmp[tmp>=0.5] = 1` on line 189, you need to apply a softmax transformation so that the raw model prediction can be interpreted as probabilities, i.e. replace `y_out = model(X_batch)` on line 184 with, for example, `y_out = model.soft(model(X_batch))` or `y_out = torch.nn.functional.softmax(model(X_batch), dim=1)`, as in the sketch below.
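To make the softmax step concrete, here is a minimal, self-contained sketch (dummy logits stand in for the real `model(X_batch)`; the shapes and the 0.5 threshold mirror `train.py`):

```python
import torch
import torch.nn.functional as F

B, num_classes, H, W = 2, 3, 128, 128
y_out = torch.randn(B, num_classes, H, W)  # raw logits, as model(X_batch) returns
y_out = F.softmax(y_out, dim=1)            # per-class probabilities

# sanity check: probabilities over the class dimension sum to 1 per pixel
assert torch.allclose(y_out.sum(dim=1), torch.ones(B, H, W))

tmp = y_out.numpy()
tmp[tmp >= 0.5] = 1                        # class considered present at this pixel
tmp[tmp < 0.5] = 0                         # class considered absent
```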
Then, as @jeya-maria-jose mentioned, instead of modifying the mask you need to remove those lines and assume that the gt (ground truth) contains the integer class values of the objects (0, 1 or 2 in this case). Also, for a simpler interpretation, it is easier to save the validation-set predictions for all foreground channels, not only the 1st channel, i.e. maybe change line 214 to `cv2.imwrite(fulldir+image_filename, yHaT[0,1:,:,:].transpose(1, 2, 0))`, with optional zero-padding or keeping the background layer to avoid errors when saving two-channel images (see the sketch below). The resulting mask will have `num_classes - 1` layers (no background), and each layer will contain 255 only where the model detects the corresponding object at that pixel (for example, 255 at position $(i, j)$ in the first layer means an object of the 1st class at $(i, j)$, and 255 at $(i, j)$ in the second layer means an object of the 2nd class at that position).
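A minimal, self-contained sketch of that saving step (random data stands in for `yHaT`, and the output path is illustrative; only the slicing and zero-padding logic is the point):

```python
import numpy as np
import cv2

num_classes, H, W = 3, 128, 128
# dummy binarized predictions already scaled to {0, 255}, shaped like yHaT
yHaT = (np.random.randint(0, 2, (1, num_classes, H, W)) * 255).astype(np.uint8)

pred = yHaT[0, 1:, :, :].transpose(1, 2, 0)       # [H, W, num_classes - 1]
if pred.shape[2] == 2:                            # cv2 cannot save 2-channel images
    pad = np.zeros((H, W, 1), dtype=pred.dtype)
    pred = np.concatenate([pred, pad], axis=2)    # zero-pad to 3 channels
cv2.imwrite("pred_mask.png", pred)
```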
Hello, I have a question: in training, `output = model(X_batch)` contains a probability map of whether a pixel belongs to the objects. Does that mean that the values in the tensor are numbers from 0 to 1?
Secondly, in the training phase, do I need to process `y_batch` (in this case, `num_classes = 20`, with pixel values 0, 1, 2, 3, ..., 19), or can I directly calculate the loss from it and the output: `loss = criterion(output, y_batch)`?