U-Net-Transformer

I want to know: what are the serious bugs?

Open over-star opened this issue 4 years ago • 7 comments

I want to know: what are the serious bugs?

over-star avatar Jun 04 '21 07:06 over-star

About the positional encoding: it caused the network to learn nothing. I changed the implementation a few days ago and it seems OK now, but the result is still not good enough.

HXLH50K avatar Jun 04 '21 07:06 HXLH50K
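For readers following along: a positional encoding that "leads to the network learning nothing" is often one whose rows are degenerate (e.g. constant across positions), so attention cannot distinguish locations. As a sanity check, here is a minimal NumPy sketch of the standard Transformer sinusoidal encoding (the function name is mine, not the repo's):

```python
import numpy as np

def sinusoidal_encoding(n_pos, d_model):
    """Standard sinusoidal positional encoding.

    Returns an (n_pos, d_model) array where every row is distinct,
    so attention layers can tell positions apart.
    """
    pos = np.arange(n_pos)[:, None]          # (n_pos, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model // 2)
    angles = pos / (10000.0 ** (2 * i / d_model))
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dims: sine
    pe[:, 1::2] = np.cos(angles)             # odd dims: cosine
    return pe

# e.g. an 8x8 feature map flattened to 64 positions, 32-dim embedding
pe = sinusoidal_encoding(64, 32)
assert not np.allclose(pe[0], pe[1])  # adjacent positions get distinct codes
```

A quick check like the final assertion (no two position rows identical) would have caught an encoding that collapses to a constant.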

Hi, I find your model works well, although there seems to be a bug in the utils/train.py file (I don't know for sure, because I haven't looked at it in detail): using it, the validation loss is much higher than the training loss even in the first epoch, and the model does not seem to train. When I took it apart and ran without that file, I got a relatively high dice of ~0.98 on both the training and validation sets after 5 epochs, with 508 training images from the Carvana cars dataset. What result were you expecting?

charleneolive avatar Jun 13 '21 07:06 charleneolive

It seems that your multi-head cross attention has some slight implementation differences from the paper: you did not rescale the resulting weight values with a sigmoid activation function. Is there some reason for omitting it?

charleneolive avatar Jun 20 '21 14:06 charleneolive
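To make the point being discussed concrete: the paper squashes the cross-attention output into (0, 1) with a sigmoid and uses it as a multiplicative gate on the skip-connection features. A rough single-head NumPy sketch of that gated variant (all names here are illustrative, not the repo's actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cross_attention(q, k, v, skip):
    """Single-head cross attention whose output is rescaled with a
    sigmoid and used to gate the skip-connection features."""
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))  # (n_q, n_k) attention weights
    out = attn @ v                        # (n_q, d) attended values
    gate = sigmoid(out)                   # rescale into (0, 1)
    return skip * gate                    # gate the skip connection

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v, skip = (rng.standard_normal((n, d)) for _ in range(4))
gated = gated_cross_attention(q, k, v, skip)
assert gated.shape == skip.shape
```

Because the sigmoid output lies strictly in (0, 1), this variant can only attenuate skip features; omitting it means the attention output is used unbounded, which is exactly the discrepancy raised above.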

> It seems that your multi-head cross attention has some slight implementation differences from the paper: you did not rescale the resulting weight values with a sigmoid activation function. Is there some reason for omitting it?

Thank you, it's a mistake, but I don't think it is the key point. I fixed it and ran an experiment; the result is still bad. On my own segmentation dataset, "X", the validation dice only reaches 0.4, while as a comparison UNet++'s validation dice is 0.8.

By the way, because of the high memory consumption, I can only use a batch size of 1 on a 16 GB card. This may be why the training curve oscillates so violently.

HXLH50K avatar Jun 21 '21 06:06 HXLH50K

Oh hmm, did you try the Carvana dataset? I saw that you had a script to preprocess it. The code you currently have works well on Carvana, both with and without the correction.

charleneolive avatar Jun 24 '21 02:06 charleneolive

I followed your current script and also used a batch size of 1. By the way, the sigmoid activation function is not the only implementation difference, but I am curious as to why you obtained poor results while this was not the case for me.

charleneolive avatar Jun 24 '21 02:06 charleneolive

> It seems that your multi-head cross attention has some slight implementation differences from the paper: you did not rescale the resulting weight values with a sigmoid activation function. Is there some reason for omitting it?
>
> Thank you, it's a mistake, but I don't think it is the key point. I fixed it and ran an experiment; the result is still bad. On my own segmentation dataset, "X", the validation dice only reaches 0.4, while as a comparison UNet++'s validation dice is 0.8.
>
> By the way, because of the high memory consumption, I can only use a batch size of 1 on a 16 GB card. This may be why the training curve oscillates so violently.

The sigmoid is preceded by a 1×1 convolution. I think this paper may borrow the method from "Attention gated networks: Learning to leverage salient regions in medical images". The number of channels should be reduced from d to 1; otherwise the 1×1 convolution is useless.

Sherry-8 avatar Nov 17 '21 03:11 Sherry-8
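To illustrate Sherry-8's point: in the attention-gate construction, a 1×1 convolution collapses the C feature channels to a single channel, the sigmoid turns that into one spatial weight map, and the map is broadcast across all channels. If the channel count stayed at d, the sigmoid would act per channel and the 1×1 projection would add nothing. A minimal NumPy sketch (names and the random weight are mine, purely for illustration):

```python
import numpy as np

def attention_gate(features, seed=0):
    """Attention-gate-style spatial weighting (cf. "Attention gated
    networks"): a 1x1 convolution projects C channels down to 1, a
    sigmoid maps the result into (0, 1), and the single-channel map
    gates every channel of the input.

    features: (H, W, C) array; the 1x1-conv weight is random here.
    """
    h, w, c = features.shape
    rng = np.random.default_rng(seed)
    w1x1 = rng.standard_normal(c)          # 1x1 conv = per-pixel dot product
    logits = features @ w1x1               # (H, W): exactly one channel
    alpha = 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> weights in (0, 1)
    return features * alpha[..., None]     # broadcast the gate over channels

x = np.random.default_rng(1).standard_normal((4, 4, 8))
y = attention_gate(x)
assert y.shape == x.shape
```

The key detail is that `logits` has shape (H, W), not (H, W, C): the gate is a single spatial attention map shared by all channels, which is what makes the 1×1 projection meaningful.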