
Comparison Experiment

Open Ystartff opened this issue 10 months ago • 7 comments

I found a few problems when running some of your comparison experiments. Take running META_Unet as an example. May I ask what the problem is, and could you help?

```
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [123,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [124,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [125,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [126,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [90,0,0], thread: [127,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    main(config)
  File "train.py", line 167, in main
    train_one_epoch(
  File "/mnt/data/linda/yyf/H-vmunet-main/engine.py", line 41, in train_one_epoch
    loss.backward()
  File "/home/linda/anaconda3/envs/vmunet/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/linda/anaconda3/envs/vmunet/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
```

```python
import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([8, 32, 256, 256], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(32, 1, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
```

```
ConvolutionParams
    memory_format = Contiguous
    data_type = CUDNN_DATA_FLOAT
    padding = [1, 1, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = true
    allow_tf32 = true
input: TensorDescriptor 0x89e818a0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 8, 32, 256, 256,
    strideA = 2097152, 65536, 256, 1,
output: TensorDescriptor 0x89efcbc0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 8, 1, 256, 256,
    strideA = 65536, 65536, 256, 1,
weight: FilterDescriptor 0x7fa474042520
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 1, 32, 3, 3,
output: 0x7fa517d00000
weight: 0x7fa5809fa400
```

Ystartff avatar Apr 25 '24 14:04 Ystartff

Hi, according to your error message, if this is binary segmentation you should check the output of META_UNet to see whether a Sigmoid activation function has been added.
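For reference, a minimal sketch of that kind of fix, assuming a single-channel segmentation head trained with `nn.BCELoss` (the wrapper and names below are illustrative, not the actual H-vmunet / META_Unet code):

```python
import torch
import torch.nn as nn

class SegModelWithSigmoid(nn.Module):
    """Illustrative wrapper: squashes raw logits into [0, 1] so that
    nn.BCELoss (which asserts 0 <= input <= 1) does not trigger the
    CUDA-side `input_val >= zero && input_val <= one` assertion."""
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone  # e.g. a META_Unet instance returning raw logits

    def forward(self, x):
        logits = self.backbone(x)
        return torch.sigmoid(logits)  # probabilities in [0, 1]

# Alternative design choice: keep raw logits and switch the criterion to
# nn.BCEWithLogitsLoss(), which applies the sigmoid internally and is
# numerically more stable.
```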

wurenkai avatar Apr 25 '24 15:04 wurenkai

Your help was very effective for me, thank you.

Ystartff avatar Apr 25 '24 15:04 Ystartff

Hello, I'm here again. I noticed that your experiment is set up with print_interval = 20, val_interval = 30, save_interval = 100. I see that you save the best weights based on the smallest loss, but there are actually checkpoints during training with better results, so your experiments will not report the highest values because of this. Did you modify these parameters so that the weights are saved one by one, in order to find the maximum value?

Ystartff avatar Apr 26 '24 15:04 Ystartff

Hi, we did not modify the above parameters. Our experiments were tested using the weights with the lowest loss obtained by running val after training each epoch.
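As a rough illustration of that kind of lowest-validation-loss checkpointing, here is a self-contained toy loop with a stand-in model and random data; it is not the repository's engine.py, just a sketch of the pattern described above:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the loop runs end to end; real training would use the
# repository's model, dataloaders and loss instead.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(4, 3, 64, 64)
y = torch.randint(0, 2, (4, 1, 64, 64)).float()

best_loss = float('inf')
for epoch in range(5):
    # one (toy) training step per epoch
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # validate after the epoch
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x), y).item()

    # keep only the weights with the lowest validation loss seen so far
    if val_loss < best_loss:
        best_loss = val_loss
        torch.save(model.state_dict(), 'best.pth')
```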

wurenkai avatar Apr 26 '24 17:04 wurenkai

Hi author, your work is excellent. One question I have is: why do you use a standard convolution with a 3x3 kernel as the layer at the very beginning of the encoder and before predicting the final outputs?

Ystartff avatar May 07 '24 09:05 Ystartff

Hi, in H-vmunet this is to increase the initial number of channels, so that there are enough channels for high-order interactions when H-SS2D is used subsequently.
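A minimal sketch of such a channel-expanding stem is shown below; the width of 32 and the BatchNorm/ReLU are assumptions for illustration, not necessarily the exact layers used in H-vmunet:

```python
import torch
import torch.nn as nn

# A plain 3x3 convolution that lifts the 3 input channels to a wider
# channel count before any H-SS2D blocks operate on the feature map.
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 256, 256)  # RGB input
print(stem(x).shape)             # torch.Size([1, 32, 256, 256])
```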

wurenkai avatar May 07 '24 10:05 wurenkai

And what about the last two layers at the end of the final decoder output?

Ystartff avatar May 07 '24 12:05 Ystartff