UltraLight-VM-UNet

train error

Open skkkyup opened this issue 10 months ago • 4 comments

Excellent work. I installed the corresponding packages and attempted to train on a different dataset, resizing the images to 256×256, and then encountered the following issue. Could you please tell me what the problem might be here?

#----------Creating logger----------#
#----------GPU init----------#
#----------Preparing dataset----------#
#----------Prepareing Models----------#
SC_Att_Bridge was used
#----------Prepareing loss, opt, sch and amp----------#
#----------Set other params----------#
#----------Training----------#
torch.Size([8, 3, 256, 256])
x: torch.Size([8, 24, 32, 32])
x1: torch.Size([8, 1024, 6])
Traceback (most recent call last):
  File "train.py", line 189, in <module>
    main(config)
  File "train.py", line 132, in main
    train_one_epoch(
  File "/root/UltraLight-VM-UNet-main/engine.py", line 39, in train_one_epoch
    out = model(images)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 183, in forward
    return self.module(*inputs[0], **module_kwargs[0])
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/UltraLight-VM-UNet-main/models/UltraLight_VM_UNet.py", line 221, in forward
    out = F.gelu(F.max_pool2d(self.ebn4(self.encoder4(out)),2,2))
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/UltraLight-VM-UNet-main/models/UltraLight_VM_UNet.py", line 38, in forward
    x_mamba1 = self.mamba(x1) + self.skip_scale * x1
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/mamba_ssm/modules/mamba_simple.py", line 146, in forward
    out = mamba_inner_fn(
  File "/root/miniconda3/lib/python3.8/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 317, in mamba_inner_fn
    return MambaInnerFn.apply(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 113, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 187, in forward
    conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: bool) -> torch.Tensor

Invoked with: tensor([[[-3.6921e-02, -2.2584e-02, -2.5043e-02,  ..., -5.4228e-02, -5.4027e-02, -5.1787e-02],
         [-6.3227e-02, -3.2376e-02, -3.7078e-02,  ..., -8.5041e-02, -8.5496e-02, -8.1049e-02],
         [-4.1870e-03, -8.7852e-03, -8.0635e-03,  ..., -4.1606e-02, -4.0918e-02, -4.1173e-02],
         ...,
         [ 1.3810e-01,  1.2920e-01,  1.2920e-01,  ...,  1.4433e-01,  1.4433e-01,  1.6126e-01],
         [ 1.4266e-02,  1.4064e-02,  1.4064e-02,  ...,  1.1132e-02,  1.1132e-02,  9.8532e-03],
         [-7.0302e-02, -6.6612e-02, -6.6612e-02,  ..., -7.2660e-02, -7.2660e-02, -7.9289e-02]]],
       device='cuda:0', requires_grad=True),
tensor([[-0.2176, -0.1239, -0.0767, -0.1056],
        [-0.0190, -0.1246,  0.3363, -0.1871],
        [ 0.1523, -0.0473,  0.0405, -0.5286],
        [ 0.0705, -0.1187,  0.0597,  0.0934],
        [-0.2788,  0.0680, -0.1250, -0.1106],
        [ 0.3183, -0.1641,  0.3027,  0.0206],
        [ 0.3087, -0.3258, -0.2065,  0.2467],
        [ 0.3808,  0.1227, -0.1961, -0.4432],
        [-0.4132, -0.0891, -0.0532,  0.0154],
        [ 0.0185, -0.1335, -0.2039,  0.0383],
        [ 0.1164,  0.0900, -0.0019, -0.1997],
        [ 0.0422, -0.3562, -0.0239, -0.1291]], device='cuda:0', requires_grad=True),
Parameter containing:
tensor([-0.0774,  0.0933,  0.1647, -0.1945,  0.3946, -0.0037,  0.0410, -0.4760,  0.0619,  0.1716,  0.0697, -0.0496],
       device='cuda:0', requires_grad=True),
None, None, None, True

skkkyup avatar Apr 21 '24 07:04 skkkyup

Hi, based on your error message and questions asked in the past, this is most likely a data-preparation issue: the data is not being written correctly into the '.npy' files. We recommend that you preprocess your data according to the 'Prepare your own dataset' section. Alternatively, try first to reproduce the results on the ISIC2017 dataset (2,000 images), which will let you rule out a compatibility issue with your environment and hardware.
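For reference, a minimal sketch of that preprocessing step, assuming 256×256 inputs as in your log. The folder layout, output file names, and helper below are placeholders rather than the repository's exact script:

import os
import numpy as np
from PIL import Image

def pack_to_npy(image_dir, mask_dir, out_dir, size=(256, 256)):
    # Hypothetical helper: resize every image/mask pair and stack them
    # into the .npy arrays that the data loader reads.
    images, masks = [], []
    for name in sorted(os.listdir(image_dir)):
        img = Image.open(os.path.join(image_dir, name)).convert('RGB').resize(size)
        msk = Image.open(os.path.join(mask_dir, name)).convert('L').resize(size)
        images.append(np.array(img))
        masks.append(np.array(msk))
    np.save(os.path.join(out_dir, 'data_train.npy'), np.stack(images))
    np.save(os.path.join(out_dir, 'mask_train.npy'), np.stack(masks))

pack_to_npy('./data/train/images', './data/train/masks', './data')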

wurenkai avatar Apr 21 '24 12:04 wurenkai

Thank you for your feedback. The issue was resolved after I reinstalled mamba_ssm==1.0.1 (the previous version was 1.2.0). I have modified loader.py to read a different dataset for my task, but I am now somewhat puzzled: after training for multiple epochs, the loss hardly decreases.
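For anyone hitting the same TypeError, a quick way to confirm the installed versions before retraining (a minimal sketch; it only assumes the packages are importable) is:

import torch
import mamba_ssm

# Newer mamba_ssm releases pass more arguments to causal_conv1d_fwd than an
# older causal_conv1d build accepts, which is what the TypeError above reports.
print("torch:", torch.__version__)
print("mamba_ssm:", getattr(mamba_ssm, "__version__", "unknown"))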

skkkyup avatar Apr 21 '24 13:04 skkkyup

Try checking the outputs and the DSC of the final result in the 'Output' folder. Also, check whether the masks that your modified loader.py ultimately feeds to the model are normalized.
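As an illustration of that check, a minimal sketch (the .npy path is a placeholder for whatever your modified loader.py reads) could look like:

import numpy as np

masks = np.load('mask_train.npy')  # placeholder path
# If the masks are stored as 0/255 they should be scaled to [0, 1] before the loss.
print("dtype:", masks.dtype, "min:", masks.min(), "max:", masks.max())

def dice_score(pred, gt, eps=1e-6):
    # Dice similarity coefficient (DSC) between two binary arrays.
    pred = (pred > 0.5).astype(np.float32)
    gt = (gt > 0.5).astype(np.float32)
    inter = (pred * gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)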

wurenkai avatar Apr 21 '24 14:04 wurenkai

Hello, have you solved the problem of the loss not decreasing? If I use a custom dataset, how should I process the data so that training works properly?

GZ-YourZY avatar Jun 14 '24 09:06 GZ-YourZY