A problem of experiment about ConvLSTM model

Open RalphCBY opened this issue 10 months ago • 6 comments

Thank you for your outstanding work!

I tried to run "python main.py --method=convlstm --dem=0 --slope=0 --batch_size=32", but I get the following error:

Traceback (most recent call last):
  File "main.py", line 91, in <module>
    train_recurrent_segmentation(
  File "/home5/cby/KuroSiwo-main/training/recurrent_trainer.py", line 143, in train_recurrent_segmentation
    output = model(inputs)
  File "/home3/cby/anaconda3/envs/sar_seg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/cby/anaconda3/envs/sar_seg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home5/cby/KuroSiwo-main/models/convlstm.py", line 184, in forward
    x = self.leakyrelu_1e(self.conv_1e(input))
  File "/home3/cby/anaconda3/envs/sar_seg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home3/cby/anaconda3/envs/sar_seg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home3/cby/anaconda3/envs/sar_seg/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 458, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home3/cby/anaconda3/envs/sar_seg/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [16, 6, 3, 3], expected input[96, 2, 224, 224] to have 6 channels, but got 2 channels instead

How should I fix this? Should I change the number of input channels, or change the model?

Thank you for your help! Bingyu

RalphCBY avatar Feb 21 '25 01:02 RalphCBY

Hello,

Can you please provide the contents of your configs/train/data_config.json file?

paren8esis avatar Feb 21 '25 11:02 paren8esis

{
  "track": "RandomEvents",
  "train_pickle": "pickle/grid_dict.pkl",
  "test_pickle": "pickle/grid_dict.pkl",
  "negative_pickle": "pickle/negatives_only.pkl",
  "inputs": ["pre_event_1", "pre_event_2", "post_event"],
  "channels": ["vv", "vh"],
  "water_percentage": "[0,100]",
  "data_augmentations": false,
  "clamp_input": 0.15,
  "scale_input": "normalize",
  "data_mean": [0.0953, 0.0264],
  "data_std": [0.0427, 0.0215],
  "dem_mean": 67.0293,
  "dem_std": 1765.0062,
  "dem": false,
  "slope": false,
  "slope_mean": 2.9482,
  "slope_std": 79.2493,
  "reverse_scaling": false,
  "uint8": false
}

RalphCBY avatar Feb 22 '25 06:02 RalphCBY

Thank you for catching this. There was an error in the computation of the ConvLSTM input channels. It has been fixed in commit 67e859d.

Please also note that a major update of the repo is in progress, which will fix bugs and align the code with our recent publication at NeurIPS '24. You can check it out on this branch (it will hopefully be merged within the next few days).

paren8esis avatar Feb 25 '25 09:02 paren8esis
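
Note: the shapes in the first traceback can be reproduced with some simple arithmetic on the posted data_config.json. The sketch below is only one reading that is consistent with those shapes, not the repository's actual code: it assumes that DEM and slope would each add one channel when enabled, and that the time steps are folded into the batch dimension before the first convolution. The function name channels_per_timestep is hypothetical.

```python
# Hypothetical helper, not taken from the KuroSiwo code base.
def channels_per_timestep(config):
    """Count the channels of a single time step from a data config."""
    n = len(config["channels"])   # e.g. "vv" and "vh"
    n += int(config["dem"])       # assumption: DEM adds one channel
    n += int(config["slope"])     # assumption: slope adds one channel
    return n


# Only the fields relevant to the channel count, copied from the posted config.
config = {
    "inputs": ["pre_event_1", "pre_event_2", "post_event"],
    "channels": ["vv", "vh"],
    "dem": False,
    "slope": False,
}
batch_size = 32

per_step = channels_per_timestep(config)
stacked = per_step * len(config["inputs"])          # all time steps stacked
flattened_batch = batch_size * len(config["inputs"])  # time folded into batch

print(per_step, stacked, flattened_batch)
```

Under these assumptions, the input of shape [96, 2, 224, 224] corresponds to one time step per sample (96 = 32 × 3, with 2 channels each), while the weight of shape [16, 6, 3, 3] corresponds to all three time steps stacked (6 = 3 × 2).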

Thank you for your help. I can now run it successfully. However, a new problem has arisen, as shown below:

(0) Train Loss: 1769.0631: 100%|████████████████████| 93/93 [01:45<00:00, 1.13s/it]
  0%| | 0/5 [00:00<?, ?it/s]/home5/cby/KuroSiwo-main/training/recurrent_trainer.py:381: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with torch.cuda.amp.autocast(enabled=False):
Validation Loss: 25.3056: 100%|████████████████████| 5/5 [00:06<00:00, 1.39s/it]
Traceback (most recent call last):
  File "main.py", line 91, in <module>
    train_recurrent_segmentation(
  File "/home5/cby/KuroSiwo-main/training/recurrent_trainer.py", line 326, in train_recurrent_segmentation
    val_acc, val_score, miou = eval_recurrent_segmentation(model, val_loader, ckpt_path, settype='Validation', configs=configs, model_configs=model_configs)
  File "/home5/cby/KuroSiwo-main/training/recurrent_trainer.py", line 570, in eval_recurrent_segmentation
    pre_event_1_wand = reverse_scale_img(pre_event_1_wand, pre1_scale_vars[0], pre1_scale_vars[1], configs)
  File "/home5/cby/KuroSiwo-main/utilities/utilities.py", line 137, in reverse_scale_img
    return Normalize(new_means[:, None, None], new_stds[:, None, None])(img)
IndexError: too many indices for tensor of dimension 0

I don't know if it's just me, but I'm also getting a very large loss during training. Could you please help me with this? I'm also very much looking forward to your work on version 2; I can foresee it being excellent!

RalphCBY avatar Feb 26 '25 02:02 RalphCBY
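
Note: the IndexError in the traceback above is what PyTorch raises when a 0-dimensional (scalar) tensor is indexed with [:, None, None]. The sketch below reproduces that behaviour in isolation and shows one possible guard; it is not the fix applied in the repository, and the helper name broadcastable_stats is hypothetical.

```python
import torch


# Hypothetical helper, not part of KuroSiwo: make sure statistics are at
# least 1-D before reshaping them to (C, 1, 1) for broadcasting over images.
def broadcastable_stats(values):
    stats = torch.atleast_1d(torch.as_tensor(values, dtype=torch.float32))
    return stats[:, None, None]


# A 0-dimensional tensor has no axis for ":" to slice, so indexing fails.
scalar_mean = torch.tensor(0.0953)
try:
    scalar_mean[:, None, None]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

per_channel = broadcastable_stats([0.0953, 0.0264])  # two channels
from_scalar = broadcastable_stats(0.0953)            # scalar promoted to 1-D

print(error_message)
print(per_channel.shape, from_scalar.shape)
```

Whether promoting a scalar to a single-channel statistic is the right behaviour depends on how the scale variables are produced upstream, which the thread does not show.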

It seems there was an issue with reverse image scaling. It should be fixed now; please try again.

Concerning the loss, the value that appears in the progress bar is not the average but the cumulative loss, so it is expected to be this large.

paren8esis avatar Feb 27 '25 11:02 paren8esis
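
Note: the point about cumulative versus average loss can be checked against the numbers in the log above. The sketch below assumes the progress bar shows the plain sum of per-batch losses over the batches processed so far; the function name mean_loss is hypothetical.

```python
# Hypothetical helper: convert a cumulative (summed) loss to a per-batch mean.
def mean_loss(cumulative_loss, num_batches):
    return cumulative_loss / num_batches


# Values taken from the posted progress bars (93 train and 5 validation batches).
train_mean = mean_loss(1769.0631, 93)
val_mean = mean_loss(25.3056, 5)

print(f"train mean loss per batch: {train_mean:.4f}")
print(f"validation mean loss per batch: {val_mean:.4f}")
```

Under that assumption, the per-batch averages are roughly 19.02 for training and 5.06 for validation, which is the scale to track across epochs when judging whether the loss is decreasing.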

Thanks for your help. However, when training with this model, the loss never decreases and the mIoU never increases.

RalphCBY avatar Mar 24 '25 01:03 RalphCBY