hls-foundation-os icon indicating copy to clipboard operation
hls-foundation-os copied to clipboard

Config for irrigation_scenes and custom SpatioTemporalDataset loader

Open weiji14 opened this issue 1 year ago • 1 comments

A mmsegmentation configuration file for the irrigation_scenes dataset on https://huggingface.co/datasets/ibm-nasa-geospatial/hls_irrigation_scenes.

As this is a time-series dataset with data from four months stored in four different folders, a custom SpatioTemporalDataset class (subclassed from GeospatialDataset) and LoadSpatioTemporalImagesFromFile class (subclassed from LoadGeospatialImageFromFile) was created to perform the data loading. Training with only the first 3 months (June, July, August) for now. Also updated the fine-tuning-examples/README.md to mention how to run the irrigation_scenes setup.

Xref original work at https://github.com/NASA-IMPACT/hls-foundation/pull/30 and https://github.com/NASA-IMPACT/hls-foundation/pull/35

P.S. This is the same branch as #4, but that one got closed somehow during the private->public conversion of the repo.

weiji14 avatar Aug 03 '23 04:08 weiji14

Getting a TypeError: imgs must be a list, but got <class 'torch.Tensor'> on the validation stage in the forward_test function:

2023-08-03 17:03:11,629 - mmseg - INFO - workflow: [('train', 1)], max: 5000 iters
2023-08-03 17:03:11,629 - mmseg - INFO - Checkpoints will be saved to finetune_weights/irrigation_scenes/test_1/test_1 by HardDiskBackend.
2023-08-03 17:03:22,052 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2023-08-03 17:03:58,935 - mmseg - INFO - Iter [20/5000]	lr: 1.893e-07, eta: 3:15:17, time: 2.353, data_time: 0.046, memory: 6031, decode.loss_ce: 3.3195, decode.acc_seg: 6.3938, aux.loss_ce: 3.4039, aux.acc_seg: 1.7806, loss: 6.7234
[                                                  ] 0/281, elapsed: 0s, ETA:Traceback (most recent call last):
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/.mim/tools/train.py", line 242, in <module>
    main()
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/.mim/tools/train.py", line 231, in main
    train_segmentor(
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/apis/train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 134, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 67, in train
    self.call_hook('after_train_iter')
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/hooks/evaluation.py", line 262, in after_train_iter
    self._do_evaluate(runner)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/core/evaluation/eval_hooks.py", line 117, in _do_evaluate
    results = multi_gpu_test(
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/apis/test.py", line 208, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
    return old_func(*args, **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/models/segmentors/base.py", line 110, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/models/segmentors/base.py", line 74, in forward_test
    raise TypeError(f'{name} must be a list, but got '
TypeError: imgs must be a list, but got <class 'torch.Tensor'>

This is the same one reported before at https://github.com/NASA-IMPACT/hls-foundation/pull/30#issuecomment-1603652525, which was fixed with some hacky workarounds to modify the default collate function in mmsegmentation's code here:

https://github.com/NASA-IMPACT/hls-foundation/blob/35edfb54057b18d2840b0e674277248797208b6f/mmsegmentation/mmseg/models/segmentors/base.py#L72-L75

Doesn't look possible to apply the same old workaround here anymore, so would need to find a different solution. Xref upstream issue at https://github.com/open-mmlab/mmsegmentation/issues/2410

weiji14 avatar Aug 03 '23 05:08 weiji14