CWT-for-FSS

Error while training the ResNet model on the PASCAL dataset

Open • Hemanth-Gattu opened this issue 3 years ago • 11 comments

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/CWT-for-FSS/src/train.py", line 360, in <module>
    mp.spawn(main_worker, args=(world_size, args), nprocs=world_size, join=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/content/CWT-for-FSS/src/dataset/utils.py", line 91, in process_image
    assert label_class_ in list(range(1, 81)), label_class_
AssertionError: 147
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/content/CWT-for-FSS/src/train.py", line 117, in main_worker
    train_loader, train_sampler = get_train_loader(args)
  File "/content/CWT-for-FSS/src/dataset/dataset.py", line 44, in get_train_loader
    mode_train=True, transform=train_transform, class_list=class_list, args=args
  File "/content/CWT-for-FSS/src/dataset/dataset.py", line 114, in __init__
    self.data_list, self.sub_class_file_list = make_dataset(args.data_root, args.train_list, self.class_list)
  File "/content/CWT-for-FSS/src/dataset/utils.py", line 55, in make_dataset
    for sublist, subdict in mmap_(process_partial, tqdm(list_read)):
  File "/content/CWT-for-FSS/src/dataset/utils.py", line 17, in mmap_
    return Pool().map(fn, iter)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AssertionError: 147

Hemanth-Gattu, Jan 03 '22

It seems something is wrong with the label classes. Please check the data loading process.
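A quick way to check the labels, sketched here under stated assumptions rather than taken from the repository, is to look at the PNG color type of the mask files. The original VOC2012 `SegmentationClass` masks are palette (indexed-color) PNGs, so their pixels are displayed as colors; a mask that stores class indices directly as gray values reports `grayscale` instead. The helper names `png_color_type` and `png_color_type_of_file` are hypothetical; the code only reads the 8-byte PNG signature and the `IHDR` chunk, where the color type is stored.

```python
import struct

# The fixed 8-byte signature every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes defined by the PNG specification.
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def png_color_type(header: bytes) -> str:
    """Return the color type stored in a PNG's IHDR chunk (first 26 bytes)."""
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header (or fewer than 26 bytes)")
    # IHDR data starts at byte 16: width (4), height (4), bit depth (1), color type (1).
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return COLOR_TYPES.get(color_type, f"unknown ({color_type})")


def png_color_type_of_file(path: str) -> str:
    """Read just enough of a PNG file to report its color type."""
    with open(path, "rb") as f:
        return png_color_type(f.read(26))
```

If the masks referenced by the training list report `palette`, a loader that reads them as grayscale may see color-derived values instead of class indices, which would be consistent with an out-of-range value such as 147.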

zhiheLu, Jan 18 '22

  • Downloaded the PASCAL dataset and uploaded it to Google Drive
  • Created a .txt file listing each JPEGImages path together with its corresponding SegmentationClass path
The .txt file looks like this:

/..../VOCdevkit/VOC2012/JPEGImages/2007_000032.jpg /..../VOCdevkit/VOC2012/SegmentationClass/2007_000032.png

/..../VOCdevkit/VOC2012/JPEGImages/2007_000033.jpg /..../VOCdevkit/VOC2012/SegmentationClass/2007_000033.png

  • Updated the config files with the corresponding paths

Can you please suggest what is going wrong in the data loading process?

Hemanth-Gattu, Jan 20 '22

You should download SegmentationClassAug from here: https://www.dropbox.com/s/oeu149j8qtbs1x0/SegmentationClassAug.zip?dl=0. The path lists are already in the 'lists' folder.
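Why the augmented masks matter can be worked out with a little arithmetic, offered as a plausible explanation rather than one confirmed in this thread. Assume the loader reads each label as a single-channel grayscale image (for example with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`; this is an assumption about the repository's loader). A palette PNG from `SegmentationClass` is then expanded to RGB and reduced to luminance, so each pixel holds the brightness of the class color instead of the class index. The sketch rebuilds the standard PASCAL VOC color map and applies the BT.601 luma weights 0.299, 0.587 and 0.114; `voc_colormap` and `luma` are illustrative names.

```python
def voc_colormap(n: int = 256) -> list:
    """Standard PASCAL VOC color map: index i -> (R, G, B) for i in range(n)."""
    def bit(value: int, idx: int) -> int:
        return (value >> idx) & 1

    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        # Spread the bits of the index over the high bits of R, G and B.
        for j in range(8):
            r |= bit(c, 0) << (7 - j)
            g |= bit(c, 1) << (7 - j)
            b |= bit(c, 2) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap


def luma(rgb: tuple) -> int:
    """Gray value of an RGB color with BT.601 weights, rounded to the nearest integer."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


if __name__ == "__main__":
    cmap = voc_colormap()
    # Classes 1..20 plus the 255 boundary value used in VOC masks.
    for idx in list(range(1, 21)) + [255]:
        gray = luma(cmap[idx])
        note = "" if gray in range(1, 81) else "  <- outside range(1, 81)"
        print(f"index {idx:3d}  color {cmap[idx]}  gray {gray}{note}")
```

Class 15 (person) has the color (192, 128, 128), which gives 0.299 * 192 + 0.587 * 128 + 0.114 * 128 = 147.136, rounded to 147, the value reported by the assertion. Masks that already store the class index as the gray value avoid this conversion entirely.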

zhiheLu, Jan 24 '22

We have tried running the code with the data and information you provided, but now we are facing a new issue: when running in a Google Colab notebook, an input box appears (in the last line of the output shown below). Can you please tell us the reason for this?

[Screenshot]

We also get the following error when training the model on a GPU (in VS Code):

[Screenshot]

Please suggest a way to resolve these errors.

Hemanth-Gattu, Jan 25 '22

What's your data_root path?
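Judging from the traceback, `make_dataset(args.data_root, args.train_list, self.class_list)` reads the list file and resolves its entries against `data_root`. A hedged sketch for verifying that every listed file actually exists, assuming each line holds an `image label` pair of paths relative to `data_root`; the name `missing_files` is hypothetical, not a repository function. One detail worth knowing: `os.path.join` drops `data_root` entirely when a list entry is already an absolute path, so absolute paths in a list bypass the configured root.

```python
import os
from typing import List, Tuple


def missing_files(data_root: str, list_path: str) -> List[Tuple[int, str]]:
    """Return (line_number, full_path) for every listed file that does not exist.

    Assumes each non-empty line is 'image_path label_path', relative to data_root.
    """
    missing = []
    with open(list_path) as f:
        for lineno, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            for rel in parts[:2]:
                # os.path.join returns `rel` unchanged if it is an absolute path.
                full = os.path.join(data_root, rel)
                if not os.path.isfile(full):
                    missing.append((lineno, full))
    return missing
```

Call it with your configured `data_root` and `train_list` values and print the returned pairs; an empty result means every path resolved to an existing file.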

zhiheLu, Jan 31 '22

Thanks for the reply!

  • This is the data root path we set in the config file; the full directory layout is visible in the folder panel.

       Note: we downloaded the VOC2012 dataset and use its 'JPEGImages' folder for images. For labels,
          we use the 'SegmentationClassAug' folder you shared in your previous comment.

[Screenshot]
  • And here is where we updated the path for the resume weights:
[Screenshot]

Hemanth-Gattu, Jan 31 '22

It seems the problem is in your list.
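A list can be checked for format problems before training. The sketch below is hypothetical: it assumes each non-empty line contains exactly two whitespace-separated paths, an image and its label, with matching file stems, and that labels should come from `SegmentationClassAug` as recommended earlier in this thread. It flags lines that break any of those assumptions.

```python
import os
from typing import List


def list_problems(lines: List[str], label_dir: str = "SegmentationClassAug") -> List[str]:
    """Describe suspicious entries in an 'image label' list (hypothetical checker)."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # blank lines are ignored
        if len(parts) != 2:
            problems.append(f"line {lineno}: expected 2 fields, got {len(parts)}")
            continue
        image, label = parts
        img_stem = os.path.splitext(os.path.basename(image))[0]
        lbl_stem = os.path.splitext(os.path.basename(label))[0]
        if img_stem != lbl_stem:
            problems.append(f"line {lineno}: image/label names differ ({img_stem} vs {lbl_stem})")
        # Normalize separators, then require label_dir as one of the path components.
        if label_dir not in label.replace("\\", "/").split("/"):
            problems.append(f"line {lineno}: label is not under {label_dir}/: {label}")
    return problems
```

Pass it the lines of your list file (for example `f.read().splitlines()`); each returned string names a line number and the problem found.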

zhiheLu, Feb 01 '22

Yes. It would be very helpful if you could also share the JPEGImages folder with us.

Hemanth-Gattu, Feb 01 '22

I use exactly the same JPEG images from the official website. The bug is clearly not related to the JPEG images.

zhiheLu, Feb 03 '22

Hi, can you please guide me?

We are trying to train on the PASCAL VOC dataset, with the dataset and lists set up according to your instructions, by running: sh scripts/train.sh pascal 0 [0] 50 1

but we get the error below. For PASCAL, label_class should be in range(1, 21), so I don't understand why the check uses range(1, 81).

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/ahsan/test-project/simpler-is-better/src/dataset/utils.py", line 91, in process_image
    assert label_class_ in list(range(1, 81)), label_class_
AssertionError: None
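On the `range(1, 81)` question: 80 is the number of COCO classes, so the check in `process_image` looks like a single upper bound shared by PASCAL (20 classes) and COCO rather than a PASCAL-specific range; this is an inference from the traceback, not something confirmed in the thread. The value `None` is a separate clue. If the loader uses `cv2.imread` (an assumption), that call returns `None` when a file is missing or unreadable, so a wrong label path is one plausible way to end up with a `None` class value. The hypothetical sketch below makes both cases explicit with a per-dataset range and a clear error for unreadable masks; `check_mask`, `LabelError` and `CLASS_RANGES` are illustrative names.

```python
from typing import Iterable, List, Optional

# Foreground class ranges: PASCAL VOC has 20 classes, COCO has 80.
CLASS_RANGES = {"pascal": range(1, 21), "coco": range(1, 81)}
IGNORE = {0, 255}  # background and boundary/ignore values


class LabelError(ValueError):
    """Raised when a mask is unreadable or holds unexpected class values."""


def check_mask(path: str, pixels: Optional[Iterable[int]], dataset: str) -> List[int]:
    """Return the foreground classes in a mask, or raise LabelError with context."""
    if pixels is None:
        # For example, a loader returned None because the file was missing or unreadable.
        raise LabelError(f"could not read label file: {path}")
    valid = CLASS_RANGES[dataset]
    classes = sorted(set(pixels) - IGNORE)
    bad = [c for c in classes if c not in valid]
    if bad:
        raise LabelError(
            f"{path}: values {bad} outside {valid.start}..{valid.stop - 1} for {dataset}"
        )
    return classes
```

Here `pixels` is any iterable of mask values, such as a flattened mask array, or `None` when loading failed; the error message then names the file instead of printing a bare `None`.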

muhammadahsan, May 15 '22

Hi, have you solved the above issue? Can you please help me?

muhammadahsan, May 15 '22