FastSAM
Problems during training
I encountered two issues when training on SA-1B.
1. Duplicate labels are removed:
```
train: WARNING D:\code\cocostyle\train\images\sa_99946.jpg: 1 duplicate labels removed
train: WARNING D:\code\cocostyle\train\images\sa_99948.jpg: 1 duplicate labels removed
train: WARNING D:\code\cocostyle\train\images\sa_99977.jpg: 1 duplicate labels removed
train: WARNING D:\code\cocostyle\train\images\sa_99980.jpg: 2 duplicate labels removed
```
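As far as I can tell these warnings are harmless: Ultralytics already drops exact duplicate rows when it reads the label files. If you prefer to clean the converted SA-1B labels yourself beforehand, a minimal sketch (assuming standard YOLO-format .txt files; the labels directory path is only inferred from the warnings above) could be:

```python
# Minimal sketch: strip exact duplicate rows from YOLO-format label files.
# The labels path is an assumption inferred from the warning messages above.
from pathlib import Path

labels_dir = Path(r"D:\code\cocostyle\train\labels")

for txt in labels_dir.glob("*.txt"):
    lines = txt.read_text().splitlines()
    unique = list(dict.fromkeys(lines))  # keep first occurrence, preserve order
    if len(unique) < len(lines):
        txt.write_text("\n".join(unique) + "\n")
        print(f"{txt.name}: removed {len(lines) - len(unique)} duplicate rows")
```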
2. An error occurred while saving labels.cache:
```
Traceback (most recent call last):
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\data\dataset.py", line 108, in get_labels
cache, exists = np.load(str(cache_path), allow_pickle=True).item(), True # load dict
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\numpy\lib\npyio.py", line 427, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\code\\cocostyle\\train\\labels.cache'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\admin\AppData\Roaming\Ultralytics\DDP\_temp_isnrh7go2032266927360.py", line 9, in <module>
trainer.train()
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 192, in train
self._do_train(world_size)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 275, in _do_train
self._setup_train(world_size)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 239, in _setup_train
self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=RANK, mode='train')
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\v8\detect\train.py", line 54, in get_dataloader
dataset = self.build_dataset(dataset_path, mode, batch_size)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\v8\detect\train.py", line 28, in build_dataset
return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode, rect=mode == 'val', stride=gs)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\data\build.py", line 74, in build_yolo_dataset
return YOLODataset(
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\data\dataset.py", line 39, in __init__
super().__init__(*args, **kwargs)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\data\base.py", line 72, in __init__
self.labels = self.get_labels()
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\data\dataset.py", line 113, in get_labels
cache, exists = self.cache_labels(cache_path), False # run cache ops
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\data\dataset.py", line 94, in cache_labels
np.save(str(path), x) # save cache for next time
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\numpy\lib\npyio.py", line 546, in save
format.write_array(fid, arr, allow_pickle=allow_pickle,
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\numpy\lib\format.py", line 719, in write_array
pickle.dump(array, fp, protocol=3, **pickle_kwargs)
MemoryError
Traceback (most recent call last):
File "C:\Users\admin\AppData\Roaming\Ultralytics\DDP\_temp_isnrh7go2032266927360.py", line 9, in <module>
trainer.train()
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 192, in train
self._do_train(world_size)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 275, in _do_train
self._setup_train(world_size)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 239, in _setup_train
self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=RANK, mode='train')
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\v8\detect\train.py", line 53, in get_dataloader
with torch_distributed_zero_first(rank): # init dataset *.cache only once if DDP
File "C:\Users\admin\.conda\envs\fastsam\lib\contextlib.py", line 119, in __enter__
return next(self.gen)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\ultralytics\yolo\utils\torch_utils.py", line 40, in torch_distributed_zero_first
dist.barrier(device_ids=[local_rank])
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\torch\distributed\c10d_logger.py", line 47, in wrapper
return func(*args, **kwargs)
File "C:\Users\admin\.conda\envs\fastsam\lib\site-packages\torch\distributed\distributed_c10d.py", line 3703, in barrier
work.wait()
RuntimeError: [C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\third_party\gloo\gloo\transport\uv\unbound_buffer.cc:67] Timed out waiting 3600000ms for recv operation to complete
[2023-12-05 15:38:13,105] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 11676 closing signal CTRL_C_EVENT
[2023-12-05 15:38:43,140] torch.distributed.elastic.multiprocessing.api: [WARNING] Unable to shutdown process 11676 via Signals.CTRL_C_EVENT, forcefully exiting via Signals.CTRL_C_EVENT
```
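The second traceback looks like a knock-on effect of the first: rank 0 raises MemoryError while pickling the label metadata for all of SA-1B into a single labels.cache, so it never reaches the dist.barrier inside torch_distributed_zero_first, and the other DDP rank times out after 3600000 ms. One workaround I'm considering is splitting the training set into shards so each shard's labels.cache fits in RAM; a rough sketch (the shard layout and count are my own assumptions, not anything Ultralytics prescribes):

```python
# Rough sketch: split a huge YOLO-style train set into N shard directories
# so each shard's labels.cache stays small enough to pickle in memory.
# Directory layout and shard count are assumptions for illustration only.
import shutil
from pathlib import Path

root = Path(r"D:\code\cocostyle\train")
n_shards = 10

images = sorted((root / "images").glob("*.jpg"))
for i, img in enumerate(images):
    shard = root.parent / f"train_shard{i % n_shards:02d}"
    (shard / "images").mkdir(parents=True, exist_ok=True)
    (shard / "labels").mkdir(parents=True, exist_ok=True)
    shutil.move(str(img), str(shard / "images" / img.name))
    lbl = root / "labels" / (img.stem + ".txt")
    if lbl.exists():
        shutil.move(str(lbl), str(shard / "labels" / lbl.name))
```

If I understand correctly, the dataset YAML accepts a list of directories under train:, so the shards could all be listed there, or trained one at a time.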
How can I solve these two problems?