FileNotFoundError: [Errno 2] No such file or directory: 'datasets/coco_13/trainval/00020596.jpg'

Open wowangle97 opened this issue 1 year ago • 6 comments

The error in the title occurs when I run the code for the teacher embedding preparation step. I use the COCO dataset and have set up the data folders for annotations and images as described. Is there a problem with my setup? Thanks for your help!

[2024-12-10 19:32:17 vit_h](save_embedding.py 56): INFO number of params: 637026048
[2024-12-10 19:32:17 vit_h](utils.py 60): INFO ==============> Resuming form weights/sam_vit_h_4b8939.pth....................
[2024-12-10 19:32:18 vit_h](utils.py 75): INFO <All keys matched successfully>
[2024-12-10 19:32:19 vit_h](save_embedding.py 69): INFO Start saving embeddings
Traceback (most recent call last):
  File "training/save_embedding.py", line 238, in <module>
    main(config)
  File "training/save_embedding.py", line 79, in main
    save_embeddings_one_epoch(config, model, data_loader_train, epoch)
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "training/save_embedding.py", line 99, in save_embeddings_one_epoch
    for idx, ((samples, _), (keys, seeds)) in enumerate(data_loader):
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/work/EdgeSAM/training/data/augmentation/dataset_wrapper.py", line 31, in __getitem__
    return self.__getitem_for_write(index)
  File "/home/work/EdgeSAM/training/data/augmentation/dataset_wrapper.py", line 39, in __getitem_for_write
    item = self.dataset[index]
  File "/home/work/EdgeSAM/training/data/coco_dataset.py", line 98, in __getitem__
    img = Image.open(img_path).convert('RGB')
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/PIL/Image.py", line 3431, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/work/EdgeSAM/datasets/coco_13/trainval/00020596.jpg'
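
For anyone hitting the same FileNotFoundError, one way to narrow it down is to compare the file names listed in the annotation file against the folder the loader actually reads. The following is only a minimal sketch, assuming a COCO-style annotation JSON with an "images" list whose entries carry a "file_name" field. The function name find_missing_images is hypothetical and not part of EdgeSAM.

```python
import json
from pathlib import Path


def find_missing_images(annotation_file, image_dir):
    """Return the file names listed in a COCO-style annotation file
    that do not exist as files inside image_dir."""
    with open(annotation_file, "r", encoding="utf-8") as f:
        coco = json.load(f)

    image_dir = Path(image_dir)
    # COCO-format annotations keep one record per image under "images";
    # each record stores the image file name under "file_name".
    return [
        img["file_name"]
        for img in coco.get("images", [])
        if not (image_dir / img["file_name"]).is_file()
    ]
```

Calling it with your own annotation path and the directory from the error message (here datasets/coco_13/trainval) shows whether the files are missing or whether the loader is simply pointed at a different folder than the one you created.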

wowangle97 avatar Dec 10 '24 11:12 wowangle97

In addition, I wonder why the folder datasets/coco_13/trainval/ is needed at all. The data preparation instructions did not say to create a folder named trainval.

wowangle97 avatar Dec 10 '24 11:12 wowangle97

Hello, I am also using a COCO-format dataset and have not run into the problem of images not being found. Could you check whether your dataset layout is wrong? It should be datasets/coco/ containing annotations/, train2017/ and val2017/. Or maybe DATASET has not been changed to coco in the YAML config. Your weight file also appears to have been loaded incorrectly; you need to use repvit instead of sam.

I do get a similar error on my side, though: ValueError: Caught ValueError in DataLoader worker process 0.

After that I also run into a problem with distributed training. Have you perhaps encountered it too? I don't know if it's a version issue. Thanks for the help!

[2024-12-11 05:38:30 rep_vit_m1_fuse_sa_distill](train.py 186): INFO Start training
Traceback (most recent call last):
  File "/home/user/EdgeSAM/training/train.py", line 693, in <module>
    main(args, config)
  File "/home/user/EdgeSAM/training/train.py", line 195, in main
    train_one_epoch_distill_using_saved_embeddings(
  File "/home/user/EdgeSAM/training/train.py", line 241, in train_one_epoch_distill_using_saved_embeddings
    for idx, ((samples, annos), (saved_embeddings, seeds)) in enumerate(data_loader):
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/EdgeSAM/training/data/augmentation/dataset_wrapper.py", line 32, in __getitem__
    return self.__getitem_for_read(index)
  File "/home/user/EdgeSAM/training/data/augmentation/dataset_wrapper.py", line 46, in __getitem_for_read
    with AugRandomContext(seed=seed):
  File "/home/user/EdgeSAM/training/data/augmentation/aug_random.py", line 14, in __enter__
    RNG = Generator(PCG64(seed=self.seed))
  File "_pcg64.pyx", line 123, in numpy.random._pcg64.PCG64.__init__
  File "bit_generator.pyx", line 535, in numpy.random.bit_generator.BitGenerator.__init__
  File "bit_generator.pyx", line 315, in numpy.random.bit_generator.SeedSequence.__init__
  File "bit_generator.pyx", line 389, in numpy.random.bit_generator.SeedSequence.get_assembled_entropy
  File "bit_generator.pyx", line 140, in numpy.random.bit_generator._coerce_to_uint32_array
  File "bit_generator.pyx", line 70, in numpy.random.bit_generator._int_to_uint32_array
ValueError: expected non-negative integer

Batch 0:
Samples shape before stack: [torch.Size([3, 256, 256])]
Saved embeddings shape before stack: [(1048576,)]
Samples shape after stack: torch.Size([1, 3, 256, 256])
Saved embeddings shape after reshape: torch.Size([1, 1048576])
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4042320 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 4042321) of binary: /home/user/anaconda3/envs/edgesam/bin/python
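
The last frames of the traceback show that the seed passed to PCG64 was negative, which NumPy rejects. The thread does not establish why the seed is negative. One possibility, purely an assumption, is that the saved seed is read back as a signed integer (or from the wrong byte offset) when the embeddings are loaded. As a rough sketch of how a signed value can be reinterpreted as unsigned before seeding, with the helper name to_unsigned_seed being hypothetical and not part of EdgeSAM:

```python
def to_unsigned_seed(seed: int, bits: int = 32) -> int:
    """Reinterpret a possibly negative integer as an unsigned value
    of the given bit width (two's-complement wrap-around)."""
    mask = (1 << bits) - 1
    # In Python, `&` on a negative int behaves as if the number had an
    # infinite two's-complement representation, so masking keeps only
    # the low `bits` bits and always yields a non-negative result.
    return int(seed) & mask
```

This only hides the symptom if the seeds are actually being read from the wrong place, so it is worth printing a few seeds first and checking that the saved embedding files match the current config.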

gold123fish avatar Dec 11 '24 07:12 gold123fish

In the end I modified line 97 of training/data/coco_dataset.py, changing the image folder to train/. Training now runs normally, but I hit ZeroDivisionError: division by zero during the final evaluation.
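
The thread does not show where the division happens. A common cause of this kind of error, stated here only as an assumption, is that the evaluation set ends up empty, so a metric is averaged over zero samples. A minimal sketch of a guard, with the hypothetical helper safe_mean:

```python
def safe_mean(values):
    """Average an iterable of numbers, returning None instead of
    raising ZeroDivisionError when there is nothing to average."""
    values = list(values)
    if not values:
        # No samples were evaluated; let the caller decide how to report it.
        return None
    return sum(values) / len(values)
```

If a guard like this returns None, the next thing to check is whether the validation images and annotations are found at all.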

wowangle97 avatar Dec 11 '24 07:12 wowangle97

@gold123fish I don't have the same problem as you, sorry. Also, may I ask why you think I used the wrong weight file? Didn't the author say in the teacher embedding step to download weights/sam_vit_h_4b8939.pth? Why would I need to use repvit instead of sam? Thank you.

wowangle97 avatar Dec 11 '24 08:12 wowangle97

I noticed that I had already modified the line 98 you mentioned. But it still reports a problem with distributed training, and I still can't train. Regarding the weight file, I thought you had finished the teacher embeddings step and moved on to (Phase 1) Encoder-Only Knowledge Distillation, which requires repvit. That was my mistake.

gold123fish avatar Dec 11 '24 08:12 gold123fish

Try without distributed training first: run on a single GPU to see whether it works, so you can tell whether it is an environment problem or a CUDA problem.

wowangle97 avatar Dec 12 '24 01:12 wowangle97