mindyolo

Training on my own dataset in Windows CPU mode, but training ends immediately

Open · zgh2022 opened this issue 10 months ago · 1 comment

This is the output:

2024-03-29 17:11:44,155 [INFO] Dataset Cache file hash/version check success.
2024-03-29 17:11:44,155 [INFO] Load dataset cache from [coco\val\val2017.cache.npy] success.
2024-03-29 17:11:44,155 [INFO] Dataloader num parallel workers: [1]
2024-03-29 17:11:44,254 [INFO] Registry(name=callback, total=4)
2024-03-29 17:11:44,254 [INFO] (0): YoloxSwitchTrain in mindyolo\utils\callback.py
2024-03-29 17:11:44,254 [INFO] (1): EvalWhileTrain in mindyolo\utils\callback.py
2024-03-29 17:11:44,254 [INFO] (2): SummaryCallback in mindyolo\utils\callback.py
2024-03-29 17:11:44,254 [INFO] (3): ProfilerCallback in mindyolo\utils\callback.py
2024-03-29 17:11:44,254 [INFO]
2024-03-29 17:11:44,256 [INFO] got 2 active callback as follows:
2024-03-29 17:11:44,256 [INFO] SummaryCallback()
2024-03-29 17:11:44,256 [INFO] EvalWhileTrain(stage_intervals=[1], stage_epochs=[9223372036854775807], stage_cum_epochs=[9223372036854775807], eval_last_epoch=True, isolated_epochs=[], keep_checkpoint_max=10, manager_best=<mindyolo.utils.checkpoint_manager.CheckpointManager object at 0x0000023318B0AF40>, ckpt_filelist_best=[])
2024-03-29 17:11:44,256 [WARNING] log interval should be less than total steps of one epoch, but got 100 > 0, set log_interval as steps_per_epoch 0
2024-03-29 17:11:44,257 [WARNING] The first epoch will be compiled for the graph, which may take a long time; You can come back later :).
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
[WARNING] ME(26516:31084,MainProcess):2024-03-29-17:11:53.987.53 [mindspore\dataset\engine\iterators.py:155] No records available.
2024-03-29 17:11:53,462 [INFO] End Train.
2024-03-29 17:11:53,462 [INFO] Training completed.

I checked the dataset loading part and found no problems there. The training log is as follows:

2024-03-29 17:17:09,990 [INFO]
2024-03-29 17:17:10,002 [INFO] Please check the above information for the configurations
2024-03-29 17:17:10,175 [WARNING] Parse Model, args: nearest, keep str type
2024-03-29 17:17:10,204 [WARNING] Parse Model, args: nearest, keep str type
2024-03-29 17:17:10,300 [INFO] number of network params, total: 7.04407M, trainable: 7.025023M
2024-03-29 17:17:10,532 [WARNING] Parse Model, args: nearest, keep str type
2024-03-29 17:17:10,561 [WARNING] Parse Model, args: nearest, keep str type
2024-03-29 17:17:10,651 [INFO] number of network params, total: 7.04407M, trainable: 7.025023M
2024-03-29 17:17:10,833 [WARNING] Cannot find checkpoint parameter model.model.24.stride.
2024-03-29 17:17:10,833 [WARNING] Dropping checkpoint parameter model.model.24.m.0.weight with shape (255, 128, 1, 1), which is inconsistent with cell shape (21, 128, 1, 1)
2024-03-29 17:17:10,833 [WARNING] Dropping checkpoint parameter model.model.24.m.0.bias with shape (255,), which is inconsistent with cell shape (21,)
2024-03-29 17:17:10,833 [WARNING] Dropping checkpoint parameter model.model.24.m.1.weight with shape (255, 256, 1, 1), which is inconsistent with cell shape (21, 256, 1, 1)
2024-03-29 17:17:10,833 [WARNING] Dropping checkpoint parameter model.model.24.m.1.bias with shape (255,), which is inconsistent with cell shape (21,)
2024-03-29 17:17:10,833 [WARNING] Dropping checkpoint parameter model.model.24.m.2.weight with shape (255, 512, 1, 1), which is inconsistent with cell shape (21, 512, 1, 1)
2024-03-29 17:17:10,833 [WARNING] Dropping checkpoint parameter model.model.24.m.2.bias with shape (255,), which is inconsistent with cell shape (21,)
2024-03-29 17:17:10,904 [INFO] Pretrain model load from "yolov5s_300e_mAP376-860bcf3b.ckpt" success.
2024-03-29 17:17:11,509 [INFO] ema_weight not exist, default pretrain weight is currently used.
2024-03-29 17:17:11,517 [INFO] Dataset Cache file hash/version check success.
2024-03-29 17:17:11,517 [INFO] Load dataset cache from [coco\train\train2017.cache.npy] success.
2024-03-29 17:17:11,518 [INFO] Dataloader num parallel workers: [4]
2024-03-29 17:17:11,541 [INFO] Dataset Cache file hash/version check success.
2024-03-29 17:17:11,541 [INFO] Load dataset cache from [coco\val\val2017.cache.npy] success.
2024-03-29 17:17:11,542 [INFO] Dataloader num parallel workers: [1]
2024-03-29 17:17:11,640 [INFO]
2024-03-29 17:17:11,646 [INFO] got 2 active callback as follows:
2024-03-29 17:17:11,646 [INFO] SummaryCallback()
2024-03-29 17:17:11,646 [INFO] EvalWhileTrain(stage_intervals=[1], stage_epochs=[9223372036854775807], stage_cum_epochs=[9223372036854775807], eval_last_epoch=True, isolated_epochs=[], keep_checkpoint_max=10, manager_best=<mindyolo.utils.checkpoint_manager.CheckpointManager object at 0x00000285A787BF40>, ckpt_filelist_best=[])
2024-03-29 17:17:11,647 [WARNING] The first epoch will be compiled for the graph, which may take a long time; You can come back later :).
2024-03-29 17:17:21,559 [INFO] End Train.
2024-03-29 17:17:21,560 [INFO] Training completed.

zgh2022 · Mar 29 '24 09:03

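mindyolo does not currently support the Windows platform, so you are likely to run into quite a few problems. You could try running just the code that iterates over the dataloader on its own, to see whether the dataset actually contains no data.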
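As a rough illustration of that check (a sketch, not mindyolo's own code): the helper below pulls a few items from any iterable loader and reports how many arrived. `count_batches` and `steps_per_epoch` are hypothetical names introduced here. For a MindSpore dataset object you would presumably pass something like `dataset.create_dict_iterator(num_epochs=1)`, and compare against `dataset.get_dataset_size()`; how mindyolo constructs that object is not shown in this thread. The second helper illustrates one possible reason for the `steps_per_epoch 0` warning in the first log, under the assumption (not confirmed here) that incomplete batches are dropped: if the dataset has fewer samples than the batch size, the epoch has zero steps and training ends at once.

```python
from itertools import islice
from typing import Any, Iterable


def count_batches(loader: Iterable[Any], limit: int = 5) -> int:
    """Pull at most `limit` items from `loader` and return how many arrived.

    A result of 0 means the loader yields nothing, which would match the
    "No records available" warning in the log above.
    """
    seen = 0
    for _ in islice(loader, limit):
        seen += 1
    return seen


def steps_per_epoch(num_samples: int, batch_size: int,
                    drop_remainder: bool = True) -> int:
    """Number of batches per epoch for a dataset of `num_samples` items.

    Assumption: with drop_remainder=True a final incomplete batch is
    discarded, so a dataset smaller than the batch size gives 0 steps.
    """
    if drop_remainder:
        return num_samples // batch_size
    return -(-num_samples // batch_size)  # ceiling division


if __name__ == "__main__":
    # Stand-in loaders; replace with the real dataset iterator.
    print("empty loader:", count_batches([]))
    print("non-empty loader:", count_batches(range(100)))
    # Illustrative numbers only, not taken from the issue.
    print("20 samples, batch 32:", steps_per_epoch(20, 32))
```

If the count comes back as 0 for the training set, the next things to look at would be the dataset paths in the config, the contents of the cache file, and the batch size relative to the number of images.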

zhanghuiyao · Apr 07 '24 07:04