HOW TO: Training in Google Colab (Single T4) and "NotImplementedError"
Hello, I am trying to play around with what is here. Thank you for your efforts by the way!
- I tried to run the project in Google Colab: I cloned the repo, installed the requirements, and ran inference.
- I got output, which tells me I have installed things properly.
- I then prepared for training: I followed the folder structure and dataset format, set remap_mscoco_category to False in custom_detection.yml, and changed the other parameters in custom_detection.yml as shown below:
task: detection
evaluator:
  type: CocoEvaluator
  iou_types: ['bbox', ]
num_classes: 3 # your dataset classes
remap_mscoco_category: False
train_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /content/drive/MyDrive/v9-v1_augmented.coco/images/train
    ann_file: /content/drive/MyDrive/v9-v1_augmented.coco/annotations/instances_train.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: True
  num_workers: 4
  drop_last: True
  collate_fn:
    type: BatchImageCollateFunction
val_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /content/drive/MyDrive/v9-v1_augmented.coco/images/val
    ann_file: /content/drive/MyDrive/v9-v1_augmented.coco/annotations/instances_val.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: False
  num_workers: 4
  drop_last: False
  collate_fn:
    type: BatchImageCollateFunction
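A quick way to sanity-check the paths and class count in a config like the one above, before launching training, is sketched here. The helper name `check_coco_split` is hypothetical and not part of the repo, and the range check rests on an assumption: that with remap_mscoco_category set to False the category ids in the JSON are used as class indices as-is.

```python
import json
from pathlib import Path


def check_coco_split(img_folder, ann_file, num_classes):
    """Return a list of human-readable problems found for one dataset split."""
    problems = []
    if not Path(img_folder).is_dir():
        problems.append(f"image folder not found: {img_folder}")
    ann_path = Path(ann_file)
    if not ann_path.is_file():
        problems.append(f"annotation file not found: {ann_file}")
        return problems  # nothing more to inspect without the JSON

    with ann_path.open(encoding="utf-8") as f:
        data = json.load(f)

    category_ids = sorted(c["id"] for c in data.get("categories", []))
    if len(category_ids) != num_classes:
        problems.append(
            f"{len(category_ids)} categories in JSON, but num_classes={num_classes}"
        )
    # Assumption: without remapping, ids are used directly as class indices.
    out_of_range = [i for i in category_ids if not 0 <= i < num_classes]
    if out_of_range:
        problems.append(f"category ids outside [0, {num_classes}): {out_of_range}")
    return problems


if __name__ == "__main__":
    root = "/content/drive/MyDrive/v9-v1_augmented.coco"
    for split in ("train", "val"):
        issues = check_coco_split(
            f"{root}/images/{split}",
            f"{root}/annotations/instances_{split}.json",
            num_classes=3,
        )
        print(split, issues or "looks OK")
```

An empty list means the folder and file exist and the categories are consistent with num_classes under the stated assumption. It says nothing about whether the training command actually loads this config.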
And I changed my dataloader.yml to the following (to reduce the batch size):
train_dataloader:
  dataset:
    transforms:
      ops:
        - {type: RandomPhotometricDistort, p: 0.5}
        - {type: RandomZoomOut, fill: 0}
        - {type: RandomIoUCrop, p: 0.8}
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: RandomHorizontalFlip}
        - {type: Resize, size: [640, 640], }
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
        - {type: ConvertBoxes, fmt: 'cxcywh', normalize: True}
      policy:
        name: stop_epoch
        epoch: 72 # epoch in [71, ~) stop `ops`
        ops: ['Mosaic', 'RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']
  collate_fn:
    type: BatchImageCollateFunction
    base_size: 640
    base_size_repeat: 3
    stop_epoch: 72 # epoch in [72, ~) stop `multiscales`
  shuffle: True
  total_batch_size: 8 # total batch size equals to 32 (4 * 8)
  num_workers: 4
val_dataloader:
  dataset:
    transforms:
      ops:
        - {type: Resize, size: [640, 640], }
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
  shuffle: False
  total_batch_size: 8
  num_workers: 4
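The inline comment on total_batch_size (32 = 4 * 8) reads as a total summed over four processes. As a rough sketch, assuming the trainer splits total_batch_size evenly across processes (the helper below is hypothetical, and raising on an uneven split is a choice made here, not necessarily what the repo does), the per-process batch on a single T4 would be:

```python
def per_process_batch_size(total_batch_size, nproc_per_node, nnodes=1):
    """Split a total batch size evenly across all training processes."""
    world_size = nproc_per_node * nnodes
    if total_batch_size % world_size != 0:
        raise ValueError(
            f"total_batch_size={total_batch_size} is not divisible by "
            f"world_size={world_size}"
        )
    return total_batch_size // world_size


if __name__ == "__main__":
    # Single T4 in Colab: one process receives the whole batch.
    print(per_process_batch_size(8, nproc_per_node=1))
    # The 4-process case described by the inline comment.
    print(per_process_batch_size(32, nproc_per_node=4))
```

With --nproc_per_node=1 the per-process batch equals total_batch_size, so lowering that single value is what reduces memory use on one GPU.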
- I did not modify anything else and started training with this command:
!CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py -c "/content/DEIM/configs/deim_rtdetrv2/deim_r18vd_120e_coco.yml" --use-amp --seed=0 -t "/content/DEIM/deim_rtdetrv2_r18vd_coco_120e.pth"
I then got the following output:
2025-02-28 09:13:07.162205: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1740733987.183540 13770 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1740733987.190107 13770 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-28 09:13:07.211146: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Initialized distributed mode...
cfg: {'task': 'detection', '_model': None, '_postprocessor': None, '_criterion': None, '_optimizer': None, '_lr_scheduler': None, '_lr_warmup_scheduler': None, '_train_dataloader': None, '_val_dataloader': None, '_ema': None, '_scaler': None, '_train_dataset': None, '_val_dataset': None, '_collate_fn': None, '_evaluator': None, '_writer': None, 'num_workers': 0, 'batch_size': None, '_train_batch_size': None, '_val_batch_size': None, '_train_shuffle': None, '_val_shuffle': None, 'resume': None, 'tuning': '/content/DEIM/deim_rtdetrv2_r18vd_coco_120e.pth', 'epoches': 120, 'last_epoch': -1, 'lrsheduler': 'flatcosine', 'lr_gamma': 0.5, 'no_aug_epoch': 3, 'warmup_iter': 2000, 'flat_epoch': 64, 'use_amp': True, 'use_ema': True, 'ema_decay': 0.9999, 'ema_warmups': 2000, 'sync_bn': True, 'clip_max_norm': 0.1, 'find_unused_parameters': False, 'seed': 0, 'print_freq': 100, 'checkpoint_freq': 4, 'output_dir': './output/deim_rtdetrv2_r18vd_120e_coco', 'summary_dir': None, 'device': '', 'yaml_cfg': {'task': 'detection', 'evaluator': {'type': 'CocoEvaluator', 'iou_types': ['bbox']}, 'num_classes': 80, 'remap_mscoco_category': False, 'train_dataloader': {'type': 'DataLoader', 'dataset': {'type': 'CocoDetection', 'img_folder': '/datassd/COCO/train2017/', 'ann_file': '/datassd/COCO/annotations/instances_train2017.json', 'return_masks': False, 'transforms': {'type': 'Compose', 'ops': [{'type': 'Mosaic', 'output_size': 320, 'rotation_range': 10, 'translation_range': [0.1, 0.1], 'scaling_range': [0.5, 1.5], 'probability': 1.0, 'fill_value': 0, 'use_cache': False, 'max_cached_images': 50, 'random_pop': True}, {'type': 'RandomPhotometricDistort', 'p': 0.5}, {'type': 'RandomZoomOut', 'fill': 0}, {'type': 'RandomIoUCrop', 'p': 0.8}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'RandomHorizontalFlip'}, {'type': 'Resize', 'size': [640, 640]}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}, {'type': 
'ConvertBoxes', 'fmt': 'cxcywh', 'normalize': True}], 'policy': {'name': 'stop_epoch', 'epoch': [4, 64, 117], 'ops': ['Mosaic', 'RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']}, 'mosaic_prob': 0.5}}, 'shuffle': True, 'num_workers': 4, 'drop_last': True, 'collate_fn': {'type': 'BatchImageCollateFunction', 'base_size': 640, 'base_size_repeat': 3, 'stop_epoch': 117, 'scales': None, 'mixup_prob': 0.5, 'mixup_epochs': [4, 64]}, 'total_batch_size': 16}, 'val_dataloader': {'type': 'DataLoader', 'dataset': {'type': 'CocoDetection', 'img_folder': '/datassd/COCO/val2017/', 'ann_file': '/datassd/COCO/annotations/instances_val2017.json', 'return_masks': False, 'transforms': {'type': 'Compose', 'ops': [{'type': 'Resize', 'size': [640, 640]}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}]}}, 'shuffle': False, 'num_workers': 4, 'drop_last': False, 'collate_fn': {'type': 'BatchImageCollateFunction'}, 'total_batch_size': 8}, 'print_freq': 100, 'output_dir': './output/deim_rtdetrv2_r18vd_120e_coco', 'checkpoint_freq': 4, 'sync_bn': True, 'find_unused_parameters': False, 'use_amp': True, 'scaler': {'type': 'GradScaler', 'enabled': True}, 'use_ema': True, 'ema': {'type': 'ModelEMA', 'decay': 0.9999, 'warmups': 2000, 'start': 0}, 'epoches': 120, 'clip_max_norm': 0.1, 'optimizer': {'type': 'AdamW', 'params': [{'params': '^(?=.*(?:norm|bn)).*$', 'weight_decay': 0.0}], 'lr': 0.0002, 'betas': [0.9, 0.999], 'weight_decay': 0.0001}, 'lr_scheduler': {'type': 'MultiStepLR', 'milestones': [1000], 'gamma': 0.1}, 'lr_warmup_scheduler': {'type': 'LinearWarmup', 'warmup_duration': 2000}, 'model': 'DEIM', 'criterion': 'DEIMCriterion', 'postprocessor': 'PostProcessor', 'use_focal_loss': True, 'eval_spatial_size': [640, 640], 'DEIM': {'backbone': 'PResNet', 'encoder': 'HybridEncoder', 'decoder': 'RTDETRTransformerv2'}, 'lrsheduler': 'flatcosine', 'lr_gamma': 0.5, 'warmup_iter': 2000, 'flat_epoch': 64, 'no_aug_epoch': 3, 'PResNet': {'depth': 18, 'variant': 'd', 
'freeze_at': -1, 'return_idx': [1, 2, 3], 'num_stages': 4, 'freeze_norm': False, 'pretrained': True, 'local_model_dir': '../RT-DETR-main/rtdetrv2_pytorch/INK1k/'}, 'HybridEncoder': {'in_channels': [128, 256, 512], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'use_encoder_idx': [2], 'num_encoder_layers': 1, 'nhead': 8, 'dim_feedforward': 1024, 'dropout': 0.0, 'enc_act': 'gelu', 'expansion': 0.5, 'depth_mult': 1, 'act': 'silu', 'version': 'rt_detrv2'}, 'RTDETRTransformerv2': {'feat_channels': [256, 256, 256], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'num_levels': 3, 'num_layers': 3, 'num_queries': 300, 'num_denoising': 100, 'label_noise_ratio': 0.5, 'box_noise_scale': 1.0, 'eval_idx': -1, 'num_points': [4, 4, 4], 'cross_attn_method': 'default', 'query_select_method': 'default', 'query_pos_method': 'as_reg', 'activation': 'silu', 'mlp_act': 'silu'}, 'PostProcessor': {'num_top_queries': 300}, 'DEIMCriterion': {'weight_dict': {'loss_vfl': 1, 'loss_bbox': 5, 'loss_giou': 2, 'loss_mal': 1}, 'losses': ['mal', 'boxes'], 'alpha': 0.75, 'gamma': 1.5, 'use_uni_set': False, 'matcher': {'type': 'HungarianMatcher', 'weight_dict': {'cost_class': 2, 'cost_bbox': 5, 'cost_giou': 2}, 'alpha': 0.25, 'gamma': 2.0}}, '__include__': ['./rtdetrv2_r18vd_120e_coco.yml', '../base/rt_deim.yml'], 'config': '/content/DEIM/configs/deim_rtdetrv2/deim_r18vd_120e_coco.yml', 'tuning': '/content/DEIM/deim_rtdetrv2_r18vd_coco_120e.pth', 'seed': 0, 'test_only': False, 'print_method': 'builtin', 'print_rank': 0}}
/content/DEIM/engine/backbone/presnet.py:227: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(model_path, map_location='cpu')
Loaded PResNet18 from local file@../RT-DETR-main/rtdetrv2_pytorch/INK1k/ResNet18_vd_pretrained_from_paddle.pth.
Load PResNet18 state_dict
### Query Position Embedding@as_reg ###
Tuning checkpoint from /content/DEIM/deim_rtdetrv2_r18vd_coco_120e.pth
/content/DEIM/engine/solver/_solver.py:169: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(path, map_location='cpu')
Load model.state_dict, {'missed': [], 'unmatched': []}
/content/DEIM/engine/core/workspace.py:180: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
return module(**module_kwargs)
Initial lr: [0.0002, 0.0002]
building train_dataloader with batch_size=16...
### Transform @Mosaic ###
### Transform @RandomPhotometricDistort ###
### Transform @RandomZoomOut ###
### Transform @RandomIoUCrop ###
### Transform @SanitizeBoundingBoxes ###
### Transform @RandomHorizontalFlip ###
### Transform @Resize ###
### Transform @SanitizeBoundingBoxes ###
### Transform @ConvertPILImage ###
### Transform @ConvertBoxes ###
### Mosaic with Prob.@0.5 and ZoomOut/IoUCrop existed ###
### ImgTransforms Epochs: [4, 64, 117] ###
### Policy_ops@['Mosaic', 'RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop'] ###
[rank0]: Traceback (most recent call last):
[rank0]: File "/content/DEIM/train.py", line 84, in <module>
[rank0]: main(args)
[rank0]: File "/content/DEIM/train.py", line 54, in main
[rank0]: solver.fit()
[rank0]: File "/content/DEIM/engine/solver/det_solver.py", line 25, in fit
[rank0]: self.train()
[rank0]: File "/content/DEIM/engine/solver/_solver.py", line 87, in train
[rank0]: self.cfg.train_dataloader, shuffle=self.cfg.train_dataloader.shuffle
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/engine/core/yaml_config.py", line 76, in train_dataloader
[rank0]: self._train_dataloader = self.build_dataloader('train_dataloader')
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/engine/core/yaml_config.py", line 172, in build_dataloader
[rank0]: loader = create(name, global_cfg, batch_size=bs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/engine/core/workspace.py", line 119, in create
[rank0]: return create(name, global_cfg)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/engine/core/workspace.py", line 167, in create
[rank0]: module_kwargs[k] = create(name, global_cfg)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/engine/core/workspace.py", line 180, in create
[rank0]: return module(**module_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/engine/data/dataset/coco_dataset.py", line 33, in __init__
[rank0]: super(CocoDetection, self).__init__(img_folder, ann_file)
[rank0]: File "/usr/local/lib/python3.11/dist-packages/torchvision/datasets/coco.py", line 37, in __init__
[rank0]: self.coco = COCO(annFile)
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/dist-packages/faster_coco_eval/core/coco.py", line 57, in __init__
[rank0]: self.dataset = self.load_json(annotation_file, self.use_deepcopy)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/dist-packages/faster_coco_eval/core/coco.py", line 302, in load_json
[rank0]: with open(json_file) as io:
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: FileNotFoundError: [Errno 2] No such file or directory: '/datassd/COCO/annotations/instances_train2017.json'
E0228 09:13:17.895000 13755 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 13770) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 10, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 919, in main
run(args)
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-02-28_09:13:17
host : 2c5ae9ce8b33
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 13770)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Is my training method correct? I followed the steps, but I seem to be missing something. Also, why does training look for '/datassd/COCO/annotations/instances_train2017.json' when I intend to use a custom dataset?
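The `__include__` entries printed at the end of each cfg dump in this post list which files a config pulls in, and custom_detection.yml does not appear in them. A minimal sketch for tracing that chain from any config file follows. It assumes includes are written as an inline list of quoted relative paths under a top-level `__include__` key, and it uses a plain regex rather than the repo's own loader, so treat the function `include_chain` as a hypothetical helper.

```python
import re
from pathlib import Path

# Matches a top-level `__include__: [ ... ]` list, possibly spanning several lines.
INCLUDE_BLOCK = re.compile(r"^__include__\s*:\s*\[(.*?)\]", re.DOTALL | re.MULTILINE)
QUOTED = re.compile(r"""['"]([^'"]+)['"]""")


def include_chain(config_path, depth=0, seen=None):
    """Return (depth, resolved_path) pairs for a config and everything it includes."""
    seen = set() if seen is None else seen
    path = Path(config_path).resolve()
    if path in seen:  # guard against circular includes
        return []
    seen.add(path)
    chain = [(depth, path)]
    if not path.is_file():  # still list missing files so they are visible
        return chain

    match = INCLUDE_BLOCK.search(path.read_text(encoding="utf-8"))
    if match:
        block = re.sub(r"#.*", "", match.group(1))  # drop commented-out entries
        for rel in QUOTED.findall(block):
            # Included paths are taken relative to the including file.
            chain += include_chain(path.parent / rel, depth + 1, seen)
    return chain


if __name__ == "__main__":
    cfg = Path("/content/DEIM/configs/deim_rtdetrv2/deim_r18vd_120e_coco.yml")
    if cfg.is_file():
        for depth, path in include_chain(cfg):
            print("  " * depth + str(path))
```

If the printed tree shows a COCO dataset file rather than custom_detection.yml, edits made to custom_detection.yml would not reach the run, which would be consistent with the '/datassd/COCO/...' paths in the error.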
I modified my command to:
!CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py -c "/content/DEIM/configs/deim_rtdetrv2/rtdetrv2_r18vd_120e_coco.yml" --use-amp --seed=0 -t "/content/DEIM/deim_rtdetrv2_r18vd_coco_120e.pth"
I got about the same output:
2025-02-28 11:01:51.726318: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1740740511.747945 8193 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1740740511.754415 8193 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-28 11:01:51.776005: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Initialized distributed mode...
cfg: {'task': 'detection', '_model': None, '_postprocessor': None, '_criterion': None, '_optimizer': None, '_lr_scheduler': None, '_lr_warmup_scheduler': None, '_train_dataloader': None, '_val_dataloader': None, '_ema': None, '_scaler': None, '_train_dataset': None, '_val_dataset': None, '_collate_fn': None, '_evaluator': None, '_writer': None, 'num_workers': 0, 'batch_size': None, '_train_batch_size': None, '_val_batch_size': None, '_train_shuffle': None, '_val_shuffle': None, 'resume': None, 'tuning': '/content/DEIM/deim_rtdetrv2_r18vd_coco_120e.pth', 'epoches': 120, 'last_epoch': -1, 'lrsheduler': 'flatcosine', 'lr_gamma': 1, 'no_aug_epoch': 0, 'warmup_iter': 2000, 'flat_epoch': 4000000, 'use_amp': True, 'use_ema': True, 'ema_decay': 0.9999, 'ema_warmups': 2000, 'sync_bn': True, 'clip_max_norm': 0.1, 'find_unused_parameters': False, 'seed': 0, 'print_freq': 100, 'checkpoint_freq': 4, 'output_dir': './output/rtdetrv2_r18vd_120e_coco', 'summary_dir': None, 'device': '', 'yaml_cfg': {'task': 'detection', 'evaluator': {'type': 'CocoEvaluator', 'iou_types': ['bbox']}, 'num_classes': 80, 'remap_mscoco_category': True, 'train_dataloader': {'type': 'DataLoader', 'dataset': {'type': 'CocoDetection', 'img_folder': '/datassd/COCO/train2017/', 'ann_file': '/datassd/COCO/annotations/instances_train2017.json', 'return_masks': False, 'transforms': {'type': 'Compose', 'ops': [{'type': 'RandomPhotometricDistort', 'p': 0.5}, {'type': 'RandomZoomOut', 'fill': 0}, {'type': 'RandomIoUCrop', 'p': 0.8}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'RandomHorizontalFlip'}, {'type': 'Resize', 'size': [640, 640]}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}, {'type': 'ConvertBoxes', 'fmt': 'cxcywh', 'normalize': True}], 'policy': {'name': 'stop_epoch', 'epoch': 117, 'ops': ['Mosaic', 'RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']}}}, 'shuffle': True, 'num_workers': 4, 'drop_last': True, 
'collate_fn': {'type': 'BatchImageCollateFunction', 'base_size': 640, 'base_size_repeat': 3, 'stop_epoch': 72, 'scales': None}, 'total_batch_size': 16}, 'val_dataloader': {'type': 'DataLoader', 'dataset': {'type': 'CocoDetection', 'img_folder': '/datassd/COCO/val2017/', 'ann_file': '/datassd/COCO/annotations/instances_val2017.json', 'return_masks': False, 'transforms': {'type': 'Compose', 'ops': [{'type': 'Resize', 'size': [640, 640]}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}]}}, 'shuffle': False, 'num_workers': 4, 'drop_last': False, 'collate_fn': {'type': 'BatchImageCollateFunction'}, 'total_batch_size': 8}, 'print_freq': 100, 'output_dir': './output/rtdetrv2_r18vd_120e_coco', 'checkpoint_freq': 4, 'sync_bn': True, 'find_unused_parameters': False, 'use_amp': True, 'scaler': {'type': 'GradScaler', 'enabled': True}, 'use_ema': True, 'ema': {'type': 'ModelEMA', 'decay': 0.9999, 'warmups': 2000, 'start': 0}, 'epoches': 120, 'clip_max_norm': 0.1, 'optimizer': {'type': 'AdamW', 'params': [{'params': '^(?=.*(?:norm|bn)).*$', 'weight_decay': 0.0}], 'lr': 0.0001, 'betas': [0.9, 0.999], 'weight_decay': 0.0001}, 'lr_scheduler': {'type': 'MultiStepLR', 'milestones': [1000], 'gamma': 0.1}, 'lr_warmup_scheduler': {'type': 'LinearWarmup', 'warmup_duration': 2000}, 'model': 'DEIM', 'criterion': 'DEIMCriterion', 'postprocessor': 'PostProcessor', 'use_focal_loss': True, 'eval_spatial_size': [640, 640], 'DEIM': {'backbone': 'PResNet', 'encoder': 'HybridEncoder', 'decoder': 'RTDETRTransformerv2'}, 'lrsheduler': 'flatcosine', 'lr_gamma': 1, 'warmup_iter': 2000, 'flat_epoch': 4000000, 'no_aug_epoch': 0, 'PResNet': {'depth': 18, 'variant': 'd', 'freeze_at': -1, 'return_idx': [1, 2, 3], 'num_stages': 4, 'freeze_norm': False, 'pretrained': True, 'local_model_dir': '../RT-DETR-main/rtdetrv2_pytorch/INK1k/'}, 'HybridEncoder': {'in_channels': [128, 256, 512], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'use_encoder_idx': [2], 'num_encoder_layers': 1, 'nhead': 8, 
'dim_feedforward': 1024, 'dropout': 0.0, 'enc_act': 'gelu', 'expansion': 0.5, 'depth_mult': 1, 'act': 'silu', 'version': 'rt_detrv2'}, 'RTDETRTransformerv2': {'feat_channels': [256, 256, 256], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'num_levels': 3, 'num_layers': 3, 'num_queries': 300, 'num_denoising': 100, 'label_noise_ratio': 0.5, 'box_noise_scale': 1.0, 'eval_idx': -1, 'num_points': [4, 4, 4], 'cross_attn_method': 'default', 'query_select_method': 'default'}, 'PostProcessor': {'num_top_queries': 300}, 'DEIMCriterion': {'weight_dict': {'loss_vfl': 1, 'loss_bbox': 5, 'loss_giou': 2}, 'losses': ['vfl', 'boxes'], 'alpha': 0.75, 'gamma': 2.0, 'use_uni_set': False, 'matcher': {'type': 'HungarianMatcher', 'weight_dict': {'cost_class': 2, 'cost_bbox': 5, 'cost_giou': 2}, 'alpha': 0.25, 'gamma': 2.0}}, '__include__': ['../dataset/coco_detection.yml', '../runtime.yml', '../base/dataloader.yml', '../base/rt_optimizer.yml', '../base/rtdetrv2_r50vd.yml'], 'config': '/content/DEIM/configs/deim_rtdetrv2/rtdetrv2_r18vd_120e_coco.yml', 'tuning': '/content/DEIM/deim_rtdetrv2_r18vd_coco_120e.pth', 'seed': 0, 'test_only': False, 'print_method': 'builtin', 'print_rank': 0}}
/content/DEIM/DEIM/../engine/backbone/presnet.py:227: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(model_path, map_location='cpu')
Loaded PResNet18 from local file@../RT-DETR-main/rtdetrv2_pytorch/INK1k/ResNet18_vd_pretrained_from_paddle.pth.
Load PResNet18 state_dict
Tuning checkpoint from /content/DEIM/deim_rtdetrv2_r18vd_coco_120e.pth
/content/DEIM/DEIM/../engine/solver/_solver.py:169: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(path, map_location='cpu')
Load model.state_dict, {'missed': [], 'unmatched': ['decoder.query_pos_head.layers.0.weight', 'decoder.query_pos_head.layers.0.bias', 'decoder.query_pos_head.layers.1.weight']}
/content/DEIM/DEIM/../engine/core/workspace.py:180: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
return module(**module_kwargs)
Initial lr: [0.0001, 0.0001]
building train_dataloader with batch_size=16...
### Transform @RandomPhotometricDistort ###
### Transform @RandomZoomOut ###
### Transform @RandomIoUCrop ###
### Transform @SanitizeBoundingBoxes ###
### Transform @RandomHorizontalFlip ###
### Transform @Resize ###
### Transform @SanitizeBoundingBoxes ###
### Transform @ConvertPILImage ###
### Transform @ConvertBoxes ###
### ImgTransforms Epochs: 117 ###
### Policy_ops@['Mosaic', 'RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop'] ###
[rank0]: Traceback (most recent call last):
[rank0]: File "/content/DEIM/DEIM/train.py", line 84, in <module>
[rank0]: main(args)
[rank0]: File "/content/DEIM/DEIM/train.py", line 54, in main
[rank0]: solver.fit()
[rank0]: File "/content/DEIM/DEIM/../engine/solver/det_solver.py", line 25, in fit
[rank0]: self.train()
[rank0]: File "/content/DEIM/DEIM/../engine/solver/_solver.py", line 87, in train
[rank0]: self.cfg.train_dataloader, shuffle=self.cfg.train_dataloader.shuffle
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/DEIM/../engine/core/yaml_config.py", line 76, in train_dataloader
[rank0]: self._train_dataloader = self.build_dataloader('train_dataloader')
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/DEIM/../engine/core/yaml_config.py", line 172, in build_dataloader
[rank0]: loader = create(name, global_cfg, batch_size=bs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/DEIM/../engine/core/workspace.py", line 119, in create
[rank0]: return create(name, global_cfg)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/DEIM/../engine/core/workspace.py", line 167, in create
[rank0]: module_kwargs[k] = create(name, global_cfg)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/DEIM/../engine/core/workspace.py", line 180, in create
[rank0]: return module(**module_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/DEIM/DEIM/../engine/data/dataset/coco_dataset.py", line 33, in __init__
[rank0]: super(CocoDetection, self).__init__(img_folder, ann_file)
[rank0]: File "/usr/local/lib/python3.11/dist-packages/torchvision/datasets/coco.py", line 37, in __init__
[rank0]: self.coco = COCO(annFile)
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/dist-packages/faster_coco_eval/core/coco.py", line 57, in __init__
[rank0]: self.dataset = self.load_json(annotation_file, self.use_deepcopy)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/dist-packages/faster_coco_eval/core/coco.py", line 302, in load_json
[rank0]: with open(json_file) as io:
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: FileNotFoundError: [Errno 2] No such file or directory: '/datassd/COCO/annotations/instances_train2017.json'
E0228 11:02:02.741000 8178 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 8193) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 10, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 919, in main
run(args)
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-02-28_11:02:02
host : 059ca17322f9
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 8193)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Issue changed. I went to my own device to test training. I had to download the COCO dataset for this, which was a real hassle since the dataset is 16GB+, and I think that was only the "train" images. After downloading, I seem to be able to get past the previous error, but now I am getting "NotImplementedError". It seems to be the same issue as this one: https://github.com/ShihuaHuang95/DEIM/issues/33
Training command:
python train.py -c "deim_dfine\deim_hgnetv2_s_coco.yml" --use-amp --seed=0 -d cpu -t "deim_dfine_hgnetv2_s_coco_120e.pth"
The output is below:
Not init distributed mode.
cfg: {'task': 'detection', '_model': None, '_postprocessor': None, '_criterion': None, '_optimizer': None, '_lr_scheduler': None, '_lr_warmup_scheduler': None, '_train_dataloader': None, '_val_dataloader': None, '_ema': None, '_scaler': None, '_train_dataset': None, '_val_dataset': None, '_collate_fn': None, '_evaluator': None, '_writer': None, 'num_workers': 0, 'batch_size': None, '_train_batch_size': None, '_val_batch_size': None, '_train_shuffle': None, '_val_shuffle': None, 'resume': None, 'tuning': 'C:\\DEIM-D-FINE_models_config\\S_DEIM-DEFINE\\deim_dfine_hgnetv2_s_coco_120e.pth', 'epoches': 132, 'last_
epoch': -1, 'lrsheduler': 'flatcosine', 'lr_gamma': 0.5, 'no_aug_epoch': 12, 'warmup_iter': 2000, 'flat_epoch': 64, 'use_amp': True, 'use_ema': True, 'ema
_decay': 0.9999, 'ema_warmups': 2000, 'sync_bn': True, 'clip_max_norm': 0.1, 'find_unused_parameters': False, 'seed': 0, 'print_freq': 100, 'checkpoint_fr
eq': 4, 'output_dir': './outputs/deim_hgnetv2_s_coco', 'summary_dir': None, 'device': 'cpu', 'yaml_cfg': {'task': 'detection', 'evaluator': {'type': 'Coco
Evaluator', 'iou_types': ['bbox']}, 'num_classes': 80, 'remap_mscoco_category': False, 'train_dataloader': {'type': 'DataLoader', 'dataset': {'type': 'Coc
oDetection', 'img_folder': 'C:/COCO/train2017/', 'ann_file': 'C:/COCO/annotations/instances_train2017.json', 'return_masks': False, 'transforms': {'type':
'Compose', 'ops': [{'type': 'Mosaic', 'output_size': 320, 'rotation_range': 10, 'translation_range': [0.1, 0.1], 'scaling_range': [0.5, 1.5], 'probabilit
y': 1.0, 'fill_value': 0, 'use_cache': False, 'max_cached_images': 50, 'random_pop': True}, {'type': 'RandomPhotometricDistort', 'p': 0.5}, {'type': 'Rand
omZoomOut', 'fill': 0}, {'type': 'RandomIoUCrop', 'p': 0.8}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'RandomHorizontalFlip'}, {'type':
'Resize', 'size': [640, 640]}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}, {'type':
'ConvertBoxes', 'fmt': 'cxcywh', 'normalize': True}], 'policy': {'name': 'stop_epoch', 'epoch': [4, 64, 120], 'ops': ['Mosaic', 'RandomPhotometricDistort'
, 'RandomZoomOut', 'RandomIoUCrop']}, 'mosaic_prob': 0.5}}, 'shuffle': True, 'num_workers': 4, 'drop_last': True, 'collate_fn': {'type': 'BatchImageCollat
eFunction', 'base_size': 640, 'base_size_repeat': 20, 'stop_epoch': 120, 'ema_restart_decay': 0.9999, 'mixup_prob': 0.5, 'mixup_epochs': [4, 64]}, 'total_
batch_size': 1}, 'val_dataloader': {'type': 'DataLoader', 'dataset': {'type': 'CocoDetection', 'img_folder': 'C:/COCO/val2017/', 'ann_file': 'C:/COCO/anno
tations/instances_val2017.json', 'return_masks': False, 'transforms': {'type': 'Compose', 'ops': [{'type': 'Resize', 'size': [640, 640]}, {'type': 'Conver
tPILImage', 'dtype': 'float32', 'scale': True}]}}, 'shuffle': False, 'num_workers': 4, 'drop_last': False, 'collate_fn': {'type': 'BatchImageCollateFuncti
on'}, 'total_batch_size': 1}, 'print_freq': 100, 'output_dir': './outputs/deim_hgnetv2_s_coco', 'checkpoint_freq': 4, 'sync_bn': True, 'find_unused_parame
ters': False, 'use_amp': True, 'scaler': {'type': 'GradScaler', 'enabled': True}, 'use_ema': True, 'ema': {'type': 'ModelEMA', 'decay': 0.9999, 'warmups':
1000, 'start': 0}, 'epoches': 132, 'clip_max_norm': 0.1, 'optimizer': {'type': 'AdamW', 'params': [{'params': '^(?=.*backbone)(?!.*bn).*$', 'lr': 0.0002}
, {'params': '^(?=.*(?:norm|bn)).*$', 'weight_decay': 0.0}], 'lr': 0.0004, 'betas': [0.9, 0.999], 'weight_decay': 0.0001}, 'lr_scheduler': {'type': 'Multi
StepLR', 'milestones': [500], 'gamma': 0.1}, 'lr_warmup_scheduler': {'type': 'LinearWarmup', 'warmup_duration': 500}, 'model': 'DEIM', 'criterion': 'DEIMC
riterion', 'postprocessor': 'PostProcessor', 'use_focal_loss': True, 'eval_spatial_size': [640, 640], 'DEIM': {'backbone': 'HGNetv2', 'encoder': 'HybridEn
coder', 'decoder': 'DFINETransformer'}, 'lrsheduler': 'flatcosine', 'lr_gamma': 0.5, 'warmup_iter': 2000, 'flat_epoch': 64, 'no_aug_epoch': 12, 'HGNetv2':
{'pretrained': False, 'local_model_dir': '../RT-DETR-main/D-FINE/weight/hgnetv2/', 'name': 'B0', 'return_idx': [1, 2, 3], 'freeze_at': -1, 'freeze_norm':
False, 'use_lab': True}, 'HybridEncoder': {'in_channels': [256, 512, 1024], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'use_encoder_idx': [2], 'num_
encoder_layers': 1, 'nhead': 8, 'dim_feedforward': 1024, 'dropout': 0.0, 'enc_act': 'gelu', 'expansion': 0.5, 'depth_mult': 0.34, 'act': 'silu'}, 'DFINETr
ansformer': {'feat_channels': [256, 256, 256], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'num_levels': 3, 'num_layers': 3, 'eval_idx': -1, 'num_quer
ies': 300, 'num_denoising': 100, 'label_noise_ratio': 0.5, 'box_noise_scale': 1.0, 'reg_max': 32, 'reg_scale': 4, 'layer_scale': 1, 'num_points': [3, 6, 3
], 'cross_attn_method': 'default', 'query_select_method': 'default', 'activation': 'silu', 'mlp_act': 'silu'}, 'PostProcessor': {'num_top_queries': 300},
'DEIMCriterion': {'weight_dict': {'loss_vfl': 1, 'loss_bbox': 5, 'loss_giou': 2, 'loss_fgl': 0.15, 'loss_ddf': 1.5, 'loss_mal': 1}, 'losses': ['mal', 'box
es', 'local'], 'alpha': 0.75, 'gamma': 1.5, 'reg_max': 32, 'matcher': {'type': 'HungarianMatcher', 'weight_dict': {'cost_class': 2, 'cost_bbox': 5, 'cost_
giou': 2}, 'alpha': 0.25, 'gamma': 2.0}}, '__include__': ['./dfine_hgnetv2_s_coco.yml', '../base/deim.yml'], 'config': 'C:\\Users\\griff\\PycharmProjects\
\DEIM\\DEIM\\configs\\deim_dfine\\deim_hgnetv2_s_coco.yml', 'tuning': 'C:\\DEIM-D-FINE_models_config\\S_DEIM-DEFINE\\deim_dfine_hgnetv2_s_coco_120e.pth', 'device': 'cpu', 'seed': 0, 'test_only': False, 'print_method': 'builtin', 'print_rank': 0}}
Tuning checkpoint from C:\DEIM-D-FINE_models_config\S_DEIM-DEFINE\deim_dfine_hgnetv2_s_coco_120e.pth
Load model.state_dict, {'missed': [], 'unmatched': []}
DEIM\engine\core\workspace.py:180: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
return module(**module_kwargs)
.venv\Lib\site-packages\torch\amp\grad_scaler.py:132: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
Initial lr: [0.0002, 0.0004, 0.0004]
building train_dataloader with batch_size=1...
### Transform @Mosaic ###
### Transform @RandomPhotometricDistort ###
### Transform @RandomZoomOut ###
### Transform @RandomIoUCrop ###
### Transform @SanitizeBoundingBoxes ###
### Transform @RandomHorizontalFlip ###
### Transform @Resize ###
### Transform @SanitizeBoundingBoxes ###
### Transform @ConvertPILImage ###
### Transform @ConvertBoxes ###
### Mosaic with Prob.@0.5 and ZoomOut/IoUCrop existed ###
### ImgTransforms Epochs: [4, 64, 120] ###
### Policy_ops@['Mosaic', 'RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop'] ###
### Using MixUp with Prob@0.5 in [4, 64] epochs ###
### Multi-scale Training until 120 epochs ###
### Multi-scales@ [480, 512, 544, 576, 608, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 800, 768, 736, 704, 672] ###
building val_dataloader with batch_size=1...
### Transform @Resize ###
### Transform @ConvertPILImage ###
------------------------------------- Calculate Flops Results -------------------------------------
Notations:
number of parameters (Params), number of multiply-accumulate operations(MACs),
number of floating-point operations (FLOPs), floating-point operations per second (FLOPS),
fwd FLOPs (model forward propagation FLOPs), bwd FLOPs (model backward propagation FLOPs),
default model backpropagation takes 2.00 times as much computation as forward propagation.
Total Training Params: 10.24 M
fwd MACs: 12.5323 GMACs
fwd FLOPs: 25.1714 GFLOPS
fwd+bwd MACs: 37.5969 GMACs
fwd+bwd FLOPs: 75.5141 GFLOPS
---------------------------------------------------------------------------------------------------
{'Model FLOPs:25.1714 GFLOPS MACs:12.5323 GMACs Params:10237491'}
------------------------------------------Start training-------------------------------------------
## Using Self-defined Scheduler-flatcosine ##
[0.0002, 0.0004, 0.0004] [0.0001, 0.0002, 0.0002] 15613884 2000 7570368 1419444
number of trainable parameters: 10321875
Traceback (most recent call last):
File "DEIM\train.py", line 84, in <module>
main(args)
File "DEIM\train.py", line 54, in main
solver.fit()
File "DEIM\engine\solver\det_solver.py", line 76, in fit
train_stats = train_one_epoch(
^^^^^^^^^^^^^^^^
File "DEIM\engine\solver\det_engine.py", line 42, in train_one_epoch
for i, (samples, targets) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
File "DEIM\engine\misc\logger.py", line 215, in log_every
for obj in iterable:
File ".venv\Lib\site-packages\torch\utils\data\dataloader.py", line 708, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torch\utils\data\dataloader.py", line 1480, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torch\utils\data\dataloader.py", line 1505, in _process_data
data.reraise()
File ".venv\Lib\site-packages\torch\_utils.py", line 733, in reraise
raise exception
NotImplementedError: Caught NotImplementedError in DataLoader worker process 0.
Original Traceback (most recent call last):
File ".venv\Lib\site-packages\torch\utils\data\_utils\worker.py", line 349, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "DEIM\engine\data\dataset\coco_dataset.py", line 44, in __getitem__
img, target, _ = self._transforms(img, target, self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "DEIM\engine\data\transforms\container.py", line 58, in forward
return self.get_forward(self.policy['name'])(*inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "DEIM\engine\data\transforms\container.py", line 100, in stop_epoch_forward
sample = transform(sample)
^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torchvision\transforms\v2\_transform.py", line 68, in forward
flat_outputs = [
^
File ".venv\Lib\site-packages\torchvision\transforms\v2\_transform.py", line 69, in <listcomp>
self.transform(inpt, params) if needs_transform else inpt
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv\Lib\site-packages\torchvision\transforms\v2\_transform.py", line 55, in transform
raise NotImplementedError
NotImplementedError
I have installed a lower torchvision version, which should fix the last problem.
The problem that the COCO dataset is required can be fixed by removing '../dataset/coco_detection.yml' from the __include__ list in the dfine_hgnetv2_s_coco.yml config:
__include__: [ '../dataset/coco_detection.yml', '../runtime.yml', '../base/dataloader.yml', '../base/optimizer.yml', '../base/dfine_hgnetv2.yml', ]
Thank you. I followed your recommendations. I first downgraded torchvision to 0.15.0, but got an error saying it must be >= 0.15.2, so I installed 0.15.2. After that, I restarted my runtime to make sure everything was reloaded.
I first ran inference with the s model to make sure everything was installed and working properly. I got a torch_results.jpg with detections overlaid, so everything seems fine. I then proceeded to configuration:
- In my custom_detection.yml, I only changed the paths to the images and to my .json dataset files. They are in MS COCO format:
dataset/
├── images/
│ ├── train/
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ └── ...
│ ├── val/
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ └── ...
└── annotations/
├── instances_train.json
├── instances_val.json
└── ...
- I want to train the s version of DEIM, so I went to dfine_hgnetv2_s_coco.yml and removed '../dataset/coco_detection.yml', as suggested in https://github.com/ShihuaHuang95/DEIM/issues/39#issuecomment-2694538280.
- In dataloader.yml, I set total_batch_size: 2 for both train and val.
- Those are all my changes; I left everything else untouched.
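Since the replies in this thread point at paths and configuration as the first thing to check, here is a minimal sketch that verifies the folder layout shown above before starting a run. The function name, the EXPECTED list and the "dataset" root are illustrative assumptions, not part of DEIM.

```python
from pathlib import Path

# Relative paths taken from the folder tree above. Checking them up front
# is an illustration only; DEIM itself does not ship this helper.
EXPECTED = [
    "images/train",
    "images/val",
    "annotations/instances_train.json",
    "annotations/instances_val.json",
]


def missing_paths(dataset_root):
    """Return the expected entries that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Hypothetical location; replace with your own dataset folder.
    print(missing_paths("dataset"))
```

An empty list means all four entries exist. It says nothing about whether the YAML config actually points at them.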
I then ran the training command:
!CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 "/content/DEIM/train.py" -c "/content/DEIM/configs/deim_dfine/deim_hgnetv2_s_coco.yml" --use-amp --seed=0 -t "/content/deim_dfine_hgnetv2_s_coco_120e.pth"
The result:
2025-03-04 03:16:43.241175: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1741058203.264250 8736 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1741058203.270997 8736 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-04 03:16:43.294117: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Initialized distributed mode...
cfg: {'task': 'detection', '_model': None, '_postprocessor': None, '_criterion': None, '_optimizer': None, '_lr_scheduler': None, '_lr_warmup_scheduler': None, '_train_dataloader': None, '_val_dataloader': None, '_ema': None, '_scaler': None, '_train_dataset': None, '_val_dataset': None, '_collate_fn': None, '_evaluator': None, '_writer': None, 'num_workers': 0, 'batch_size': None, '_train_batch_size': None, '_val_batch_size': None, '_train_shuffle': None, '_val_shuffle': None, 'resume': None, 'tuning': '/content/deim_dfine_hgnetv2_s_coco_120e.pth', 'epoches': 132, 'last_epoch': -1, 'lrsheduler': 'flatcosine', 'lr_gamma': 0.5, 'no_aug_epoch': 12, 'warmup_iter': 2000, 'flat_epoch': 64, 'use_amp': True, 'use_ema': True, 'ema_decay': 0.9999, 'ema_warmups': 2000, 'sync_bn': True, 'clip_max_norm': 0.1, 'find_unused_parameters': False, 'seed': 0, 'print_freq': 100, 'checkpoint_freq': 4, 'output_dir': './outputs/deim_hgnetv2_s_coco', 'summary_dir': None, 'device': '', 'yaml_cfg': {'print_freq': 100, 'output_dir': './outputs/deim_hgnetv2_s_coco', 'checkpoint_freq': 4, 'sync_bn': True, 'find_unused_parameters': False, 'use_amp': True, 'scaler': {'type': 'GradScaler', 'enabled': True}, 'use_ema': True, 'ema': {'type': 'ModelEMA', 'decay': 0.9999, 'warmups': 1000, 'start': 0}, 'train_dataloader': {'dataset': {'transforms': {'ops': [{'type': 'Mosaic', 'output_size': 320, 'rotation_range': 10, 'translation_range': [0.1, 0.1], 'scaling_range': [0.5, 1.5], 'probability': 1.0, 'fill_value': 0, 'use_cache': False, 'max_cached_images': 50, 'random_pop': True}, {'type': 'RandomPhotometricDistort', 'p': 0.5}, {'type': 'RandomZoomOut', 'fill': 0}, {'type': 'RandomIoUCrop', 'p': 0.8}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'RandomHorizontalFlip'}, {'type': 'Resize', 'size': [640, 640]}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}, {'type': 'ConvertBoxes', 'fmt': 'cxcywh', 'normalize': True}], 
'policy': {'name': 'stop_epoch', 'epoch': [4, 64, 120], 'ops': ['Mosaic', 'RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']}, 'mosaic_prob': 0.5}}, 'collate_fn': {'type': 'BatchImageCollateFunction', 'base_size': 640, 'base_size_repeat': 20, 'stop_epoch': 120, 'ema_restart_decay': 0.9999, 'mixup_prob': 0.5, 'mixup_epochs': [4, 64]}, 'shuffle': True, 'total_batch_size': 2, 'num_workers': 4}, 'val_dataloader': {'dataset': {'transforms': {'ops': [{'type': 'Resize', 'size': [640, 640]}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}]}}, 'shuffle': False, 'total_batch_size': 2, 'num_workers': 4}, 'epoches': 132, 'clip_max_norm': 0.1, 'optimizer': {'type': 'AdamW', 'params': [{'params': '^(?=.*backbone)(?!.*bn).*$', 'lr': 0.0002}, {'params': '^(?=.*(?:norm|bn)).*$', 'weight_decay': 0.0}], 'lr': 0.0004, 'betas': [0.9, 0.999], 'weight_decay': 0.0001}, 'lr_scheduler': {'type': 'MultiStepLR', 'milestones': [500], 'gamma': 0.1}, 'lr_warmup_scheduler': {'type': 'LinearWarmup', 'warmup_duration': 500}, 'task': 'detection', 'model': 'DEIM', 'criterion': 'DEIMCriterion', 'postprocessor': 'PostProcessor', 'use_focal_loss': True, 'eval_spatial_size': [640, 640], 'DEIM': {'backbone': 'HGNetv2', 'encoder': 'HybridEncoder', 'decoder': 'DFINETransformer'}, 'lrsheduler': 'flatcosine', 'lr_gamma': 0.5, 'warmup_iter': 2000, 'flat_epoch': 64, 'no_aug_epoch': 12, 'HGNetv2': {'pretrained': False, 'local_model_dir': '../RT-DETR-main/D-FINE/weight/hgnetv2/', 'name': 'B0', 'return_idx': [1, 2, 3], 'freeze_at': -1, 'freeze_norm': False, 'use_lab': True}, 'HybridEncoder': {'in_channels': [256, 512, 1024], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'use_encoder_idx': [2], 'num_encoder_layers': 1, 'nhead': 8, 'dim_feedforward': 1024, 'dropout': 0.0, 'enc_act': 'gelu', 'expansion': 0.5, 'depth_mult': 0.34, 'act': 'silu'}, 'DFINETransformer': {'feat_channels': [256, 256, 256], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'num_levels': 3, 'num_layers': 3, 
'eval_idx': -1, 'num_queries': 300, 'num_denoising': 100, 'label_noise_ratio': 0.5, 'box_noise_scale': 1.0, 'reg_max': 32, 'reg_scale': 4, 'layer_scale': 1, 'num_points': [3, 6, 3], 'cross_attn_method': 'default', 'query_select_method': 'default', 'activation': 'silu', 'mlp_act': 'silu'}, 'PostProcessor': {'num_top_queries': 300}, 'DEIMCriterion': {'weight_dict': {'loss_vfl': 1, 'loss_bbox': 5, 'loss_giou': 2, 'loss_fgl': 0.15, 'loss_ddf': 1.5, 'loss_mal': 1}, 'losses': ['mal', 'boxes', 'local'], 'alpha': 0.75, 'gamma': 1.5, 'reg_max': 32, 'matcher': {'type': 'HungarianMatcher', 'weight_dict': {'cost_class': 2, 'cost_bbox': 5, 'cost_giou': 2}, 'alpha': 0.25, 'gamma': 2.0}}, '__include__': ['./dfine_hgnetv2_s_coco.yml', '../base/deim.yml'], 'config': '/content/DEIM/configs/deim_dfine/deim_hgnetv2_s_coco.yml', 'tuning': '/content/deim_dfine_hgnetv2_s_coco_120e.pth', 'seed': 0, 'test_only': False, 'print_method': 'builtin', 'print_rank': 0}}
Tuning checkpoint from /content/deim_dfine_hgnetv2_s_coco_120e.pth
Load model.state_dict, {'missed': [], 'unmatched': []}
Initial lr: [0.0002, 0.0004, 0.0004]
building train_dataloader with batch_size=2...
Traceback (most recent call last):
File "/content/DEIM/train.py", line 84, in <module>
main(args)
File "/content/DEIM/train.py", line 54, in main
solver.fit()
File "/content/DEIM/engine/solver/det_solver.py", line 25, in fit
self.train()
File "/content/DEIM/engine/solver/_solver.py", line 87, in train
self.cfg.train_dataloader, shuffle=self.cfg.train_dataloader.shuffle
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/DEIM/engine/core/yaml_config.py", line 76, in train_dataloader
self._train_dataloader = self.build_dataloader('train_dataloader')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/DEIM/engine/core/yaml_config.py", line 172, in build_dataloader
loader = create(name, global_cfg, batch_size=bs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/DEIM/engine/core/workspace.py", line 121, in create
module = getattr(cfg['_pymodule'], name)
~~~^^^^^^^^^^^^^
KeyError: '_pymodule'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 8736) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/content/DEIM/train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-03-04_03:16:56
host : 81fa70c56262
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 8736)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
I searched this repo for KeyError: '_pymodule' and found this issue -> https://github.com/ShihuaHuang95/DEIM/issues/10
The answer there is to check paths and configuration, which I believe I have already done. Any recommendations or ideas?
Maybe try running it with python instead of torchrun first:
python train.py -c configs/deim_dfine/deim_hgnetv2_s_coco.yml --use-amp --seed=0 -t deim_dfine_hgnetv2_s_coco_120e.pth
I ran it using python -> !python train.py -c configs/deim_dfine/deim_hgnetv2_s_coco.yml --use-amp --seed=0 -t /content/deim_dfine_hgnetv2_s_coco_120e.pth
I got the same result: KeyError: '_pymodule'
I tried:
- pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
- pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
- and finally -> pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
Inference works on all versions/installations above.
My JSON files also seem compliant (excerpt from instances_train.json):
{"id":939,"image_id":98,"category_id":1,"bbox":[55,144,75,68],"area":5100,"segmentation":[],"iscrowd":0},{"id":940,"image_id":98,"category_id":1,"bbox":[157,179,57,57],"area":3249,"segmentation":[],"iscrowd":0},{"id":941,"image_id":98,"category_id":1,"bbox":[92,176,66,61],"area":4026,"segmentation":[],"iscrowd":0},{"id":942,"image_id":98,"category_id":1,"bbox":[128,139,44,35],"area":1540,"segmentation":[],"iscrowd":0},{"id":943,"image_id":98,"category_id":1,"bbox":[175,133,38,43],"area":1634,"segmentation":[],"iscrowd":0},{"id":944,"image_id":98,"category_id":2,"bbox":[13,248,38,52],"area":1976,"segmentation":[],"iscrowd":0},{"id":945,"image_id":98,"category_id":1,"bbox":[218,119,30,33],"area":990,"segmentation":[],"iscrowd":0},
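To go beyond eyeballing the excerpt, a small sketch like the one below can flag obviously malformed annotation entries. The particular checks and the function name are assumptions chosen for illustration (they are not DEIM's own validation), and the set of valid category ids is made up here; in practice it would come from the "categories" list of the same JSON file.

```python
def annotation_problems(ann, valid_category_ids):
    """Return a list of human-readable problems found in one COCO annotation."""
    problems = []
    bbox = ann.get("bbox", [])
    if len(bbox) != 4:
        problems.append("bbox must be [x, y, width, height]")
    else:
        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            problems.append("bbox width and height must be positive")
        if x < 0 or y < 0:
            problems.append("bbox origin must not be negative")
    if ann.get("category_id") not in valid_category_ids:
        problems.append("category_id is not listed in categories")
    if ann.get("iscrowd") not in (0, 1):
        problems.append("iscrowd must be 0 or 1")
    return problems


# First entry of the excerpt above.
sample = {"id": 939, "image_id": 98, "category_id": 1,
          "bbox": [55, 144, 75, 68], "area": 5100,
          "segmentation": [], "iscrowd": 0}

# Assumed category ids; read the real ones from the file's "categories" list.
print(annotation_problems(sample, valid_category_ids={1, 2}))  # prints []
```

The sample entry passes these checks, which fits the poster's impression that the annotations are not the source of the error.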
I kept getting the same error:
2025-03-05 08:56:13.010289: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1741164973.031805 7435 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1741164973.038648 7435 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-05 08:56:13.060338: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Not init distributed mode.
cfg: {'task': 'detection', '_model': None, '_postprocessor': None, '_criterion': None, '_optimizer': None, '_lr_scheduler': None, '_lr_warmup_scheduler': None, '_train_dataloader': None, '_val_dataloader': None, '_ema': None, '_scaler': None, '_train_dataset': None, '_val_dataset': None, '_collate_fn': None, '_evaluator': None, '_writer': None, 'num_workers': 0, 'batch_size': None, '_train_batch_size': None, '_val_batch_size': None, '_train_shuffle': None, '_val_shuffle': None, 'resume': None, 'tuning': '/content/deim_dfine_hgnetv2_s_coco_120e.pth', 'epoches': 132, 'last_epoch': -1, 'lrsheduler': 'flatcosine', 'lr_gamma': 0.5, 'no_aug_epoch': 12, 'warmup_iter': 2000, 'flat_epoch': 64, 'use_amp': True, 'use_ema': True, 'ema_decay': 0.9999, 'ema_warmups': 2000, 'sync_bn': True, 'clip_max_norm': 0.1, 'find_unused_parameters': False, 'seed': 0, 'print_freq': 100, 'checkpoint_freq': 4, 'output_dir': './outputs/deim_hgnetv2_s_coco', 'summary_dir': None, 'device': '', 'yaml_cfg': {'print_freq': 100, 'output_dir': './outputs/deim_hgnetv2_s_coco', 'checkpoint_freq': 4, 'sync_bn': True, 'find_unused_parameters': False, 'use_amp': True, 'scaler': {'type': 'GradScaler', 'enabled': True}, 'use_ema': True, 'ema': {'type': 'ModelEMA', 'decay': 0.9999, 'warmups': 1000, 'start': 0}, 'train_dataloader': {'dataset': {'transforms': {'ops': [{'type': 'Mosaic', 'output_size': 320, 'rotation_range': 10, 'translation_range': [0.1, 0.1], 'scaling_range': [0.5, 1.5], 'probability': 1.0, 'fill_value': 0, 'use_cache': False, 'max_cached_images': 50, 'random_pop': True}, {'type': 'RandomPhotometricDistort', 'p': 0.5}, {'type': 'RandomZoomOut', 'fill': 0}, {'type': 'RandomIoUCrop', 'p': 0.8}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'RandomHorizontalFlip'}, {'type': 'Resize', 'size': [640, 640]}, {'type': 'SanitizeBoundingBoxes', 'min_size': 1}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}, {'type': 'ConvertBoxes', 'fmt': 'cxcywh', 'normalize': True}], 
'policy': {'name': 'stop_epoch', 'epoch': [4, 64, 120], 'ops': ['Mosaic', 'RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']}, 'mosaic_prob': 0.5}}, 'collate_fn': {'type': 'BatchImageCollateFunction', 'base_size': 640, 'base_size_repeat': 20, 'stop_epoch': 120, 'ema_restart_decay': 0.9999, 'mixup_prob': 0.5, 'mixup_epochs': [4, 64]}, 'shuffle': True, 'total_batch_size': 1, 'num_workers': 4}, 'val_dataloader': {'dataset': {'transforms': {'ops': [{'type': 'Resize', 'size': [640, 640]}, {'type': 'ConvertPILImage', 'dtype': 'float32', 'scale': True}]}}, 'shuffle': False, 'total_batch_size': 1, 'num_workers': 4}, 'epoches': 132, 'clip_max_norm': 0.1, 'optimizer': {'type': 'AdamW', 'params': [{'params': '^(?=.*backbone)(?!.*bn).*$', 'lr': 0.0002}, {'params': '^(?=.*(?:norm|bn)).*$', 'weight_decay': 0.0}], 'lr': 0.0004, 'betas': [0.9, 0.999], 'weight_decay': 0.0001}, 'lr_scheduler': {'type': 'MultiStepLR', 'milestones': [500], 'gamma': 0.1}, 'lr_warmup_scheduler': {'type': 'LinearWarmup', 'warmup_duration': 500}, 'task': 'detection', 'model': 'DEIM', 'criterion': 'DEIMCriterion', 'postprocessor': 'PostProcessor', 'use_focal_loss': True, 'eval_spatial_size': [640, 640], 'DEIM': {'backbone': 'HGNetv2', 'encoder': 'HybridEncoder', 'decoder': 'DFINETransformer'}, 'lrsheduler': 'flatcosine', 'lr_gamma': 0.5, 'warmup_iter': 2000, 'flat_epoch': 64, 'no_aug_epoch': 12, 'HGNetv2': {'pretrained': False, 'local_model_dir': '../RT-DETR-main/D-FINE/weight/hgnetv2/', 'name': 'B0', 'return_idx': [1, 2, 3], 'freeze_at': -1, 'freeze_norm': False, 'use_lab': True}, 'HybridEncoder': {'in_channels': [256, 512, 1024], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'use_encoder_idx': [2], 'num_encoder_layers': 1, 'nhead': 8, 'dim_feedforward': 1024, 'dropout': 0.0, 'enc_act': 'gelu', 'expansion': 0.5, 'depth_mult': 0.34, 'act': 'silu'}, 'DFINETransformer': {'feat_channels': [256, 256, 256], 'feat_strides': [8, 16, 32], 'hidden_dim': 256, 'num_levels': 3, 'num_layers': 3, 
'eval_idx': -1, 'num_queries': 300, 'num_denoising': 100, 'label_noise_ratio': 0.5, 'box_noise_scale': 1.0, 'reg_max': 32, 'reg_scale': 4, 'layer_scale': 1, 'num_points': [3, 6, 3], 'cross_attn_method': 'default', 'query_select_method': 'default', 'activation': 'silu', 'mlp_act': 'silu'}, 'PostProcessor': {'num_top_queries': 300}, 'DEIMCriterion': {'weight_dict': {'loss_vfl': 1, 'loss_bbox': 5, 'loss_giou': 2, 'loss_fgl': 0.15, 'loss_ddf': 1.5, 'loss_mal': 1}, 'losses': ['mal', 'boxes', 'local'], 'alpha': 0.75, 'gamma': 1.5, 'reg_max': 32, 'matcher': {'type': 'HungarianMatcher', 'weight_dict': {'cost_class': 2, 'cost_bbox': 5, 'cost_giou': 2}, 'alpha': 0.25, 'gamma': 2.0}}, '__include__': ['./dfine_hgnetv2_s_coco.yml', '../base/deim.yml'], 'config': 'configs/deim_dfine/deim_hgnetv2_s_coco.yml', 'tuning': '/content/deim_dfine_hgnetv2_s_coco_120e.pth', 'seed': 0, 'test_only': False, 'print_method': 'builtin', 'print_rank': 0}}
Tuning checkpoint from /content/deim_dfine_hgnetv2_s_coco_120e.pth
Load model.state_dict, {'missed': [], 'unmatched': []}
Initial lr: [0.0002, 0.0004, 0.0004]
building train_dataloader with batch_size=1...
Traceback (most recent call last):
File "/content/DEIM/train.py", line 84, in <module>
main(args)
File "/content/DEIM/train.py", line 54, in main
solver.fit()
File "/content/DEIM/engine/solver/det_solver.py", line 25, in fit
self.train()
File "/content/DEIM/engine/solver/_solver.py", line 87, in train
self.cfg.train_dataloader, shuffle=self.cfg.train_dataloader.shuffle
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/DEIM/engine/core/yaml_config.py", line 76, in train_dataloader
self._train_dataloader = self.build_dataloader('train_dataloader')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/DEIM/engine/core/yaml_config.py", line 172, in build_dataloader
loader = create(name, global_cfg, batch_size=bs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/content/DEIM/engine/core/workspace.py", line 121, in create
module = getattr(cfg['_pymodule'], name)
~~~^^^^^^^^^^^^^
KeyError: '_pymodule'
My custom dataset is from Roboflow, exported in COCO format. I restructured and renamed the files to match the file structure stated in the README and MSCOCO2017. My annotation JSON files use the format above. I really want to use this model for a project, but I have to fine-tune it first.
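Comparing the two config dumps in this thread shows one concrete difference. The run that reached training has 'type': 'DataLoader' and 'type': 'CocoDetection' under train_dataloader and val_dataloader, while the runs that ended in KeyError: '_pymodule' have neither. The sketch below makes that comparison explicit on trimmed copies of the dumps. That the missing entries are what triggers the KeyError is an assumption, not something confirmed in the thread, and the helper name is hypothetical.

```python
def missing_type_keys(yaml_cfg, loaders=("train_dataloader", "val_dataloader")):
    """List the dotted config paths that have no 'type' entry."""
    missing = []
    for name in loaders:
        loader = yaml_cfg.get(name, {})
        if "type" not in loader:
            missing.append(name)
        if "type" not in loader.get("dataset", {}):
            missing.append(name + ".dataset")
    return missing


# Trimmed from the dump of the run that reached training (C:/COCO paths).
reached_training = {
    "train_dataloader": {"type": "DataLoader", "dataset": {"type": "CocoDetection"}},
    "val_dataloader": {"type": "DataLoader", "dataset": {"type": "CocoDetection"}},
}

# Trimmed from the dump of the Colab run that ended in KeyError: '_pymodule'.
key_error_run = {
    "train_dataloader": {"dataset": {"transforms": {}}, "total_batch_size": 2},
    "val_dataloader": {"dataset": {"transforms": {}}, "total_batch_size": 2},
}

print(missing_type_keys(reached_training))  # prints []
print(missing_type_keys(key_error_run))
```

If the second list is non-empty for your own dump, one thing worth checking is whether a dataset config such as custom_detection.yml is still listed in __include__ after '../dataset/coco_detection.yml' was removed.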
Hello there, I also encountered the same errors as you did, so I made a wrapper package on top of DEIM to make training easier. I've trained on a couple of custom datasets and gotten some results.
You can get started with a one-line installation. It should get the proper torch, torchvision, and CUDA versions. I've also tried running it on Google Colab and it works fine.
Check out the repo - https://github.com/dnth/DEIMKit
Colab - https://colab.research.google.com/drive/1xxqAoOjgSsFDj7Jvj_jcU44mRBziTDF6?usp=sharing
Hi! I encountered the same NotImplementedError and figured out that it only occurs when using torchvision versions above 0.21.
I fixed it by either downgrading torchvision or modifying engine/data/transforms/_transforms.py as described in the PR I submitted: https://github.com/ShihuaHuang95/DEIM/pull/47.
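A quick way to see whether an environment falls in the version range reported here is to compare the installed torchvision version against the threshold. This is a minimal sketch using only the standard library. The function name is hypothetical, and because the thread says both "above 0.21" and "below 0.21", the sketch conservatively treats 0.21 itself as possibly affected, which is an assumption.

```python
from importlib import metadata


def torchvision_may_be_affected(version, first_affected=(0, 21)):
    """True when the (major, minor) part of version is at or past first_affected."""
    release = version.split("+")[0]  # drop local tags such as "+cu118"
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= first_affected


if __name__ == "__main__":
    try:
        installed = metadata.version("torchvision")
    except metadata.PackageNotFoundError:
        print("torchvision is not installed in this environment")
    else:
        print(installed, torchvision_may_be_affected(installed))
```

For example, the 0.15.2 build installed earlier in this thread is reported as not affected, which matches the observation that inference and dataloading got past this error there.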
Thank you for the insight. I downgraded the torchvision version to below 0.21, and the error has been resolved. I hope the PR gets merged soon.
I hit this same issue, I hope the PR is accepted soon! I downgraded torchvision.
Hi, regarding the DEIMKit wrapper and Colab notebook above: this no longer works because the Python version in Colab is now >3.12, and going the long way with a venv doesn't work either. Do you know if there's any way to downgrade it? Thank you.