
Can we train or test on a single GPU for the detection task?

Open · nestor0003 opened this issue 3 years ago · 1 comment

If we want to test the detection task on a single GPU, can we just run the shell command like `bash dist_test.sh configs/retinanet_alt_gvt_s_fpn_1x_coco_pvt_setting.py checkpoint_file 1 --eval mAP`?
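For concreteness, here is a sketch of the two single-GPU invocations I have in mind, assuming `test.py` in this repo follows the standard MMDetection 2.x `tools/test.py` interface (note that for COCO the eval metric there is usually `bbox`, not `mAP`):

```bash
# Single-GPU test without the distributed launcher
# (assumes ./test.py follows MMDetection 2.x's tools/test.py interface):
python test.py configs/retinanet_alt_gvt_s_fpn_1x_coco_pvt_setting.py checkpoint_file --eval bbox

# dist_test.sh with GPU count 1 should also work; it just launches one worker:
bash dist_test.sh configs/retinanet_alt_gvt_s_fpn_1x_coco_pvt_setting.py checkpoint_file 1 --eval bbox
```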

Or do we also need to change the learning rate and the number of dataloader workers? I'm a beginner with the mmdet framework, please help. These are the error lines:

```
/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future. Migrate to torch.distributed.run
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases. Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : ./test.py
  min_nodes        : 1
  max_nodes        : 1
  nproc_per_node   : 1
  run_id           : none
  rdzv_backend     : static
  rdzv_endpoint    : 127.0.0.1:29500
  rdzv_configs     : {'rank': 0, 'timeout': 900}
  max_restarts     : 3
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_o5bp99y9/none_u2fqutod
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=0
  master_addr=127.0.0.1
  master_port=29500
  group_rank=0
  group_world_size=1
  local_ranks=[0]
  role_ranks=[0]
  global_ranks=[0]
  role_world_sizes=[1]
  global_world_sizes=[1]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_o5bp99y9/none_u2fqutod/attempt_0/0/error.json
loading annotations into memory...
Done (t=0.52s)
creating index...
index created!
Traceback (most recent call last):
  File "./test.py", line 213, in <module>
    main()
  File "./test.py", line 166, in main
    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 67, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 32, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/detectors/retinanet.py", line 16, in __init__
    super(RetinaNet, self).__init__(backbone, neck, bbox_head, train_cfg,
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/detectors/single_stage.py", line 25, in __init__
    self.backbone = build_backbone(backbone)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 37, in build_backbone
    return build(cfg, BACKBONES)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 32, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/user/project/Twins/detection/gvt.py", line 482, in __init__
    super(alt_gvt_small, self).__init__(
  File "/home/user/project/Twins/detection/gvt.py", line 419, in __init__
    super(ALTGVT, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads,
  File "/home/user/project/Twins/detection/gvt.py", line 408, in __init__
    super(PCPVT, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads,
  File "/home/user/project/Twins/detection/gvt.py", line 343, in __init__
    super(CPVTV2, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads, mlp_ratios,
  File "/home/user/project/Twins/detection/gvt.py", line 234, in __init__
    _block = nn.ModuleList([block_cls(
  File "/home/user/project/Twins/detection/gvt.py", line 234, in <listcomp>
    _block = nn.ModuleList([block_cls(
  File "/home/user/project/Twins/detection/gvt.py", line 164, in __init__
    super(GroupBlock, self).__init__(dim, num_heads, mlp_ratio, qkv_bias, qk_scale, drop, attn_drop,
TypeError: __init__() takes from 3 to 10 positional arguments but 11 were given
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11449) of binary: /home/user/miniconda3/envs/twins/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=1
  master_addr=127.0.0.1
  master_port=29500
  group_rank=0
  group_world_size=1
  local_ranks=[0]
  role_ranks=[0]
  global_ranks=[0]
  role_world_sizes=[1]
  global_world_sizes=[1]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_o5bp99y9/none_u2fqutod/attempt_1/0/error.json
```
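Reading the traceback, the failure is the `super().__init__(...)` call in `GroupBlock` passing 11 positional arguments to a parent whose `__init__` accepts at most 10. Here is a minimal, hypothetical sketch of how this class of TypeError arises (toy classes, not the actual gvt.py code; the usual culprit is a dependency release that changed a constructor signature):

```python
# Toy reproduction of the same class of error (hypothetical classes, not gvt.py).
class Block:
    # Parent accepts 2 positional args besides self; mlp_ratio is keyword-only,
    # as happens when a library release moves parameters behind a bare '*'.
    def __init__(self, dim, num_heads, *, mlp_ratio=4.0):
        self.dim, self.num_heads, self.mlp_ratio = dim, num_heads, mlp_ratio

class GroupBlock(Block):
    def __init__(self, dim, num_heads, mlp_ratio, ws=1):
        # Passing mlp_ratio positionally overflows the parent's signature:
        super().__init__(dim, num_heads, mlp_ratio)  # TypeError raised here
        self.ws = ws

GroupBlock(64, 2, 4.0)
# TypeError: __init__() takes 3 positional arguments but 4 were given
# Fix pattern: pass the extras by keyword, e.g.
#   super().__init__(dim, num_heads, mlp_ratio=mlp_ratio)
```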

nestor0003 · Nov 19 '21 08:11

We suggest using at least 4 GPUs to train.
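If you do fall back to fewer GPUs, a common MMDetection convention is to scale the learning rate linearly with the total batch size. A sketch of a single-GPU config override follows; the 1e-4 base lr, the 8 GPU x 2 imgs/GPU baseline, and the override file itself are assumptions to be checked against the actual config:

```python
# Hypothetical single-GPU override of the Twins RetinaNet config.
# Assumed baseline: 8 GPUs x 2 imgs/GPU at lr=1e-4 (verify in the real config).
_base_ = ['./retinanet_alt_gvt_s_fpn_1x_coco_pvt_setting.py']

data = dict(
    samples_per_gpu=2,  # images per GPU; total batch = 1 GPU x 2 = 2
    workers_per_gpu=2,  # dataloader worker processes per GPU
)

# Linear scaling rule: new_lr = base_lr * new_total_batch / base_total_batch
optimizer = dict(lr=1e-4 * (1 * 2) / (8 * 2))  # = 1.25e-5
```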

We still have some intern openings. If you are interested, please send your CV to [email protected]. Thanks.

cxxgtxy · Nov 19 '21 11:11