AdaContrast
CUDA out of memory and Waiting for W&B process to finish... (failed 1)
I set up an environment following the instructions, but training fails with the error below.
(Adatta) root@053d4d94eab6:/code/AdaContrast# bash train_VISDA-C_target.sh /code/AdaContrast/
main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/opt/conda/envs/Adatta/lib/python3.8/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'root': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
/opt/conda/envs/Adatta/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
[W Context.cpp:69] Warning: torch.set_deterministic is in beta, and its design and functionality may change in the future. (function operator())
[W Context.cpp:69] Warning: torch.set_deterministic is in beta, and its design and functionality may change in the future. (function operator())
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
[W Context.cpp:69] Warning: torch.set_deterministic is in beta, and its design and functionality may change in the future. (function operator())
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
[W Context.cpp:69] Warning: torch.set_deterministic is in beta, and its design and functionality may change in the future. (function operator())
[INFO] 2023-10-04 06:33:10 main_adacontrast.py:97 Dataset: VISDA-C, Source domains: ['train'], Target domains: ['validation'], Pipeline: target
wandb: Currently logged in as: wenbo-zhang01 (wenbozhang). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.11
wandb: Run data is saved locally in /code/AdaContrast/output/VISDA-C/target/wandb/run-20231004_063312-6t2re0hg
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run seed_2020
wandb: ⭐️ View project at https://wandb.ai/wenbozhang/VISDA-C
wandb: 🚀 View run at https://wandb.ai/wenbozhang/VISDA-C/runs/6t2re0hg
[INFO] 2023-10-04 06:33:17 target.py:241 Start target training on train-validation...
[INFO] 2023-10-04 06:33:19 classifier.py:54 Loaded from /code/AdaContrast/best_train_2020.pth.tar; missing params: []
[INFO] 2023-10-04 06:33:21 classifier.py:54 Loaded from /code/AdaContrast/best_train_2020.pth.tar; missing params: []
[INFO] 2023-10-04 06:33:21 target.py:271 1 - Created target model
[INFO] 2023-10-04 06:33:21 target.py:44 Eval and labeling...
0%| | 0/55 [00:00<?, ?it/s]/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55/55 [00:44<00:00, 1.23it/s]
[INFO] 2023-10-04 06:34:08 target.py:79 Accuracy of direct prediction: 45.41
[INFO] 2023-10-04 06:34:08 utils.py:291 Accuracy per class: [45.72 8.95 37.4 63.31 48.8 2.75 82.64 18.6 48.71 25.43 88.69 7.26], mean: 39.85
[INFO] 2023-10-04 06:34:16 utils.py:291 Accuracy per class: [50.05 6.36 40.26 65.78 51.93 0.39 85.49 16.68 46.36 21.44 92.07 5.25], mean: 40.17
[INFO] 2023-10-04 06:34:18 target.py:114 Collected 55388 pseudo labels.
[INFO] 2023-10-04 06:34:18 target.py:289 2 - Computed initial pseudo labels
[INFO] 2023-10-04 06:34:18 target.py:311 3 - Created train/val loader
[INFO] 2023-10-04 06:34:18 target.py:315 4 - Created optimizer
[INFO] 2023-10-04 06:34:18 target.py:317 Start training...
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
/code/AdaContrast/main_adacontrast.py:24: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="configs", config_name="root")
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: / 0.004 MB of 0.004 MB uploaded (0.000 MB deduped)
wandb: Run history:
wandb: Test Acc ▁
wandb: Test Avg ▁
wandb: Test Post Acc ▁
wandb: Test Post Avg ▁
wandb:
wandb: Run summary:
wandb: Test Acc 45.41236
wandb: Test Avg 39.855
wandb: Test Post Acc 46.27536
wandb: Test Post Avg 40.17167
wandb:
wandb: 🚀 View run seed_2020 at: https://wandb.ai/wenbozhang/VISDA-C/runs/6t2re0hg
wandb: ️⚡ View job at https://wandb.ai/wenbozhang/VISDA-C/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEwMzkzMTk1OQ==/version_details/v2
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20231004_063312-6t2re0hg/logs
Error executing job with overrides: ['seed=2020', 'port=10001', 'memo=target', 'project=VISDA-C', 'data.data_root=/code/AdaContrast/datasets', 'data.workers=8', 'data.dataset=VISDA-C', 'data.source_domains=[train]', 'data.target_domains=[validation]', 'model_src.arch=resnet101', 'model_tta.src_log_dir=/code/AdaContrast/', 'optim.lr=2e-4']
Traceback (most recent call last):
File "main_adacontrast.py", line 42, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/code/AdaContrast/main_adacontrast.py", line 141, in main_worker
train_target_adacontrast(args)
File "/code/AdaContrast/target.py", line 323, in train_target_domain
train_epoch(train_loader, model, banks, optimizer, epoch, args)
File "/code/AdaContrast/target.py", line 371, in train_epoch
_, logits_q, logits_ins, keys = model(images_q, images_k)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/code/AdaContrast/moco/builder.py", line 173, in forward
k, _ = self.momentum_model(im_k, return_feats=True)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/code/AdaContrast/classifier.py", line 37, in forward
feat = self.encoder(x)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torchvision/models/resnet.py", line 220, in forward
return self._forward_impl(x)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torchvision/models/resnet.py", line 208, in _forward_impl
x = self.layer1(x)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torchvision/models/resnet.py", line 116, in forward
identity = self.downsample(x)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 532, in forward
return sync_batch_norm.apply(
File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/_functions.py", line 52, in forward
out = torch.batch_norm_elemt(input, weight, bias, mean, invstd, eps)
RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 10.91 GiB total capacity; 8.39 GiB already allocated; 21.19 MiB free; 8.76 GiB reserved in total by PyTorch)
The log above is from a machine with 4× 1080 Ti GPUs. However, I get the same error after moving to a 4090. What might be the cause? Thanks!
Hi, thank you for your interest in our work!
We trained our model on eight 16 GB GPUs; it looks like your per-GPU batch size is too large to fit in memory.
Could you try gradient accumulation or multi-node training?
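To illustrate what gradient accumulation does, here is a minimal pure-Python sketch, not code from this repository. It uses a toy one-parameter linear model with a mean-squared-error loss, and every name in it (`mse_grad`, `accumulated_grad`, `accum_steps`) is hypothetical. It shows that summing the gradients of equally sized micro-batches, each scaled by `1 / accum_steps`, reproduces the full-batch gradient while only one micro-batch has to be held in memory at a time.

```python
# Toy sketch of gradient accumulation for the model y = w * x with MSE loss.
# All names are illustrative assumptions, not part of the AdaContrast code.

def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accum_steps):
    """Split the batch into equal micro-batches and sum their scaled gradients."""
    assert len(xs) % accum_steps == 0, "batch must split evenly"
    micro = len(xs) // accum_steps
    total = 0.0
    for step in range(accum_steps):
        lo, hi = step * micro, (step + 1) * micro
        # Dividing by accum_steps mirrors scaling the loss before the backward pass.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
w = 0.3

full = mse_grad(w, xs, ys)                          # one pass over all 8 samples
accum = accumulated_grad(w, xs, ys, accum_steps=4)  # four passes over 2 samples each
print(abs(full - accum) < 1e-9)
```

In a PyTorch training loop the same idea usually amounts to calling `backward()` on `loss / accum_steps` for each micro-batch and calling `optimizer.step()` and `optimizer.zero_grad()` only once every `accum_steps` iterations. Note that the traceback above goes through `sync_batch_norm` and `moco/builder.py`, so batch-norm statistics and the contrastive keys would be computed per micro-batch; results may therefore not match large-batch training exactly.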