SAM-Adapter-PyTorch
Training does not start
I can't train on 8 × RTX 3090. This is my training command: CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m torch.distributed.launch --master_port=12000 --nnodes 1 --nproc_per_node 4 train.py --config /home/quchunguang/003-large-model/SAM-Adapter-PyTorch/configs/cod-sam-vit-h.yaml --tag exp1
These are the training logs:
/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/home/quchunguang/anaconda3/envs/SAM-Adapter/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
After that it stays like this indefinitely, with no further training output.
How can I resolve this?
Greetings! The current code uses more than 30 GB of GPU memory at a batch size of 1, which exceeds the 24 GB available on an RTX 3090, so we suggest using graphics cards with more memory.
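To compare the memory figure in this reply with the cards that are actually visible, a small pre-flight check can help. This is only a sketch: `fits_on_gpu` and `report_visible_gpus` are hypothetical helper names, not part of SAM-Adapter-PyTorch, and the threshold treats the reported "30G" as 30 GiB, which is an assumption.

```python
# Hypothetical pre-flight check, not part of the SAM-Adapter-PyTorch repository.
# Assumption: the "over 30G" figure from the reply is treated as 30 GiB.
REQUIRED_GIB = 30


def fits_on_gpu(total_bytes: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True if a device with `total_bytes` of memory meets the requirement."""
    return total_bytes >= required_gib * 1024**3


def report_visible_gpus() -> None:
    """Print the total memory of every CUDA device visible to this process."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; skipping the GPU check.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible.")
        return
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        verdict = "enough" if fits_on_gpu(props.total_memory) else "below the reported requirement"
        print(f"GPU {idx}: {props.name}, {props.total_memory / 1024**3:.1f} GiB, {verdict}")


if __name__ == "__main__":
    report_visible_gpus()
```

Running it with the same `CUDA_VISIBLE_DEVICES` value as the training command shows what each training process would see. Note that it checks GPU memory only; system RAM is a separate resource.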
Thank you for your reply. I am using 8 × NVIDIA RTX 3090 GPUs, and the machine has 188 GB of RAM. There was no log output during training, and the GPUs showed no activity either.
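The FutureWarning in the log above says that scripts should read the local rank from os.environ['LOCAL_RANK'] rather than from a --local_rank argument. A minimal sketch of reading it in a way that tolerates both launchers might look like the following. `parse_local_rank` is a hypothetical helper; whether train.py in this repository already handles this has not been checked here, and nothing in the thread confirms that it is the cause of the missing output.

```python
import argparse
import os


def parse_local_rank(argv=None, environ=None) -> int:
    """Return the local rank from the command line if given, else from LOCAL_RANK.

    Hypothetical helper: accepting both --local_rank and --local-rank is a
    defensive assumption so the same script works under either launcher.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", type=int, default=None)
    # parse_known_args ignores the script's other options (--config, --tag, ...).
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # Fall back to the environment variable named in the warning; default to 0.
    return int(environ.get("LOCAL_RANK", 0))


if __name__ == "__main__":
    # flush=True makes the line appear immediately, which helps when a run
    # seems to produce no output at all.
    print(f"local rank = {parse_local_rank()}", flush=True)
```

Printing the rank from every process right at startup is a cheap way to see whether all four workers get past argument parsing before anything else is initialized.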