
CUDA out of memory

Open ZhilingYan opened this issue 1 year ago • 0 comments

Hi,

I'm trying to run the validation code for the 7B model on 8× NVIDIA RTX A5000 GPUs, but it fails with a CUDA out-of-memory error. Does evaluation really need this much memory?
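For a rough sense of scale, here is a back-of-the-envelope sketch of the memory taken by the language-model weights alone. It assumes a nominal 7e9 parameters and that each of the 8 processes holds a full copy of the weights; neither is confirmed by the log. It also ignores the SAM vision backbone, activations, and CUDA overhead, so treat the numbers as a lower bound rather than a measurement.

```python
# Rough lower bound on weight memory; all figures are assumptions, not measurements.

def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to store num_params parameters, in GiB."""
    return num_params * bytes_per_param / 2**30

NUM_PARAMS = 7e9  # assumption: nominal "7B" parameter count

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 2 bytes per fp16 parameter
fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # 4 bytes per fp32 parameter

print(f"fp16 weights: {fp16_gib:.2f} GiB")
print(f"fp32 weights: {fp32_gib:.2f} GiB")
```

Under these assumptions the fp16 weights alone come to about 13 GiB, which already takes more than half of the 24 GB on an RTX A5000, and an fp32 copy would not fit on one card at all.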

Here is the log:

[2024-01-17 17:44:34,524] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-01-17 17:44:34,597] [INFO] [runner.py:360:main] cmd = /home/zhiling/anaconda3/envs/python310/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMywgNCwgNiwgNywgOCwgOV19 --master_addr=127.0.0.1 --master_port=24999 train_ds.py --version=xinlai/LISA-7B-v1 --dataset_dir=/data/zhiling/Dataset/LISA/dataset --vision_pretrained=/home/zhiling/LISA_old/checkpoints/sam/sam_vit_h_4b8939.pth --exp_name=lisa-7b --precision=fp16 --eval_only
[2024-01-17 17:44:35,956] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0, 1, 3, 4, 6, 7, 8, 9]}
[2024-01-17 17:44:35,956] [INFO] [launch.py:86:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-01-17 17:44:35,956] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-01-17 17:44:35,956] [INFO] [launch.py:102:main] dist_world_size=8
[2024-01-17 17:44:35,956] [INFO] [launch.py:104:main] Setting CUDA_VISIBLE_DEVICES=0,1,3,4,6,7,8,9
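The `--world_info` value in the launch command is a base64-encoded JSON string. As a minimal sketch, assuming URL-safe base64 over JSON (which matches the `WORLD INFO DICT` line in the log), it can be decoded with the standard library to check which physical GPUs the run was given:

```python
import base64
import json

# Value copied verbatim from the --world_info argument in the log above.
WORLD_INFO = "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMywgNCwgNiwgNywgOCwgOV19"

# Assumption: the string is URL-safe base64 wrapping a JSON object.
world_info = json.loads(base64.urlsafe_b64decode(WORLD_INFO))
gpu_ids = world_info["localhost"]

print(world_info)  # {'localhost': [0, 1, 3, 4, 6, 7, 8, 9]}
print(f"{len(gpu_ids)} GPUs selected: {gpu_ids}")
```

The decoded list agrees with `CUDA_VISIBLE_DEVICES=0,1,3,4,6,7,8,9`: eight devices, with physical indices 2 and 5 left out.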

ZhilingYan, Jan 17 '24 22:01