deep-high-resolution-net.pytorch
RuntimeError: CUDA error: out of memory -- although GPU is empty
The only reason I can think of is that my CUDA version is 11.7, while the latest available PyTorch build is for CUDA 11.6. Could that be the issue?
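One quick way to check the build/driver mismatch hypothesis (a diagnostic sketch, not a confirmed fix — a driver reporting CUDA 11.7 can normally run binaries built against the 11.6 runtime, so a mismatch in this direction is usually not fatal):

```python
import torch

# CUDA runtime version this PyTorch build was compiled against (e.g. "11.6").
print("PyTorch built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # List the devices PyTorch actually sees; compare with nvidia-smi.
    print("Visible devices:", torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))
```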
Log:

```
python tools/test.py \
--cfg experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pytorch/pose_hrnet_w48_384x288.pth \
TEST.USE_GT_BBOX False
=> creating log/coco/pose_hrnet/w48_384x288_adam_lr1e-3_2022-07-19-07-56
Namespace(cfg='experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml', opts=['TEST.MODEL_FILE', 'models/pytorch/pose_hrnet_w48_384x288.pth', 'TEST.USE_GT_BBOX', 'False'], modelDir='', logDir='', dataDir='', prevModelDir='')
AUTO_RESUME: True
CUDNN:
BENCHMARK: True
DETERMINISTIC: False
ENABLED: True
DATASET:
COLOR_RGB: True
DATASET: coco
DATA_FORMAT: jpg
FLIP: True
HYBRID_JOINTS_TYPE:
NUM_JOINTS_HALF_BODY: 8
PROB_HALF_BODY: 0.3
ROOT: data/coco/
ROT_FACTOR: 45
SCALE_FACTOR: 0.35
SELECT_DATA: False
TEST_SET: val2017
TRAIN_SET: train2017
DATA_DIR:
DEBUG:
DEBUG: True
SAVE_BATCH_IMAGES_GT: True
SAVE_BATCH_IMAGES_PRED: True
SAVE_HEATMAPS_GT: True
SAVE_HEATMAPS_PRED: True
GPUS: (0, 1, 2, 3)
LOG_DIR: log
LOSS:
TOPK: 8
USE_DIFFERENT_JOINTS_WEIGHT: False
USE_OHKM: False
USE_TARGET_WEIGHT: True
MODEL:
EXTRA:
FINAL_CONV_KERNEL: 1
PRETRAINED_LAYERS: ['conv1', 'bn1', 'conv2', 'bn2', 'layer1', 'transition1', 'stage2', 'transition2', 'stage3', 'transition3', 'stage4']
STAGE2:
BLOCK: BASIC
FUSE_METHOD: SUM
NUM_BLOCKS: [4, 4]
NUM_BRANCHES: 2
NUM_CHANNELS: [48, 96]
NUM_MODULES: 1
STAGE3:
BLOCK: BASIC
FUSE_METHOD: SUM
NUM_BLOCKS: [4, 4, 4]
NUM_BRANCHES: 3
NUM_CHANNELS: [48, 96, 192]
NUM_MODULES: 4
STAGE4:
BLOCK: BASIC
FUSE_METHOD: SUM
NUM_BLOCKS: [4, 4, 4, 4]
NUM_BRANCHES: 4
NUM_CHANNELS: [48, 96, 192, 384]
NUM_MODULES: 3
HEATMAP_SIZE: [72, 96]
IMAGE_SIZE: [288, 384]
INIT_WEIGHTS: True
NAME: pose_hrnet
NUM_JOINTS: 17
PRETRAINED: models/pytorch/imagenet/hrnet_w48-8ef0771d.pth
SIGMA: 3
TAG_PER_JOINT: True
TARGET_TYPE: gaussian
OUTPUT_DIR: output
PIN_MEMORY: True
PRINT_FREQ: 100
RANK: 0
TEST:
BATCH_SIZE_PER_GPU: 24
BBOX_THRE: 1.0
COCO_BBOX_FILE: data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json
FLIP_TEST: True
IMAGE_THRE: 0.0
IN_VIS_THRE: 0.2
MODEL_FILE: models/pytorch/pose_hrnet_w48_384x288.pth
NMS_THRE: 1.0
OKS_THRE: 0.9
POST_PROCESS: True
SHIFT_HEATMAP: True
SOFT_NMS: False
USE_GT_BBOX: False
TRAIN:
BATCH_SIZE_PER_GPU: 24
BEGIN_EPOCH: 0
CHECKPOINT:
END_EPOCH: 210
GAMMA1: 0.99
GAMMA2: 0.0
LR: 0.001
LR_FACTOR: 0.1
LR_STEP: [170, 200]
MOMENTUM: 0.9
NESTEROV: False
OPTIMIZER: adam
RESUME: False
SHUFFLE: True
WD: 0.0001
WORKERS: 24
=> loading model from models/pytorch/pose_hrnet_w48_384x288.pth
loading annotations into memory...
Done (t=0.12s)
creating index...
index created!
=> classes: ['__background__', 'person']
=> num_images: 5000
=> Total boxes: 104125
=> Total boxes after fliter low score@0.0: 104125
=> load 104125 samples
Traceback (most recent call last):
File "/mnt/e/hi_5/deep-high-resolution-net.pytorch/tools/test.py", line 130, in <module>
main()
File "/mnt/e/hi_5/deep-high-resolution-net.pytorch/tools/test.py", line 125, in main
validate(cfg, valid_loader, valid_dataset, model, criterion,
File "/mnt/e/hi_5/deep-high-resolution-net.pytorch/tools/../lib/core/function.py", line 118, in validate
for i, (input, target, target_weight, meta) in enumerate(val_loader):
File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
data = self._next_data()
File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
return self._process_data(data)
File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 34, in _pin_memory_loop
data = pin_memory(data, device)
File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 65, in pin_memory
return type(data)([pin_memory(sample, device) for sample in data]) # type: ignore[call-arg]
File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 65, in <listcomp>
return type(data)([pin_memory(sample, device) for sample in data]) # type: ignore[call-arg]
File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 50, in pin_memory
return data.pin_memory(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called without an active exception
```
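Note that the traceback dies inside the pin-memory worker (`Tensor.pin_memory`) rather than in a model forward pass, so the failing allocation is pinned *host* memory, not GPU memory — which would match the empty `nvidia-smi` output below. A minimal repro of that exact call (a sketch assuming PyTorch is installed; running under WSL2, as the `/mnt/e/` path suggests, is a known source of pinned-memory limits):

```python
import torch

# pin_memory() page-locks host RAM via the CUDA driver; it needs a CUDA
# context but allocates no GPU memory. If this raises
# "CUDA error: out of memory", pinned host memory is the bottleneck.
if torch.cuda.is_available():
    x = torch.empty(1 << 20)  # ~4 MB float32 tensor in host RAM
    x = x.pin_memory()
    print("pinned:", x.is_pinned())
else:
    print("no CUDA device visible; cannot test pin_memory")
```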
---
> nvidia-smi
```
Tue Jul 19 07:54:49 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57 Driver Version: 516.59 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:0A:00.0 On | N/A |
| 0% 49C P8 24W / 370W | 683MiB / 24576MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:0B:00.0 Off | N/A |
| 0% 39C P8 14W / 370W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
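A cheap experiment that isolates the pin-memory path (a sketch based on the `PIN_MEMORY: True` and `GPUS: (0, 1, 2, 3)` lines in the config dump above, not a verified fix): disable pinned memory, and restrict `GPUS` to the two devices `nvidia-smi` actually lists, then rerun the same command.

```yaml
# Override in the experiment config; key names taken from the dumped config.
PIN_MEMORY: False
GPUS: (0, 1)   # the dump shows (0, 1, 2, 3), but nvidia-smi lists only two GPUs
```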