human-pose-estimation.pytorch
A tensorboard-related error occurs when I run train.py.
Here is the error log:
/home/boyun/anaconda3/envs/tc0.40/bin/python /home/boyun/PycharmProjects/Humanpose/MultiPose/MSRA_BASELINE/human-pose-estimation.pytorch/pose_estimation/train.py
=> creating output/coco/pose_resnet_101/384x288_d256x3_adam_lr1e-3
Namespace(cfg='experiments/coco/resnet101/384x288_d256x3_adam_lr1e-3.yaml', frequent=5, gpus=None, workers=None)
=> creating log/coco/pose_resnet_101/384x288_d256x3_adam_lr1e-3_2019-01-24-14-54
{'CUDNN': {'BENCHMARK': True, 'DETERMINISTIC': False, 'ENABLED': True},
'DATASET': {'DATASET': 'coco',
'DATA_FORMAT': 'jpg',
'FLIP': True,
'HYBRID_JOINTS_TYPE': '',
'ROOT': 'data/coco/',
'ROT_FACTOR': 40,
'SCALE_FACTOR': 0.3,
'SELECT_DATA': False,
'TEST_SET': 'val2017',
'TRAIN_SET': 'train2017'},
'DATA_DIR': '',
'DEBUG': {'DEBUG': True,
'SAVE_BATCH_IMAGES_GT': True,
'SAVE_BATCH_IMAGES_PRED': True,
'SAVE_HEATMAPS_GT': True,
'SAVE_HEATMAPS_PRED': True},
'GPUS': '0',
'LOG_DIR': 'log',
'LOSS': {'USE_TARGET_WEIGHT': True},
'MODEL': {'EXTRA': {'DECONV_WITH_BIAS': False,
'FINAL_CONV_KERNEL': 1,
'HEATMAP_SIZE': array([72, 96]),
'NUM_DECONV_FILTERS': [256, 256, 256],
'NUM_DECONV_KERNELS': [4, 4, 4],
'NUM_DECONV_LAYERS': 3,
'NUM_LAYERS': 101,
'SIGMA': 3,
'TARGET_TYPE': 'gaussian'},
'IMAGE_SIZE': array([288, 384]),
'INIT_WEIGHTS': True,
'NAME': 'pose_resnet',
'NUM_JOINTS': 17,
'PRETRAINED': 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth',
'STYLE': 'pytorch'},
'OUTPUT_DIR': 'output',
'PRINT_FREQ': 5,
'TEST': {'BATCH_SIZE': 1,
'BBOX_THRE': 1.0,
'COCO_BBOX_FILE': 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json',
'FLIP_TEST': False,
'IMAGE_THRE': 0.0,
'IN_VIS_THRE': 0.2,
'MODEL_FILE': '',
'NMS_THRE': 1.0,
'OKS_THRE': 0.9,
'POST_PROCESS': True,
'SHIFT_HEATMAP': True,
'USE_GT_BBOX': True},
'TRAIN': {'BATCH_SIZE': 6,
'BEGIN_EPOCH': 0,
'CHECKPOINT': '',
'END_EPOCH': 140,
'GAMMA1': 0.99,
'GAMMA2': 0.0,
'LR': 0.001,
'LR_FACTOR': 0.1,
'LR_STEP': [90, 120],
'MOMENTUM': 0.9,
'NESTEROV': False,
'OPTIMIZER': 'adam',
'RESUME': False,
'SHUFFLE': True,
'WD': 0.0001},
'WORKERS': 4}
=> init deconv weights from normal distribution
=> init 0.weight as normal(0, 0.001)
=> init 0.bias as 0
=> init 1.weight as 1
=> init 1.bias as 0
=> init 3.weight as normal(0, 0.001)
=> init 3.bias as 0
=> init 4.weight as 1
=> init 4.bias as 0
=> init 6.weight as normal(0, 0.001)
=> init 6.bias as 0
=> init 7.weight as 1
=> init 7.bias as 0
=> init final conv weights from normal distribution
=> init 8.weight as normal(0, 0.001)
=> init 8.bias as 0
=> loading pretrained model models/pytorch/imagenet/resnet101-5d3b4d8f.pth
Traceback (most recent call last):
File "/home/boyun/PycharmProjects/Humanpose/MultiPose/MSRA_BASELINE/human-pose-estimation.pytorch/pose_estimation/train.py", line 208, in
Process finished with exit code 1
When I remove writer_dict['writer'].add_graph(model, (dump_input, ), verbose=False), training runs fine, so I guess there may be a bug in tensorboardX. My installed version is "tensorboardX=1.6", but the requirements of this repo only ask for tensorboardX 1.2 or newer. What should I do to debug this?
Any suggestions would be appreciated.
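One way to narrow this down without deleting the line is to wrap the graph-logging call so that a failure is logged and training continues. The following is only a minimal sketch: `try_add_graph` is a hypothetical helper, not part of this repo, and it assumes the writer exposes `add_graph(model, inputs)` the way the call in train.py uses it.

```python
import logging

logger = logging.getLogger(__name__)


def try_add_graph(writer, model, dummy_input):
    """Attempt to log the model graph; return True on success, False on failure.

    Hypothetical helper: `writer` is assumed to be a SummaryWriter-like
    object with an `add_graph(model, inputs)` method.
    """
    try:
        writer.add_graph(model, (dummy_input,))
        return True
    except Exception as exc:  # broad on purpose: the failure mode is unknown
        # Record the error message so it can be reported, then carry on.
        logger.warning("add_graph failed, continuing without graph: %s", exc)
        return False
```

In train.py this would stand in for the direct `writer_dict['writer'].add_graph(...)` call; the logged message then shows what the writer actually complained about.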
@Will-Hui It works well when I remove the writer_dict call as you mentioned.
Just change the tensorboardX version; do not use the latest version.
Method one: in pose_estimation/train.py
# NOTE 1.5=>tensorboardX>=1.2
# writer_dict['writer'].add_graph(model, (dump_input, ))
writer_dict['writer'].add_graph(model, (dump_input, ), verbose=False)
Method two: pin the tensorboardX version as follows:
tensorboardX==1.5
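To check whether the installed version falls in the range suggested above, something like the following could be run before training. It is a rough sketch: the bounds 1.2 and 1.5 are taken from the note in this thread, not from any official compatibility statement, and the helper names are made up for illustration. It needs Python 3.8+ for `importlib.metadata`.

```python
import re
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) as integers from a version string like '1.6'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def in_suggested_range(version, low=(1, 2), high=(1, 5)):
    """True if low <= (major, minor) <= high; bounds are this thread's suggestion."""
    return low <= major_minor(version) <= high


def installed_version(package="tensorboardX"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("tensorboardX is not installed")
    elif in_suggested_range(found):
        print("tensorboardX %s is within the suggested range" % found)
    else:
        print("tensorboardX %s is outside the suggested range" % found)
```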
@HuAndrew Thx.