
[Bug] EMD loss cannot handle input size less than 4096 either

Open Lotayou opened this issue 6 years ago • 2 comments

@yulequan I just found that the EMD loss module also crashes when the input size is smaller than 4096.

Here is the error message:

Warning: Input parameter 2048 has been switched to 1722 for dyna_patch dataset...
vcl-dl-3
Namespace(batch_size=1, dataset='dyna_patch', gpu='0', learning_rate=0.001, log_dir='../model/debug', max_epoch=120, num_point=2048, phase='train', test_dir='../data/test_data/our_collected_data/MC_5k', up_ratio=2)
Traceback (most recent call last):
  File "main.py", line 277, in <module>
    assert not os.path.exists(os.path.join(MODEL_DIR, 'code/'))
AssertionError
(yanglingbo) ylb@vcl-dl-3:~/projects/3D_mesh_SR/PU-Net/code$ sh train_dyna_patch.sh
Warning: Input parameter 2048 has been switched to 1722 for dyna_patch dataset...
vcl-dl-3
Namespace(batch_size=1, dataset='dyna_patch', gpu='0', learning_rate=0.001, log_dir='../model/debug', max_epoch=120, num_point=2048, phase='train', test_dir='../data/test_data/our_collected_data/MC_5k', up_ratio=2)
2019-01-22 15:43:58.696137: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-22 15:44:00.175833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-01-22 15:44:00.176297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
use randominput, input h5 file is: ../h5_data/dyna_patch_dataset_pu_net.h5
Normalization the data
total 10220 samples
NUM_BATCH is 10220
True True
**** EPOCH 000 ****
2019-01-22 15:44:24.039949: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2019-01-22 15:44:24.040244: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
Aborted (core dumped)

Data Processing

My original mesh contains 6890 points. To cope with the EMD size constraint, I split each human into left and right halves of 3444 points each and chose a downsampling ratio r=2, so the downsampled input contains 1722 points and the output should contain 3444 points. However, the error still occurs, just as it does when my input has more than 4096 points. Meanwhile, training on the author-provided 4096-point dataset works without problems.
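For anyone trying to reproduce the sizes, here is a minimal sketch of the kind of preprocessing described above. It is not the actual script used for the dyna_patch dataset: the function name `split_and_downsample`, the split along the x axis, the random subsampling, and the trimming of each half to a multiple of the ratio are all assumptions made for illustration.

```python
import numpy as np

def split_and_downsample(points, ratio=2, seed=0):
    """Split a point cloud into two halves along x and subsample each half.

    Hypothetical helper: the split axis and the random subsampling are
    assumptions, not the preprocessing actually used in this issue.
    """
    rng = np.random.default_rng(seed)
    # Order the points by x coordinate and take the lower and upper halves.
    order = np.argsort(points[:, 0])
    half = points.shape[0] // 2
    halves = [points[order[:half]], points[order[-half:]]]

    pairs = []
    for target in halves:
        n_in = target.shape[0] // ratio
        # Assumption: drop trailing points so that target size == n_in * ratio,
        # which is what a network with a fixed up_ratio would output.
        target = target[: n_in * ratio]
        idx = rng.choice(target.shape[0], size=n_in, replace=False)
        pairs.append((target[idx], target))
    return pairs

# Random stand-in for a 6890-point mesh.
mesh = np.random.default_rng(1).standard_normal((6890, 3)).astype(np.float32)
pairs = split_and_downsample(mesh, ratio=2)
for inp, tgt in pairs:
    print(inp.shape, tgt.shape)
```

With a 6890-point cloud and ratio 2, each pair has a 1722-point input and a 3444-point target, matching the sizes reported above, so both stay below 4096.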

Configuration

CUDA 9.0
CUDNN 7005
Python 3.6
TensorFlow 1.5.1

See also #3.

Lotayou, Jan 22 '19 07:01

Have you solved this problem?

MrXiaoZhen, Apr 29 '19 06:04

@Lotayou

MrXiaoZhen, Apr 29 '19 06:04