NiftyNet
Problems running NiftyNet
Hello, I am trying to test NiftyNet for the first time but I am unable to get it running. I set up the installation following the source code repository instructions at https://niftynet.readthedocs.io/en/dev/installation.html and have successfully downloaded the model. However, when I execute the command "python net_segment.py inference -c ~/niftynet/extensions/dense_vnet_abdominal_ct/config.ini" I get the following errors:
....
-> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Initialising Dataset from 1 subjects...
2019-10-01 13:53:56.311601: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.312103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
pciBusID: 0000:01:00.0
2019-10-01 13:53:56.312156: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-01 13:53:56.312167: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-01 13:53:56.312177: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-01 13:53:56.312186: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-01 13:53:56.312195: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-01 13:53:56.312205: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-01 13:53:56.312215: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-01 13:53:56.312256: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.312735: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.313199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-01 13:53:56.313229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-01 13:53:56.313233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-01 13:53:56.313240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-10-01 13:53:56.313345: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.313816: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.314271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6821 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Restoring parameters from /home/yunior/niftynet/models/dense_vnet_abdominal_ct/models/model.ckpt-3000
2019-10-01 13:53:56.630423: W tensorflow/core/common_runtime/colocation_graph.cc:1016] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [ /job:localhost/replica:0/task:0/device:CPU:0]. See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
IteratorGetNext: CPU GPU XLA_CPU XLA_GPU
OneShotIterator: CPU
IteratorToStringHandle: CPU GPU XLA_CPU XLA_GPU
Colocation members, user-requested devices, and framework assigned devices, if any:
worker_0/validation/OneShotIterator (OneShotIterator) /device:GPU:0
worker_0/validation/IteratorToStringHandle (IteratorToStringHandle) /device:GPU:0
worker_0/validation/IteratorGetNext (IteratorGetNext) /device:GPU:0
2019-10-01 13:53:57.360882: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-01 13:53:57.991115: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-01 13:53:57.998596: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-01 13:53:58.001047: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-01 13:53:58.001075: W ./tensorflow/stream_executor/stream.h:1995] attempting to perform DNN operation using StreamExecutor without DNN support
INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads
......
My configuration is as follows:
CPU: Intel i7 (8 cores) and 64 GB RAM
GPU: GeForce RTX 2070, 8 GB, 2304 cores
In addition, I have installed the GPU version of TensorFlow to use the GPU for calculations. I imagine that the errors are related to memory issues on the GPU. I wonder whether there is a way to use the CPU memory as well.
Could you please give me some feedback? Note that I am not an expert in Python.
thanks in advance
As per Tensorflow issue #24496 it seems to be a tensorflow problem.
Could you please try and run this tensorflow example and let us know if the same error appears there.
Thank you very much for the reply. I have tried this and got no errors at all. The script made the predictions and this is the final output:
...
Test accuracy: 0.8813
(28, 28)
(1, 28, 28)
[[3.5098125e-04 1.3001217e-15 9.9916017e-01 4.8920496e-11 4.2484555e-04 5.2356322e-12 6.4001571e-05 5.9205704e-17 5.7315066e-11 2.8146843e-15]]
I have an additional comment that might help to figure out the problem with NiftyNet. I faced problems with TensorFlow at the beginning. I have version 1.14.0 of TensorFlow and apparently NiftyNet has trouble with this version. As a simple solution, the program suggested using tf.compat.v1.Session in several of the software's scripts. Therefore I used:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
instead of
import tensorflow as tf
After that, the errors with the TensorFlow session were fixed. Could this be the source of the current problem?
Thank you in advance
Hi, I have made some progress, I think.
I have upgraded the NVIDIA drivers and the CUDA toolkit. At least I do not see the previous errors anymore.
Now I have nvidia-418, cuda-10.1 and tf 1.14.
However, I have a new error (see below):
......
Traceback (most recent call last):
File "net_segment.py", line 5, in
File "/home/yunior/NiftyNet/niftynet/utilities/user_parameters_default.py", line 10, in
Please, could anybody suggest a tentative solution? Thanks
Hi guys, I really need NiftyNet running on my PC. However, after more than a week I am still not able to do it. Could somebody give me some feedback, please? I have been trying to run the example posted here with no success. I have tried several configurations of NVIDIA drivers, CUDA versions, cuDNN and TensorFlow, but have made no progress at all. I currently have Ubuntu 18.04, NVIDIA driver 418, CUDA 10.0 and cuDNN 7.3.0. I see the following messages in the terminal when I execute the program.
NiftyNet version 0.5.0+185.gb5f3ba1e.dirty
[CUSTOM]
-- num_classes: 9
-- output_prob: False
-- label_normalisation: False
-- softmax: True
-- min_sampling_ratio: 0
-- compulsory_labels: (0, 1)
-- rand_samples: 0
-- min_numb_labels: 1
-- proba_connect: True
-- evaluation_units: foreground
-- do_mixup: False
-- mixup_alpha: 0.2
-- mix_match: False
-- weight: ()
-- inferred: ()
-- sampler: ()
-- label: ('label',)
-- image: ('ct',)
-- name: net_segment
[CONFIG_FILE]
-- path: /home/yunior/niftynet/extensions/dense_vnet_abdominal_ct/config.ini
[CT]
-- csv_file:
-- path_to_search: ./data/dense_vnet_abdominal_ct/
-- filename_contains: ('CT',)
-- filename_not_contains: ()
-- filename_removefromid:
-- interp_order: 1
-- loader: None
-- pixdim: ()
-- axcodes: ('A', 'R', 'S')
-- spatial_window_size: (144, 144, 144)
[LABEL]
-- csv_file:
-- path_to_search: ./data/dense_vnet_abdominal_ct/
-- filename_contains: ('Label',)
-- filename_not_contains: ()
-- filename_removefromid:
-- interp_order: 0
-- loader: None
-- pixdim: ()
-- axcodes: ('A', 'R', 'S')
-- spatial_window_size: (144, 144, 144)
[SYSTEM]
-- cuda_devices: 0
-- num_threads: 1
-- num_gpus: 1
-- model_dir: /home/yunior/niftynet/models/dense_vnet_abdominal_ct
-- dataset_split_file: ./dataset_split.csv
-- event_handler: ('model_saver', 'model_restorer', 'sampler_threading', 'apply_gradients', 'output_interpreter', 'console_logger', 'tensorboard_logger', 'performance_logger')
-- iteration_generator: iteration_generator
-- queue_length: 36
-- action: inference
[NETWORK]
-- name: dense_vnet
-- activation_function: relu
-- batch_size: 1
-- smaller_final_batch_mode: pad
-- decay: 0.0
-- reg_type: L2
-- volume_padding_size: (0, 0, 0)
-- volume_padding_mode: minimum
-- volume_padding_to_size: (0,)
-- window_sampling: resize
-- force_output_identity_resizing: False
-- queue_length: 5
-- multimod_foreground_type: and
-- histogram_ref_file: ./histogram_ref_file.txt
-- norm_type: percentile
-- cutoff: (0.01, 0.99)
-- foreground_type: otsu_plus
-- normalisation: False
-- rgb_normalisation: False
-- whitening: False
-- normalise_foreground_only: False
-- weight_initializer: he_normal
-- bias_initializer: zeros
-- keep_prob: 1.0
-- weight_initializer_args: {}
-- bias_initializer_args: {}
[TRAINING]
-- optimiser: adam
-- sample_per_volume: 1
-- rotation_angle: ()
-- rotation_angle_x: ()
-- rotation_angle_y: ()
-- rotation_angle_z: ()
-- scaling_percentage: ()
-- isotropic_scaling: False
-- antialiasing: True
-- bias_field_range: ()
-- bf_order: 3
-- random_flipping_axes: -1
-- do_elastic_deformation: False
-- num_ctrl_points: 4
-- deformation_sigma: 15
-- proportion_to_deform: 0.5
-- lr: 0.001
-- loss_type: dense_vnet_abdominal_ct.dice_hinge.dice
-- starting_iter: 0
-- save_every_n: 1000
-- tensorboard_every_n: 20
-- max_iter: 3001
-- max_checkpoints: 100
-- validation_every_n: -1
-- validation_max_iter: 1
-- exclude_fraction_for_validation: 0.0
-- exclude_fraction_for_inference: 0.0
-- vars_to_restore:
-- vars_to_freeze:
-- patience: 100
-- early_stopping_mode: mean
[INFERENCE]
-- spatial_window_size: (144, 144, 144)
-- inference_iter: 3000
-- dataset_to_infer:
-- save_seg_dir: ./segmentation_output/
-- output_postfix: _niftynet_out
-- output_interp_order: 0
-- border: (0, 0, 0)
-- fill_constant: 0.0
INFO:niftynet: set CUDA_VISIBLE_DEVICES to 0
INFO:niftynet: starting segmentation application
INFO:niftynet: csv_file =
not found, writing to "/home/yunior/niftynet/models/dense_vnet_abdominal_ct/ct.csv" instead.
INFO:niftynet: [ct] search file folders, writing csv file /home/yunior/niftynet/models/dense_vnet_abdominal_ct/ct.csv
INFO:niftynet: csv_file =
not found, writing to "/home/yunior/niftynet/models/dense_vnet_abdominal_ct/label.csv" instead.
INFO:niftynet: [label] search file folders, writing csv file /home/yunior/niftynet/models/dense_vnet_abdominal_ct/label.csv
INFO:niftynet:
Number of subjects 1, input section names: ['subject_id', 'ct', 'label'] -- using all subjects (without data partitioning).
INFO:niftynet: Image reader: loading 1 subjects from sections ('ct',) as input [image]
2019-10-09 13:38:12.946391: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-09 13:38:12.949613: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-10-09 13:38:12.949963: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55fa8e60d9d0 executing computations on platform Host. Devices:
2019-10-09 13:38:12.949975: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "net_segment.py", line 8, in
Caused by op 'worker_0/DenseVNet/conv_bn/conv_/conv', defined at:
File "net_segment.py", line 8, in
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node worker_0/DenseVNet/conv_bn/conv_/conv (defined at /home/yunior/NiftyNet/niftynet/layer/convolution.py:100) ]] [[node worker_0/post_processing/ExpandDims (defined at /home/yunior/NiftyNet/niftynet/layer/post_processing.py:36) ]]
I would like to add that when I run the program without GPU support, the software works, but slowly.
Thank you in advance
I have never encountered your problem. Also, it seems to be more TensorFlow and CUDA related than NiftyNet related, which is also supported by the fact that it works on CPU but not on GPU.
Could you please replace the following function in util_common.py:
def tf_config():
    """
    tensorflow system configurations
    """
    config = tf.ConfigProto()
    config.log_device_placement = False
    config.allow_soft_placement = True
    return config
with
def tf_config():
    """
    tensorflow system configurations
    """
    config = tf.ConfigProto()
    config.log_device_placement = False
    config.allow_soft_placement = True
    config.gpu_options.allow_growth = True
    return config
I have NiftyNet 0.6, CUDA 10.0, tensorflow-gpu 1.13.2 and numpy 1.16, using a GeForce RTX 2060 (6 GB VRAM) with NVIDIA driver 440.33.01. TensorFlow tries to allocate 5 GB with spatial_window_size = (64, 64, 512) and the dense_vnet network.
I've tried config.gpu_options.allow_growth = True but it doesn't seem to work. I get the same "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR".
Any solution so far? I am not sure if legacy drivers will work better; maybe the v390 NVIDIA driver is compatible? I wonder if this memcpy and cuDNN internal error is related to the newer drivers/cards. I bought a GTX 1080 Ti with 11 GB RAM and will see if this one supports NiftyNet.
Hello, I am not an expert in Python programming and therefore I don't know the proper way to do it. As in your case, I also tried "config.gpu_options.allow_growth = True" but for whatever reason it didn't work for me either. However, since it is not much trouble for me, I type the following command before running NiftyNet:
export TF_FORCE_GPU_ALLOW_GROWTH=true
This solved my problem. I hope this helps you. If someone knows an easy and permanent way to do it, please share it. Best
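One way to avoid typing the export each time might be to set the variable at the top of the entry script, before TensorFlow is imported. This is a minimal sketch, not a NiftyNet feature: the helper name force_gpu_allow_growth is hypothetical, and it assumes a TensorFlow build that honours TF_FORCE_GPU_ALLOW_GROWTH.

```python
import os


def force_gpu_allow_growth(env=os.environ):
    """Hypothetical helper (not part of NiftyNet): request incremental GPU
    memory allocation, as `export TF_FORCE_GPU_ALLOW_GROWTH=true` would."""
    # setdefault keeps any value that was already exported in the shell.
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


# Call this before the first `import tensorflow`, so the variable is already
# in place when TensorFlow initialises the GPU device.
force_gpu_allow_growth()
```

Alternatively, appending the same export line to ~/.bashrc should make it apply to every new terminal session.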