dlbench
Benchmarking Systems without GPU
Hi Team,
I am trying to benchmark a system without a GPU. However, when I run the benchmark script, it looks for nvidia-smi:
CalledProcessError: Command 'nvidia-smi' returned non-zero exit status 127
I get the same error with fcn5, alexnet, resnet, and lstm.
In addition, we plan to run the benchmarks on MXNet, TensorFlow and Caffe. From the documentation, I understand that we need to copy the zip files to $HOME/data. However, we also need to use the configuration file associated with each framework for it to work. Is that correct?
Hi, without a GPU, please remove lines 116 and 117 of benchmark.py, which are used for GPU power collection.
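As an alternative to deleting those lines, the power collection could be skipped when nvidia-smi is missing (exit status 127 usually means the command was not found). The following is only a minimal sketch, assuming Python 3; the helper names `gpu_tools_available` and `query_gpu_power` are hypothetical and are not taken from benchmark.py.

```python
import shutil
import subprocess


def gpu_tools_available(tool="nvidia-smi"):
    """Return True if the GPU query tool can be found on PATH."""
    return shutil.which(tool) is not None


def query_gpu_power(tool="nvidia-smi"):
    """Return the tool's power-draw output, or None on a machine without GPUs.

    Hypothetical helper: benchmark.py may collect power differently.
    """
    if not gpu_tools_available(tool):
        return None
    try:
        return subprocess.check_output(
            [tool, "--query-gpu=power.draw", "--format=csv,noheader"],
            universal_newlines=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # The tool exists but failed (for example, no driver loaded).
        return None
```

With a guard like this, a CPU-only system would simply get `None` instead of a CalledProcessError.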
You are correct that you need to write your own configuration file for your benchmarks. For data preparation, you also need to unzip the data files you downloaded and put them in $HOME/data.
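The data preparation step can be sketched roughly as follows. This is an illustration only: the function name `unpack_datasets` is hypothetical, and the archive names depend on which data files were downloaded.

```python
import os
import zipfile


def unpack_datasets(zip_paths, data_dir=None):
    """Extract each archive into data_dir (default: $HOME/data)."""
    if data_dir is None:
        data_dir = os.path.join(os.path.expanduser("~"), "data")
    os.makedirs(data_dir, exist_ok=True)
    for path in zip_paths:
        with zipfile.ZipFile(path) as archive:
            archive.extractall(data_dir)
    return data_dir
```

Calling `unpack_datasets(["some_dataset.zip"])` (a placeholder file name) would extract that archive under $HOME/data.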
Yup, the GPU-related errors are gone now. However, I get the following error message.
[bt] (9) /N/dc2/scratch/jerkatta/mxnet/python/mxnet/../../lib/libmxnet.so(MXExecutorSimpleBind+0x2069) [0x7fe9f010d609]
Traceback (most recent call last):
File "train_cifar10.py", line 54, in <module>
fit.fit(args, sym, data.get_rec_iter, init)
File "/gpfs/home/j/e/jerkatta/Carbonate/benchmarking/dlbench/tools/mxnet/common/fit.py", line 187, in fit
monitor = monitor)
File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/base_module.py", line 460, in fit
for_training=True, force_rebind=force_rebind)
File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/module.py", line 417, in bind
state_names=self._state_names)
File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 231, in __init__
self.bind_exec(data_shapes, label_shapes, shared_group)
File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 327, in bind_exec
shared_group))
File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 603, in _bind_ith_exec
shared_buffer=shared_data_arrays, **input_shapes)
File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/symbol.py", line 1479, in simple_bind
raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (128, 3L, 32L, 32L)
softmax_label: (128,)
[12:31:58] src/storage/storage.cc:113: Compile with USE_CUDA=1 to enable GPU usage
In addition, these are the log files generated so far:
./mxnet-cnn-alexnet--devId0,1,2,3-c4-b256-Tue_Jan_16_12:31:50_2018-xx.log:Total time: 1.67378902435
./mxnet-cnn-alexnet--devId0-c1-b1024-Tue_Jan_16_12:31:55_2018-e1.xx.log:Total time: 1.23220992088
./mxnet-cnn-resnet--devId0,1,2,3-c4-b128-Tue_Jan_16_12:31:52_2018-xx.log:Total time: 1.2443048954
./mxnet-cnn-resnet--devId0-c1-b128-Tue_Jan_16_12:31:56_2018-xx.log:Total time: 1.14851498604
./mxnet-fc-fcn5--devId0,1,2,3-c4-b1024-Tue_Jan_16_12:31:47_2018-xx.log:Total time: 2.8816781044
./mxnet-fc-fcn5--devId0-c1-b4096-Tue_Jan_16_12:31:54_2018-xx.log:Total time: 1.44347310066
./mxnet-rnn-lstm--devId0-c1-b1024-Tue_Jan_16_12:31:58_2018-xx.log:Total time: 1.68027210236
Note that we are running it without GPUs.
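Every run above reports a total time of only one to three seconds, which may mean the runs abort at bind time rather than train. To compare such lines across runs, they could be parsed with a small helper. This is a sketch; `parse_total_times` is a hypothetical name, and it assumes each line has the `<log file>:Total time: <seconds>` shape shown above. The file names contain colons in their timestamps, so the split is done on the last `:Total time: ` marker.

```python
def parse_total_times(lines):
    """Map each log file name to its reported total time in seconds."""
    marker = ":Total time: "
    results = {}
    for line in lines:
        # rpartition splits on the last marker, so colons in the
        # timestamp part of the file name are left untouched.
        name, sep, value = line.strip().rpartition(marker)
        if not sep:
            continue  # not a "Total time" line
        results[name] = float(value)
    return results
```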
You can narrow down the cause by testing MXNet on its own. Go to dlbench/tools/mxnet and run testbm.sh. You may need to modify the script to comment out the lines for GPU tests. You can also append the -debug flag to each test line to get more information for debugging.
Okay, I will try that. Some of the testbm.sh scripts do not have test statements for CPUs. I am assuming that passing -cpuCount 20 instead of -gpuCount 1 would solve the problem.