BERT on TensorFlow fails
On r2.1, the Docker container run fails as shown:
(mlperf) $ python3 run.py --backend=tf --scenario=Offline
.
.
.
Running LoadGen test...
2022-11-02 14:19:55.493183: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2022-11-02 14:30:24.704249: E tensorflow/stream_executor/cuda/cuda_blas.cc:440] failed to run cuBLAS routine: CUBLAS_STATUS_NOT_SUPPORTED
2022-11-02 14:30:24.704299: E tensorflow/stream_executor/cuda/cuda_blas.cc:2453] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[16,384,64], b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16
[[{{node bert/encoder/layer_0/attention/self/MatMul}}]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[16,384,64], b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16
[[{{node bert/encoder/layer_0/attention/self/MatMul}}]]
[[logits/_11]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 120, in <module>
main()
File "run.py", line 102, in main
lg.StartTestWithLogSettings(sut.sut, sut.qsl.qsl, settings, log_settings)
File "/workspace/tf_SUT.py", line 64, in issue_queries
result = self.sess.run(["logits:0"], feed_dict=feeds)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[16,384,64], b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16
[[node bert/encoder/layer_0/attention/self/MatMul (defined at /workspace/tf_SUT.py:45) ]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[16,384,64], b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16
[[node bert/encoder/layer_0/attention/self/MatMul (defined at /workspace/tf_SUT.py:45) ]]
[[logits/_11]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'bert/encoder/layer_0/attention/self/MatMul':
File "run.py", line 120, in <module>
main()
File "run.py", line 68, in main
sut = get_tf_sut(args)
File "/workspace/tf_SUT.py", line 79, in get_tf_sut
return BERT_TF_SUT(args)
File "/workspace/tf_SUT.py", line 45, in __init__
tf.import_graph_def(graph_def, name='')
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py", line 443, in import_graph_def
_ProcessNewOps(graph)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py", line 236, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3751, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3751, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3641, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
Segmentation fault (core dumped)
Looking into this, I suspected an out-of-memory condition on my GPU, but I'm using an NVIDIA A30 with 24 GB of memory, which I would expect to be more than enough. In case it's helpful, I'm running Ubuntu 20.04 with NVIDIA driver version 520.61.05 and CUDA version 11.8.
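
As a rough sanity check on the out-of-memory theory, here is a minimal sketch that estimates the footprint of the failing batched GEMM from the shapes in the error message. It assumes float32 tensors (4 bytes per element), which the log does not confirm, and the helper name `gemm_footprint_bytes` is only illustrative. It counts just the two inputs and the output, not any cuBLAS workspace or the rest of the model.

```python
import re
from math import prod

# Error line copied from the log above.
ERROR_LINE = (
    "Blas xGEMMBatched launch failed : a.shape=[16,384,64], "
    "b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16"
)

BYTES_PER_ELEMENT = 4  # assumption: float32; the log does not state the dtype


def gemm_footprint_bytes(line, bytes_per_element=BYTES_PER_ELEMENT):
    """Estimate bytes held by the inputs and output of the batched GEMM."""
    # Pull out the two operand shapes, e.g. "16,384,64".
    shapes = re.findall(r"\.shape=\[([\d,]+)\]", line)
    a_shape, b_shape = [[int(d) for d in s.split(",")] for s in shapes]

    # Pull out the scalar fields: m, n, k, batch_size.
    fields = {key: int(val) for key, val in re.findall(r"(\w+)=(\d+)", line)}

    # The output of each GEMM in the batch is an m x n matrix.
    out_elements = fields["batch_size"] * fields["m"] * fields["n"]
    total_elements = prod(a_shape) + prod(b_shape) + out_elements
    return total_elements * bytes_per_element


if __name__ == "__main__":
    total = gemm_footprint_bytes(ERROR_LINE)
    print(f"{total / 2**20:.1f} MiB")
```

Under those assumptions the operands come to about 12 MiB, a tiny fraction of 24 GB, so this one call by itself should not be enough to exhaust the GPU's memory.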