bert4keras
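Error when running pretraining.py, not sure what is causing it
When asking a question, please provide as much of the following information as possible:
Basic information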
- Operating system: linux
- Python version: python3.6.0
- TensorFlow version: 1.15.1
- Keras version: 2.3.1
- bert4keras version: 0.10.0
- Pure keras or tf.keras:
- Pre-trained model loaded: chinese_L-12_H-768_A-12
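Core code
# The code is just the two files data_utils.py and pretraining.py, essentially unmodified apart from the dataset
But running pretraining.py throws the error below. I don't know whether it is a version problem; I'd be grateful if someone more experienced could tell me.
Output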
2021-05-29 17:39:22.321821: W tensorflow/core/grappler/utils/graph_view.cc:830] No registered 'MultiDeviceIteratorFromStringHandle' OpKernel for GPU devices compatible with node {{node MultiDeviceIteratorFromStringHandle}}
. Registered: device='CPU'
2021-05-29 17:39:22.322607: W tensorflow/core/grappler/utils/graph_view.cc:830] No registered 'MultiDeviceIteratorGetNextFromShard' OpKernel for GPU devices compatible with node {{node MultiDeviceIteratorGetNextFromShard}}
. Registered: device='CPU'
Traceback (most recent call last):
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: buffer_size must be greater than zero.
[[{{node ShuffleDataset_1}}]]
(1) Invalid argument: buffer_size must be greater than zero.
[[{{node ShuffleDataset_1}}]]
[[MultiDeviceIteratorInit/_2057]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pretraining.py", line 325, in <module>
callbacks=[checkpoint, csv_logger],
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 685, in fit
steps_name='steps_per_epoch')
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 144, in model_iteration
input_iterator = _get_iterator(inputs, model._distribution_strategy)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 550, in _get_iterator
inputs, distribution_strategy)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py", line 588, in get_iterator
initialize_iterator(iterator, distribution_strategy)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py", line 596, in initialize_iterator
K.get_session((init_op,)).run(init_op)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: buffer_size must be greater than zero.
[[node ShuffleDataset_1 (defined at /mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Invalid argument: buffer_size must be greater than zero.
[[node ShuffleDataset_1 (defined at /mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[MultiDeviceIteratorInit/_2057]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'ShuffleDataset_1':
File "pretraining.py", line 325, in <module>
callbacks=[checkpoint, csv_logger],
File "/mnt/data//Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 685, in fit
steps_name='steps_per_epoch')
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 144, in model_iteration
input_iterator = _get_iterator(inputs, model._distribution_strategy)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 550, in _get_iterator
inputs, distribution_strategy)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py", line 587, in get_iterator
iterator = distribution_strategy.make_dataset_iterator(dataset)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1008, in make_dataset_iterator
return self._extended._make_dataset_iterator(dataset) # pylint: disable=protected-access
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 537, in _make_dataset_iterator
split_batch_by=self._num_replicas_in_sync)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 767, in __init__
input_context=input_context)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 563, in __init__
input_context=input_context)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 521, in __init__
cloned_dataset = input_ops._clone_dataset(dataset) # pylint: disable=protected-access
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_ops.py", line 57, in _clone_dataset
remap_dict = _clone_helper(dataset._variant_tensor.op, variant_tensor_ops)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_ops.py", line 81, in _clone_helper
recursive_map = _clone_helper(input_tensor_op, variant_tensor_ops)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_ops.py", line 81, in _clone_helper
recursive_map = _clone_helper(input_tensor_op, variant_tensor_ops)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_ops.py", line 81, in _clone_helper
recursive_map = _clone_helper(input_tensor_op, variant_tensor_ops)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_ops.py", line 97, in _clone_helper
op_def=_get_op_def(op_to_clone))
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/mnt/data/faker/Relation_Extraction/venv_dir/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
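What I tried
Whatever the problem, please try to solve it yourself first, and only ask once it is still unsolved after "every possible effort". Please paste what you tried here.
I've switched between many versions, but none of them worked and I'm at my wits' end. I'd really appreciate some guidance.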
It's a data problem: it shows there is no data...
Mr. Su, after reading the code I found the problem in my parameter settings: batch-size was smaller than grad_accum_steps, so the downstream data was always 0.
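A minimal sketch of why that setting produces "buffer_size must be greater than zero". It assumes (not verified against this exact bert4keras version) that pretraining.py floor-divides the configured batch size by grad_accum_steps to get the per-step batch fed to tf.data, and that data_utils.py sizes the shuffle buffer as a multiple of that per-step batch. The function names and the multiplier of 1000 are hypothetical placeholders; the point is only the integer arithmetic.

```python
def per_step_batch_size(batch_size, grad_accum_steps):
    """Micro-batch fed to each step, assuming the script floor-divides."""
    return batch_size // grad_accum_steps


def shuffle_buffer_size(batch_size, grad_accum_steps, multiplier=1000):
    """Hypothetical shuffle buffer sized as a multiple of the micro-batch.

    Raises ValueError up front, mirroring the tf.data error
    'buffer_size must be greater than zero' seen in the traceback.
    """
    micro = per_step_batch_size(batch_size, grad_accum_steps)
    buffer_size = micro * multiplier
    if buffer_size <= 0:
        raise ValueError(
            "batch_size (%d) must be >= grad_accum_steps (%d); otherwise "
            "the per-step batch and the shuffle buffer are both 0"
            % (batch_size, grad_accum_steps)
        )
    return buffer_size


if __name__ == "__main__":
    print(shuffle_buffer_size(8, 8))   # micro-batch 1 -> buffer 1000
    try:
        shuffle_buffer_size(4, 8)      # 4 // 8 == 0 -> buffer 0
    except ValueError as err:
        print("ValueError:", err)
```

Checking this condition before building the dataset turns an opaque graph-execution error into a message that names the two parameters involved.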
One more question: I see you set batch-size to 4096. With a batch size that large, did you train on a TPU? If I'm using a GPU with 24 GB of memory, would it work to set both batch-size and grad_accum_steps to 8?
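Under the same assumption (the configured batch size is the effective batch per optimizer update, split into grad_accum_steps micro-batches), the arithmetic behind the 8/8 question can be sketched as below. The helper name is hypothetical, and the grad_accum_steps value of 16 paired with 4096 is an illustrative guess, not the repository's setting. This only computes batch sizes; it says nothing about whether a given micro-batch fits in 24 GB of GPU memory.

```python
def accumulation_plan(batch_size, grad_accum_steps):
    """Split an effective batch into micro-batches for gradient accumulation.

    Assumes batch_size is the effective batch per optimizer update and that
    the script floor-divides it by grad_accum_steps for each forward/backward.
    """
    if grad_accum_steps < 1:
        raise ValueError("grad_accum_steps must be >= 1")
    micro = batch_size // grad_accum_steps
    if micro == 0:
        raise ValueError("batch_size must be >= grad_accum_steps")
    # Samples that actually contribute to one optimizer update; this is less
    # than batch_size when batch_size is not a multiple of grad_accum_steps.
    effective = micro * grad_accum_steps
    return {"micro_batch": micro, "effective_batch": effective}


if __name__ == "__main__":
    # The 24 GB GPU setting asked about: both values set to 8.
    print(accumulation_plan(8, 8))      # {'micro_batch': 1, 'effective_batch': 8}
    # batch size 4096 with a hypothetical 16 accumulation steps.
    print(accumulation_plan(4096, 16))  # {'micro_batch': 256, 'effective_batch': 4096}
```

Read this way, 8/8 means one sequence per step and an effective batch of 8. To keep memory per step fixed while growing the effective batch, one would first find the largest micro-batch that fits, then set batch size = micro-batch × grad_accum_steps.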