bert4keras
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float16 of argument 'x'.
Please provide the following information when asking a question:

Basic information
- Operating system: Windows
- Python version: 3.6.10
- TensorFlow version: 2.3.1
- bert4keras version: latest
- Pure Keras or tf.keras: tf.keras
Minimal test code
```python
import os
os.environ['TF_KERAS'] = '1'
os.environ['RECOMPUTE'] = '1'

import numpy as np
import tensorflow as tf
assert tf.__version__ >= '2'

from tensorflow.keras.mixed_precision import experimental as mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)

from bert4keras.models import Model, Dense, Input, BiasAdd, Embedding, LayerNormalization
from bert4keras.snippets import DataGenerator, sequence_padding


class data_generator(DataGenerator):
    def __iter__(self, random=True):
        batch_token_ids, batch_labels = [], []
        while 1:
            seq_len = 32
            token_ids = [128 for i in range(seq_len)]
            label = [np.random.randint(2)]
            batch_token_ids.append(token_ids)
            batch_labels.append(label)
            if len(batch_token_ids) == self.batch_size:
                batch_token_ids = sequence_padding(batch_token_ids)
                batch_labels = np.array(batch_labels)
                yield batch_token_ids, batch_labels
                batch_token_ids, batch_labels = [], []


input = Input(shape=(32,))
output = Embedding(input_dim=256, output_dim=32, mask_zero=True)(input)
output = BiasAdd()(output)
output = LayerNormalization()(output)
output = Dense(1, activation='tanh')(output)

model = Model(input, output)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit_generator(data_generator([], batch_size=32).forfit(),
                    steps_per_epoch=100, epochs=3)
```
Output
```
TypeError: in user code:

    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\keras\engine\training.py:806 train_function  *
        return step_function(self, iterator)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\keras\engine\training.py:796 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1211 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2585 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2945 _call_for_each_replica
        return fn(*args, **kwargs)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\keras\engine\training.py:789 run_step  **
        outputs = model.train_step(data)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\keras\engine\training.py:757 train_step
        self.trainable_variables)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\keras\engine\training.py:2722 _minimize
        gradients = tape.gradient(loss, trainable_variables)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\eager\backprop.py:1073 gradient
        unconnected_gradients=unconnected_gradients)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\eager\imperative_grad.py:77 imperative_grad
        compat.as_str(unconnected_gradients.value))
    G:\Anaconda3\envs\py36tf2\lib\site-packages\bert4keras\backend.py:308 actual_grad_fn
        grads = grad_fn(*doutputs, variables=self.trainable_weights)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\bert4keras\backend.py:294 grad_fn
        outputs = kernel_call()
    G:\Anaconda3\envs\py36tf2\lib\site-packages\bert4keras\backend.py:275 kernel_call
        return call(self, inputs, **kwargs)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\bert4keras\layers.py:497 call
        outputs = outputs / std * gamma
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\ops\math_ops.py:1140 binary_op_wrapper
        raise e
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\ops\math_ops.py:1124 binary_op_wrapper
        return func(x, y, name=name)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\ops\math_ops.py:1456 _mul_dispatch
        return multiply(x, y, name=name)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\util\dispatch.py:201 wrapper
        return target(*args, **kwargs)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\ops\math_ops.py:508 multiply
        return gen_math_ops.mul(x, y, name)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\ops\gen_math_ops.py:6176 mul
        "Mul", x=x, y=y, name=name)
    G:\Anaconda3\envs\py36tf2\lib\site-packages\tensorflow\python\framework\op_def_library.py:506 _apply_op_helper
        inferred_from[input_arg.type_attr]))

    TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float16 of argument 'x'.
```
My own attempt

If I swap the order of these two lines in the code above:

```python
output = BiasAdd()(output)
output = LayerNormalization()(output)
```

to

```python
output = LayerNormalization()(output)
output = BiasAdd()(output)
```

the error no longer occurs.
Confirmed that the bug requires all of the following: TF_KERAS==1 & RECOMPUTE==1 & the LayerNormalization layer placed near the end of the model.
- Without recompute enabled there is no problem;
- Moving the LN layer earlier in the model also avoids the problem...
- Other layers that, like LN, call self.add_weight in their build method (e.g. BiasAdd) are not affected...
Proposed fix: following https://github.com/tensorflow/tensorflow/commit/f048e532985895311891aa5521239303d28a9ce0, cast gamma and beta to the activation dtype at bert4keras/layers.py:497 and bert4keras/layers.py:499, i.e. K.cast(gamma, K.dtype(std)) and K.cast(beta, K.dtype(output)).
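The idea behind the cast can be shown with a minimal NumPy sketch (no TensorFlow needed); `layer_norm_affine` is a hypothetical stand-in for the affine step of LayerNormalization, not the actual bert4keras code. Under mixed_float16, the normalized activations are float16 while the gamma/beta variables stay float32, so multiplying them directly triggers the Mul dtype error; casting the weights to the activation dtype resolves it:

```python
import numpy as np

def layer_norm_affine(outputs, gamma, beta):
    # Hypothetical stand-in for the affine step `outputs / std * gamma + beta`.
    # Under mixed precision the activations are float16 while the variables
    # created in build() remain float32; cast the weights to the activation
    # dtype (the analogue of K.cast(gamma, K.dtype(std)) in the proposed fix).
    gamma = gamma.astype(outputs.dtype)
    beta = beta.astype(outputs.dtype)
    return outputs * gamma + beta

x = np.ones((2, 4), dtype=np.float16)    # float16 activations
g = np.full(4, 2.0, dtype=np.float32)    # float32 "gamma" variable
b = np.zeros(4, dtype=np.float32)        # float32 "beta" variable
y = layer_norm_affine(x, g, b)
print(y.dtype)  # float16: the output keeps the activation dtype
```

Without the casts, TensorFlow's Mul op (unlike NumPy, which would silently upcast) refuses the float16/float32 operand mix, which is exactly the reported error.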
Is this caused by enabling mixed-precision training? I haven't used mixed precision myself, so I can't offer any advice for now.
(Looks like I've fallen behind on this front and need to catch up~)