About Multi-Backend Implementation of Gradient Checkpointing
I tried to implement multi-backend gradient checkpointing in https://github.com/pass-lin/bert4keras3, but I ran into some problems. For example, here is my implementation on the TensorFlow backend:
```python
# Assumed imports (the snippet is from bert4keras3; the helper locations are my guess):
import tensorflow as tf
from keras import activations, initializers, ops
from keras.layers import Layer, Dense
from bert4keras3.backend import align, integerize_shape  # bert4keras3 helpers


class ScaleOffset(Layer):
    def __init__(
        self,
        scale=True,
        offset=True,
        conditional=False,
        hidden_units=None,
        hidden_activation='linear',
        hidden_initializer='glorot_uniform',
        **kwargs
    ):
        super(ScaleOffset, self).__init__(**kwargs)
        self.scale = scale
        self.offset = offset
        self.conditional = conditional
        self.hidden_units = hidden_units
        self.hidden_activation = activations.get(hidden_activation)
        self.hidden_initializer = initializers.get(hidden_initializer)

    @integerize_shape
    def build(self, input_shape):
        super(ScaleOffset, self).build(input_shape)
        if self.conditional:
            input_shape = input_shape[0]
        if self.offset is True:
            self.beta = self.add_weight(
                name='beta', shape=(input_shape[-1],), initializer='zeros'
            )
        if self.scale is True:
            self.gamma = self.add_weight(
                name='gamma', shape=(input_shape[-1],), initializer='ones'
            )
        if self.conditional:
            if self.hidden_units is not None:
                self.hidden_dense = Dense(
                    units=self.hidden_units,
                    activation=self.hidden_activation,
                    use_bias=False,
                    kernel_initializer=self.hidden_initializer
                )
            if self.offset is not False and self.offset is not None:
                self.beta_dense = Dense(
                    units=input_shape[-1],
                    use_bias=False,
                    kernel_initializer='zeros'
                )
            if self.scale is not False and self.scale is not None:
                self.gamma_dense = Dense(
                    units=input_shape[-1],
                    use_bias=False,
                    kernel_initializer='zeros'
                )

    def compute_mask(self, inputs, mask=None):
        if self.conditional:
            return mask if mask is None else mask[0]
        else:
            return mask

    @tf.recompute_grad  # decorating `call` directly is what triggers the error below
    def call(self, inputs):
        """If conditional, expects a list as input; the second element is the condition."""
        if self.conditional:
            inputs, conds = inputs
            if self.hidden_units is not None:
                conds = self.hidden_dense(conds)
            conds = align(conds, [0, -1], ops.ndim(inputs))
        if self.scale is not False and self.scale is not None:
            gamma = self.gamma if self.scale is True else self.scale
            if self.conditional:
                gamma = gamma + self.gamma_dense(conds)
            inputs = inputs * gamma
        if self.offset is not False and self.offset is not None:
            beta = self.beta if self.offset is True else self.offset
            if self.conditional:
                beta = beta + self.beta_dense(conds)
            inputs = inputs + beta
        return inputs
```
When I pass a KerasTensor through the layer, the `add_weight` call in `build` (`name='beta', shape=(input_shape[-1],), initializer='zeros'`) fails with `TypeError: 'NoneType' object is not subscriptable`. Similar problems come up on the torch and jax backends. What is going on here, or could an official multi-backend implementation be provided?
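For reference, the kind of backend dispatch I am trying to build looks roughly like the sketch below. This is only an illustration: the wrapper placement, the `RECOMPUTE` switch, and the per-backend calls (`tf.recompute_grad`, `jax.checkpoint`, `torch.utils.checkpoint.checkpoint`) are my assumptions about how such a wrapper could be assembled, not working library code.

```python
import os

from keras import backend


def recompute_grad(call):
    """Hypothetical multi-backend dispatch for gradient checkpointing."""
    if os.environ.get('RECOMPUTE') != "1":
        return call
    name = backend.backend()
    if name == 'tensorflow':
        import tensorflow as tf
        return tf.recompute_grad(call)
    if name == 'jax':
        import jax
        # jax.checkpoint (a.k.a. jax.remat) rematerializes activations
        # during the backward pass.
        return jax.checkpoint(call)
    if name == 'torch':
        from torch.utils.checkpoint import checkpoint

        def wrapper(*args, **kwargs):
            # Non-reentrant checkpointing; `call` is re-run on backward.
            return checkpoint(call, *args, use_reentrant=False, **kwargs)

        return wrapper
    return call
```

Even with a dispatch like this, decorating a layer's `call` means `self` also flows through the checkpointing wrapper, and each backend treats non-tensor arguments differently.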
Hi,
Could you please provide simple standalone code to reproduce the issue in a Colab Gist?
Here is simple, reproducible error code:

```python
# pip install bert4keras3==0.0.5
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
os.environ["KERAS_BACKEND"] = "tensorflow"  # the error also occurs on jax, but not in the torch implementation
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
os.environ['RECOMPUTE'] = "1"

from bert4keras3.models import build_transformer_model

bert = build_transformer_model(
    attention_probs_dropout_prob=0.0,
    directionality="bidi",
    hidden_act="gelu",
    hidden_size=312,
    embedding_size=128,
    initializer_range=0.02,
    intermediate_size=1248,
    max_position=512,
    num_attention_heads=12,
    num_hidden_layers=4,
    type_vocab_size=2,
    vocab_size=21128,
    return_keras_model=True,
)
```
I think the code is still incomplete. Please find the attached Gist here with the error.
The error you are encountering is `TypeError: 'NoneType' object is not subscriptable`. If you comment out `os.environ['RECOMPUTE'] = "1"`, you will find that the model can be loaded successfully.
```python
# The code in question decorates the `call` function of the layer classes.
def recompute_grad(call):
    if os.environ['RECOMPUTE'] != "1":
        return call
    return tf.recompute_grad(call)
```
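A possible direction for a workaround, sketched under the assumption that the failure happens while Keras 3 traces the model symbolically (so `build` runs before any concrete tensors exist): bypass `tf.recompute_grad` whenever the inputs are symbolic `KerasTensor`s, and only take the recomputed path on real tensors. This is untested and the `wrapper` helper is hypothetical:

```python
import os

import keras
import tensorflow as tf


def recompute_grad(call):
    # Untested sketch: skip recomputation during symbolic tracing, where
    # KerasTensors carry shapes but no concrete values.
    if os.environ.get('RECOMPUTE') != "1":
        return call
    recomputed = tf.recompute_grad(call)

    def wrapper(self, inputs, *args, **kwargs):
        flat = inputs if isinstance(inputs, (list, tuple)) else [inputs]
        if any(isinstance(x, keras.KerasTensor) for x in flat):
            # Plain call while the functional graph is being built.
            return call(self, inputs, *args, **kwargs)
        return recomputed(self, inputs, *args, **kwargs)

    return wrapper
```

Whether this actually clears the `NoneType` error on all three backends would still need to be verified.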