About Multi-Backend Implementation of Gradient Checkpointing
I tried to implement multi-backend gradient checkpointing in https://github.com/pass-lin/bert4keras3, but I ran into some problems. For example, here is my implementation on the TensorFlow backend:
```python
# Assumed imports (the snippet is from bert4keras3; the helper locations are my guess):
import tensorflow as tf
from keras import activations, initializers, ops
from keras.layers import Layer, Dense
from bert4keras3.backend import align, integerize_shape  # bert4keras3 helpers


class ScaleOffset(Layer):
    def __init__(
        self,
        scale=True,
        offset=True,
        conditional=False,
        hidden_units=None,
        hidden_activation='linear',
        hidden_initializer='glorot_uniform',
        **kwargs
    ):
        super(ScaleOffset, self).__init__(**kwargs)
        self.scale = scale
        self.offset = offset
        self.conditional = conditional
        self.hidden_units = hidden_units
        self.hidden_activation = activations.get(hidden_activation)
        self.hidden_initializer = initializers.get(hidden_initializer)

    @integerize_shape
    def build(self, input_shape):
        super(ScaleOffset, self).build(input_shape)
        if self.conditional:
            input_shape = input_shape[0]
        if self.offset is True:
            self.beta = self.add_weight(
                name='beta', shape=(input_shape[-1],), initializer='zeros'
            )
        if self.scale is True:
            self.gamma = self.add_weight(
                name='gamma', shape=(input_shape[-1],), initializer='ones'
            )
        if self.conditional:
            if self.hidden_units is not None:
                self.hidden_dense = Dense(
                    units=self.hidden_units,
                    activation=self.hidden_activation,
                    use_bias=False,
                    kernel_initializer=self.hidden_initializer
                )
            if self.offset is not False and self.offset is not None:
                self.beta_dense = Dense(
                    units=input_shape[-1],
                    use_bias=False,
                    kernel_initializer='zeros'
                )
            if self.scale is not False and self.scale is not None:
                self.gamma_dense = Dense(
                    units=input_shape[-1],
                    use_bias=False,
                    kernel_initializer='zeros'
                )

    def compute_mask(self, inputs, mask=None):
        if self.conditional:
            return mask if mask is None else mask[0]
        else:
            return mask

    @tf.recompute_grad  # decorating `call` directly is what triggers the error below
    def call(self, inputs):
        """If conditional, expects a list as input; the second element is the condition."""
        if self.conditional:
            inputs, conds = inputs
            if self.hidden_units is not None:
                conds = self.hidden_dense(conds)
            conds = align(conds, [0, -1], ops.ndim(inputs))
        if self.scale is not False and self.scale is not None:
            gamma = self.gamma if self.scale is True else self.scale
            if self.conditional:
                gamma = gamma + self.gamma_dense(conds)
            inputs = inputs * gamma
        if self.offset is not False and self.offset is not None:
            beta = self.beta if self.offset is True else self.offset
            if self.conditional:
                beta = beta + self.beta_dense(conds)
            inputs = inputs + beta
        return inputs
```
When I pass a KerasTensor through the layer, the `add_weight` call in `build` (`name='beta', shape=(input_shape[-1],), initializer='zeros'`) fails with `TypeError: 'NoneType' object is not subscriptable`. Similar problems come up on the torch and jax backends. What is going on here, or could an official multi-backend implementation be provided?
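For reference, the kind of backend dispatch I am trying to build looks roughly like the sketch below. This is only an illustration: the wrapper placement, the `RECOMPUTE` switch, and the per-backend calls (`tf.recompute_grad`, `jax.checkpoint`, `torch.utils.checkpoint.checkpoint`) are my assumptions about how such a wrapper could be assembled, not working library code.

```python
import os

from keras import backend


def recompute_grad(call):
    """Hypothetical multi-backend dispatch for gradient checkpointing."""
    if os.environ.get('RECOMPUTE') != "1":
        return call
    name = backend.backend()
    if name == 'tensorflow':
        import tensorflow as tf
        return tf.recompute_grad(call)
    if name == 'jax':
        import jax
        # jax.checkpoint (a.k.a. jax.remat) rematerializes activations
        # during the backward pass.
        return jax.checkpoint(call)
    if name == 'torch':
        from torch.utils.checkpoint import checkpoint

        def wrapper(*args, **kwargs):
            # Non-reentrant checkpointing; `call` is re-run on backward.
            return checkpoint(call, *args, use_reentrant=False, **kwargs)

        return wrapper
    return call
```

Even with a dispatch like this, decorating a layer's `call` means `self` also flows through the checkpointing wrapper, and each backend treats non-tensor arguments differently.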
Hi,
Could you please provide simple standalone code to reproduce the issue in a Colab Gist?
Here is simple, reproducible error code:

```python
# pip install bert4keras3==0.0.5
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
os.environ["KERAS_BACKEND"] = "tensorflow"  # the error also occurs on jax, but not in the torch implementation
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
os.environ['RECOMPUTE'] = "1"

from bert4keras3.models import build_transformer_model

bert = build_transformer_model(
    attention_probs_dropout_prob=0.0,
    directionality="bidi",
    hidden_act="gelu",
    hidden_size=312,
    embedding_size=128,
    initializer_range=0.02,
    intermediate_size=1248,
    max_position=512,
    num_attention_heads=12,
    num_hidden_layers=4,
    type_vocab_size=2,
    vocab_size=21128,
    return_keras_model=True,
)
```
I think the code is still incomplete. Please find the attached Gist here with the error.
The error you are encountering is `TypeError: 'NoneType' object is not subscriptable`. If you comment out `os.environ['RECOMPUTE'] = "1"`, you will find that the model can be loaded successfully.
```python
# The code in question decorates the `call` function of the layer classes.
def recompute_grad(call):
    if os.environ['RECOMPUTE'] != "1":
        return call
    return tf.recompute_grad(call)
```
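A possible direction for a workaround, sketched under the assumption that the failure happens while Keras 3 traces the model symbolically (so `build` runs before any concrete tensors exist): bypass `tf.recompute_grad` whenever the inputs are symbolic `KerasTensor`s, and only take the recomputed path on real tensors. This is untested and the `wrapper` helper is hypothetical:

```python
import os

import keras
import tensorflow as tf


def recompute_grad(call):
    # Untested sketch: skip recomputation during symbolic tracing, where
    # KerasTensors carry shapes but no concrete values.
    if os.environ.get('RECOMPUTE') != "1":
        return call
    recomputed = tf.recompute_grad(call)

    def wrapper(self, inputs, *args, **kwargs):
        flat = inputs if isinstance(inputs, (list, tuple)) else [inputs]
        if any(isinstance(x, keras.KerasTensor) for x in flat):
            # Plain call while the functional graph is being built.
            return call(self, inputs, *args, **kwargs)
        return recomputed(self, inputs, *args, **kwargs)

    return wrapper
```

Whether this actually clears the `NoneType` error on all three backends would still need to be verified.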