tf-keras icon indicating copy to clipboard operation
tf-keras copied to clipboard

Keras generates non-unique names for stateful LSTM's states

Open mergian opened this issue 2 years ago • 11 comments

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS
  • TensorFlow installed from (source or binary): PYPI
  • TensorFlow version (use command below): 2.11.0
  • Python version: 3.7

Describe the problem. A stateful LSTM has 2 state values, both with the identical name. As you can see in the reproducing example, we have two different variables, but both have the identical name.

Describe the current behavior. LSTM states have the identical name.

Describe the expected behavior. LSTM states should have unique names.

Contributing.

  • Do you want to contribute a PR? (yes/no): yes
  • If yes, please read this page for instructions
  • Briefly describe your candidate solution(if contributing):

I think the problem comes from here: https://github.com/keras-team/keras/blob/master/keras/layers/rnn/base_rnn.py#L909

  1. I would do something like lambda x: backend.variable(x, name='state'). This will change the name to lstm/state:0 which is already more readable.
  2. To me it seems it's not possible to give tf.Variable a unique id, so I assume the naming would need to be done manually with something like lambda x: backend.varaible(x, name=f'state_{unique_id}'.

Standalone code to reproduce the issue.

import tensorflow as tf
import tensorflow.keras as keras
import numpy as np

inp = keras.Input(batch_shape=[1, 2, 3])
rnn = keras.layers.LSTM(stateful=True, units=3)
model = keras.Model(inp, rnn(inp))

print('Before:')
print(*rnn.states, sep='\n')
print()

data = np.random.rand(1, 2, 3).astype(np.float32)
model(data)

print('After:')
print(*rnn.states, sep='\n')

Output:

Before:
<tf.Variable 'lstm/Variable:0' shape=(1, 3) dtype=float32, numpy=array([[0., 0., 0.]], dtype=float32)>
<tf.Variable 'lstm/Variable:0' shape=(1, 3) dtype=float32, numpy=array([[0., 0., 0.]], dtype=float32)>

After:
<tf.Variable 'lstm/Variable:0' shape=(1, 3) dtype=float32, numpy=array([[ 0.11526701, -0.02300704, -0.1668661 ]], dtype=float32)>
<tf.Variable 'lstm/Variable:0' shape=(1, 3) dtype=float32, numpy=array([[ 0.17723909, -0.06333476, -0.31777197]], dtype=float32)>

mergian avatar Jan 25 '23 11:01 mergian

@mergian Sorry for the late response! I was able to replicate this issue on colab, please find the gist here. @SuryanarayanaY Could you please take a look at this issue? Thank you!

sushreebarsa avatar Feb 03 '23 01:02 sushreebarsa

@mergian ,

I have checked the Source code of LSTM and observed that the all LSTM layer attributes except states not defined as @property. Not sure though the exact reason for this and how it may affect this case. If you have time please have a look and see whether it provides any insight. I just checked implementing this makes any difference and found no difference. Escalating the issue for further investigation. Thanks!

SuryanarayanaY avatar Feb 14 '23 17:02 SuryanarayanaY

Thanks for the issue. Would you be able to contribute with a PR?

sampathweb avatar Feb 23 '23 20:02 sampathweb

@mergian , Thanks for the PR.

SuryanarayanaY avatar Mar 03 '23 05:03 SuryanarayanaY

Hello, Thank you for reporting an issue.

We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras. Consequently, This issue may not be relevant to the Keras 3 code base . After the migration is successfully completed, feel free to reopen this Issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base. If instead this Issue is a bug or security issue in legacy tf.keras, you can instead report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.

To know more about Keras 3, please read https://keras.io/keras_core/announcement/

SuryanarayanaY avatar Sep 22 '23 05:09 SuryanarayanaY

Are you satisfied with the resolution of your issue? Yes No

google-ml-butler[bot] avatar Sep 22 '23 05:09 google-ml-butler[bot]

@mergian,

Sorry for the inconvenience. Because of migration process we have moved the issue from keras-team/keras to keras-team/tf-keras and then keras-core to keras as keras which is now Keras3, going to be multi backend. As Team is quite busy in this process of migration activity your PR review might got delayed.

Since we can't transfer the PR from one repo to other we have to close all the PRs. I am reopening the issue and request you to raise the fresh PR here in this repo if its related to TF backend or if you believe it remains relevant to the Keras 3 code base you can raise a PR at keras-team/keras repo.

SuryanarayanaY avatar Sep 27 '23 12:09 SuryanarayanaY

Dear @SuryanarayanaY

thanks for the explanation. I fully understand the technical reasoning for closing the old PRs. However, please also understand that I neither have the time nor the motivation to redo all my pull requests (I think there are 4 open?) that have been waiting for more than 6 months in which Keras team didn't bother to review them.

Please feel free to run the reproducing scripts and check if the error still occurs yourself. You can also use the fixes I proposed in my pull requests if you like. However, I will not spend any more time on creating pull requests which Keras team asked for, which then get ignored for more than half a year and in the end being closed.

Sorry but my time is to valuable for this.

Best

mergian avatar Oct 02 '23 07:10 mergian

I have tested the code with keras_core and tf-nightly. This issue not relevant to keras_core as the variable names are not there in keras. Attached keras_core gist for reference.

However in tf-nightly this is still same behaviour as reported in the issue.Attached gist for tf-nightly .

SuryanarayanaY avatar Oct 06 '23 11:10 SuryanarayanaY

I tried to execute the code with the keras_core and observed that code was executed without fail/error. Kindly find the gist of it here. Thank you!

tilakrayal avatar May 16 '25 11:05 tilakrayal

Dear @tilakrayal

The issue is not that the code doesn't run, the code works. But the variables don't get unique names assigned. As shown in your Gist:

Using TensorFlow backend
Before:
<KerasVariable shape=(1, 3), dtype=float32, path=lstm/rnn_state>
<KerasVariable shape=(1, 3), dtype=float32, path=lstm/rnn_state>

After:
<KerasVariable shape=(1, 3), dtype=float32, path=lstm/rnn_state>
<KerasVariable shape=(1, 3), dtype=float32, path=lstm/rnn_state>

There are 2 Keras variables, both with the name lstm/rnn_state. In the old version it was lstm/Variable:0.

If you want an automated test, add this code after the previous. As the state names are not unique, the second will overwrite the first within the dict.

some_dict = {}
for state in rnn.states:
   some_dict[state.path] = state

assert len(some_dict) == len(rnn.states), f'expected {len(rnn.states)} but found {len(some_dict)}'

mergian avatar May 16 '25 11:05 mergian