tf2rl
Unset training-phase.
Hi. First, thank you very much for sharing this project. It is absolutely amazing.
The custom training loop doesn't set Keras's backend training phase (training vs. testing), which makes implementing some Keras features and functions very inconvenient. It would be great if a solution that prevents hacky fixes could be integrated into the framework.
For example, I am using Keras's Spectral Normalization (tfa.layers.SpectralNormalization, since it is about 20% faster than the custom one), and I had to go back and modify the discriminator model to make it work, which in my opinion is not a perfect solution:
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.layers import Dense

class Discriminator(tf.keras.Model):
    ...
        if not enable_sn:
            self.l1 = Dense(units[0], name="L1", activation="relu")
            self.l2 = Dense(units[1], name="L2", activation="relu")
            self.l3 = Dense(1, name="L3", activation=output_activation)
        else:
            print("[DEBUG] Spectral Normalization is enabled")
            # Wrap each Dense layer so its kernel is spectrally normalized.
            self.l1 = tfa.layers.SpectralNormalization(
                Dense(units[0], name="L1", activation="relu"))
            self.l2 = tfa.layers.SpectralNormalization(
                Dense(units[1], name="L2", activation="relu"))
            self.l3 = tfa.layers.SpectralNormalization(
                Dense(1, name="L3", activation=output_activation))
    ...

    def call(self, inputs, training=None):
        # The training loop never passes a phase, so default to training mode.
        if training is None:
            training = True
        if not self._enable_sn:
            features = self.l1(inputs)
            features = self.l2(features)
            return self.l3(features)
        else:
            # SpectralNormalization runs its power iteration only when training=True.
            features = self.l1(inputs, training=training)
            features = self.l2(features, training=training)
            return self.l3(features, training=training)

    def compute_reward(self, inputs):
        print("[DEBUG] initializing {compute_reward GAIL}")
        if not self._enable_sn:
            return tf.math.log(self(inputs) + 1e-8)
        else:
            return tf.math.log(self(inputs, training=False) + 1e-8)
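To illustrate what I mean by "setting the phase", here is a minimal sketch of a custom loop that passes the flag explicitly (a generic example, not tf2rl's actual train step; the model, optimizer, and loss are placeholders):

import tensorflow as tf

@tf.function
def train_step(model, optimizer, inputs, labels):
    with tf.GradientTape() as tape:
        # training=True enables dropout, batch-norm statistic updates,
        # and the spectral-norm power iteration.
        preds = model(inputs, training=True)
        loss = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(labels, preds))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function
def inference_step(model, inputs):
    # training=False freezes those behaviors for evaluation.
    return model(inputs, training=False)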
Thanks,
Hi @HesNobi, thanks for using this library!
Are you suggesting replacing the custom SN class with tfa.layers.SpectralNormalization? Or a more drastic change, for example using Keras's compile and fit functions?
If the former, I'm willing to do that. The only reason I'm using a custom one is that there was no official implementation when I wrote the class. If the latter, I need to take time to think about it. Keras's functions are beautiful and well organized, but I'm not sure whether we can use them for RL, or how much they would cost in readability.
Thanks.
Hi. Thanks for getting back to me.
No, I don't mean Spectral Normalization per se, though that could be a separate issue. Also, I personally prefer a custom training loop for RL, and I really like your implementation of it. What I meant was to add the mentioned feature so that Keras operations such as batch/layer normalization, dropout, and others work correctly.
I was thinking about tf.keras.backend.set_learning_phase, but sadly it has been deprecated. I am wondering if you can come up with a better solution, something like the explicit training argument sketched below.
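A minimal sketch of that pattern, which is what TensorFlow recommends in place of set_learning_phase (the Net class and layer sizes here are hypothetical, just for illustration):

import tensorflow as tf

class Net(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(32, activation="relu")
        self.dropout = tf.keras.layers.Dropout(0.5)
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, inputs, training=None):
        x = self.dense(inputs)
        x = self.dropout(x, training=training)  # dropout active only when training=True
        return self.out(x)

net = Net()
x = tf.random.normal((4, 8))
y_train = net(x, training=True)   # training phase
y_eval = net(x, training=False)   # inference phase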
Thanks