Update tfprocess.py for TensorFlow 2.4+

Open CallOn84 opened this issue 2 years ago • 12 comments

While trying to train my own model, I found an error in tfprocess.py that stops train_maia.py from working on TensorFlow versions newer than 2.4.

The reason is that the tf.keras.mixed_precision.experimental API was deprecated when the stable tf.keras.mixed_precision API was introduced in TensorFlow 2.4, and it has since been removed.

To fix this, I changed two lines of code.

tf.keras.mixed_precision.experimental.set_policy('mixed_float16'), which can be found in Line 123, was changed to tf.keras.mixed_precision.set_global_policy('mixed_float16').

self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(self.optimizer, self.loss_scale), which can be found in Line 150, was changed to self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer).

CallOn84 • Jul 17 '23 22:07

Does this change maintain compatibility with TensorFlow 2.1.0? This codebase is meant for replicating our work, which was done with the environment given in maia_env.yml.

reidmcy • Jul 18 '23 01:07

> Does this change maintain compatibility with TensorFlow 2.1.0? This codebase is meant for replicating our work, which was done with the environment given in maia_env.yml.

The tf.keras.mixed_precision code in this version of tfprocess.py won't work with TensorFlow versions newer than 2.4 because, as I understand it, tf.keras.mixed_precision became stable and improved enough over the original tf.keras.mixed_precision.experimental that Google decided to remove the experimental API.

I would suggest adding a conditional that lets tfprocess.py detect which version of TensorFlow the user is running and call the matching API.

Something like this:

# Compare (major, minor) as integers; comparing the strings would rank '2.10' below '2.4'.
tf_version = tuple(int(part) for part in tf.__version__.split('.')[:2])
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_visible_devices(gpus[self.cfg['gpu']], 'GPU')
tf.config.experimental.set_memory_growth(gpus[self.cfg['gpu']], True)
if self.model_dtype == tf.float16:
	if tf_version >= (2, 4):
		tf.keras.mixed_precision.set_global_policy('mixed_float16')
	else:
		tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
self.active_lr = 0.01
self.optimizer = tf.keras.optimizers.SGD(learning_rate=lambda: self.active_lr, momentum=0.9, nesterov=True)
self.orig_optimizer = self.optimizer
if self.loss_scale != 1:
	if tf_version >= (2, 4):
		# Called without extra arguments, this uses dynamic loss scaling rather than self.loss_scale.
		self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer)
	else:
		self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(self.optimizer, self.loss_scale)

There's probably a better way of coding this, but this is the only thing my small brain can come up with.
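One detail that matters for any version check of this kind: comparing version strings directly is lexicographic, so '2.10.0' sorts below '2.4'. Below is a minimal sketch of a numeric comparison in plain Python. The helper name version_at_least is made up for illustration and is not part of tfprocess.py.

```python
import re


def version_at_least(version, minimum):
    """Return True if `version` (e.g. '2.10.0') is >= the (major, minor) tuple `minimum`."""
    # Keep only the leading digits of the first two dot-separated fields,
    # so only the major and minor numbers take part in the comparison.
    fields = version.split(".")[:2]
    parsed = tuple(int(re.match(r"\d+", field).group()) for field in fields)
    return parsed >= minimum


# Plain string comparison gets 2.10 wrong because '1' sorts before '4':
print("2.10.0" >= "2.4")                    # False
print(version_at_least("2.10.0", (2, 4)))   # True
print(version_at_least("2.1.0", (2, 4)))    # False
```

In tfprocess.py the call would presumably look like version_at_least(tf.__version__, (2, 4)).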

CallOn84 • Jul 18 '23 10:07

Adding that code is saying the 2.4 code is equivalent to 2.1. But, as you said, keras.mixed_precision changed after 2.4. We would need to test the new implementation by rerunning the full training and testing. This is a research project, so we need evidence for changes.
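One concrete difference worth checking: in the stable API as shipped in TensorFlow 2.4 through 2.10, LossScaleOptimizer(optimizer) with no further arguments uses dynamic loss scaling, whereas the experimental call received self.loss_scale as a fixed value. Here is a rough sketch of a wrapper that asks for a fixed scale on both API generations. The helper name wrap_with_loss_scale is hypothetical, and the mixed_precision argument is assumed to be the tf.keras.mixed_precision module, passed in so the selection logic can be read without TensorFlow installed. It has not been validated against the paper's results.

```python
def wrap_with_loss_scale(mixed_precision, optimizer, loss_scale):
    """Wrap `optimizer` with a fixed loss scale on either API generation.

    `mixed_precision` is expected to be the `tf.keras.mixed_precision` module.
    """
    if hasattr(mixed_precision, "set_global_policy"):
        # Stable API (TF 2.4+): request a fixed, non-dynamic loss scale explicitly.
        return mixed_precision.LossScaleOptimizer(
            optimizer, dynamic=False, initial_scale=loss_scale
        )
    # Experimental API (e.g. TF 2.1): a numeric loss_scale is treated as fixed.
    return mixed_precision.experimental.LossScaleOptimizer(optimizer, loss_scale)
```

A call site in tfprocess.py might then read self.optimizer = wrap_with_loss_scale(tf.keras.mixed_precision, self.optimizer, self.loss_scale). Even with matching loss scaling, equivalence with the 2.1 results would still need to be shown by retraining.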

reidmcy • Jul 18 '23 19:07

> Adding that code is saying the 2.4 code is equivalent to 2.1. But, as you said, keras.mixed_precision changed after 2.4. We would need to test the new implementation by rerunning the full training and testing. This is a research project, so we need evidence for changes.

I'm in the middle of training a Maia model that's targeting a rating of around 2500. Once I finish, I can send you the model for testing.

CallOn84 • Jul 18 '23 19:07

That's not a replication of our paper. This code is for the KDD 2020 paper.

reidmcy • Jul 18 '23 19:07

> That's not a replication of our paper. This code is for the KDD 2020 paper.

Can you clarify what you mean by the code being for the KDD 2020 paper? Which paper are you referring to?

CallOn84 • Jul 18 '23 19:07

> Aligning Superhuman AI with Human Behavior: Chess as a Model System is the name of the paper

Ah, right. I got confused when you said it's not a replication of your paper and referred to something else.

Currently, the switch to tf.keras.mixed_precision.set_global_policy('mixed_float16') and self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer) allows me to run train_maia.py using TensorFlow 2.10. I can provide you with the TensorBoard logs alongside the Maia net when the training is fully complete.

CallOn84 • Jul 18 '23 19:07

Are there any updates on this?

ezhang7423 • Dec 11 '23 20:12

> Are there any updates on this?

To check whether this works in terms of the end product, I'm currently training a Maia 2200 net. Testing it against Maia 1900 should then show whether the changes to Keras have any positive or negative effect on Maia 2200.

CallOn84 • Feb 05 '24 21:02

@CallOn84 We have a new model that solves some of the training problems (and the reliance on old libraries), but it does not improve data efficiency or expand the Elo range much. So if you can wait a bit, we should have more usable code to release.

reidmcy • Feb 06 '24 22:02

> @CallOn84 We have a new model that solves some of the training problems (and the reliance on old libraries), but it does not improve data efficiency or expand the Elo range much. So if you can wait a bit, we should have more usable code to release.

Cool, will do.

CallOn84 • Feb 06 '24 22:02