Increase in Training Time for TadGAN implemented in TensorFlow 2.x

Open · lcwong0928 opened this issue on May 17 '22 · 3 comments

  • Orion version: 0.3.1.dev0
  • Python version: 3.7.11
  • Operating System: macOS

Description

There is a significant increase in training time per signal between the TadGAN implementations in TensorFlow 2.x and TensorFlow 1.x. The main differences between the two environments are the method used to compute the gradient penalty loss and the use of tf.GradientTape() in the training step rather than a compiled model.

What I Did

The first model (TF 2.0-1) is based on the Wasserstein GAN (WGAN) with Gradient Penalty (GP) tutorial, which uses tf.GradientTape() for the train_step and a second, inner tf.GradientTape() for the gradient penalty loss, making the differentiation second-order. The second model (TF 2.0-2) compiles the model similarly to the TensorFlow 1.x version but still uses tf.GradientTape() for the gradient penalty loss. A sketch of the first pattern is shown below.
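For reference, here is a minimal sketch of the TF 2.0-1 pattern, adapted from the standard Keras WGAN-GP recipe rather than Orion's actual TadGAN code; the `critic`, `generator`, and `gp_weight` names are illustrative, and inputs are assumed to be 3-D time-series windows of shape `[batch, time, features]`. The point is that the gradient penalty needs its own inner tape, and because it is computed inside the outer training tape, the final `tape.gradient` call differentiates through a gradient, i.e. second-order autodiff:

```python
import tensorflow as tf


def gradient_penalty(critic, real, fake):
    """WGAN-GP term: push the critic's gradient norm toward 1 at
    points interpolated between real and fake samples. Needs its
    own (inner) GradientTape."""
    batch_size = tf.shape(real)[0]
    alpha = tf.random.uniform([batch_size, 1, 1], 0.0, 1.0)
    interpolated = alpha * real + (1.0 - alpha) * fake
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = critic(interpolated, training=True)
    grads = tape.gradient(scores, interpolated)
    # axis=[1, 2] assumes [batch, time, features] inputs.
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]) + 1e-12)
    return tf.reduce_mean((norm - 1.0) ** 2)


@tf.function
def critic_step(critic, generator, real, z, optimizer, gp_weight=10.0):
    # The outer tape tracks the critic weights. Because the penalty's
    # inner tape.gradient runs inside this context, the final gradient
    # is second-order -- a plausible source of extra per-step cost
    # relative to the precompiled TF 1.x graph.
    with tf.GradientTape() as tape:
        fake = generator(z, training=True)
        loss = (tf.reduce_mean(critic(fake, training=True))
                - tf.reduce_mean(critic(real, training=True))
                + gp_weight * gradient_penalty(critic, real, fake))
    grads = tape.gradient(loss, critic.trainable_variables)
    optimizer.apply_gradients(zip(grads, critic.trainable_variables))
    return loss
```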

The following table reports the average training time across all signals (on GPU). [image: training-time comparison table]

— lcwong0928, May 17 '22 15:05

Thank you @lcwong0928 for the analysis!

Would it be possible to run a CPU comparison between TF 1.x and TF 2.0-2?

— sarahmish, May 17 '22 16:05

Yes, will run a benchmark for the CPU version.

— lcwong0928, May 17 '22 21:05

Quick comparison of memory consumption between TF1 and TF2 (v2.3.4) for TadGAN:

| Epoch   | TF1     | TF2 w/ GradientTape |
|---------|---------|---------------------|
| initial | 236444  | 271056              |
| 1       | 3746264 | 5616512             |
| 2       | 4137908 | 6289976             |
| 3       | 4284160 | 6709156             |
| 4       | 4465656 | 6897888             |
| 5       | 4487228 | 7031916             |
| 6       | 4506316 | 7189728             |
| 7       | 4562440 | 7333748             |
| 8       | 4731936 | 7428804             |
| 9       | 4731936 | 7446516             |
| 10      | 4736356 | 7554516             |
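The thread does not state the units or how these numbers were collected. For anyone reproducing the comparison, one common way to sample per-epoch peak memory is Python's `resource` module; the sketch below is a hypothetical measurement harness (the `train_one_epoch` call is a placeholder), not necessarily how the table above was produced:

```python
import resource


def peak_rss():
    """Peak resident set size of the current process.
    Note: ru_maxrss is reported in kilobytes on Linux but
    in bytes on macOS."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


print("initial", peak_rss())
for epoch in range(1, 11):
    train_one_epoch()  # hypothetical: one TadGAN training epoch
    print(epoch, peak_rss())
```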

— sarahmish, Sep 09 '22 15:09

PR #281 was merged.

— sarahmish, Sep 24 '22 15:09