Increase in Training Time for TadGAN implemented in TensorFlow 2.x
- Orion version: 0.3.1.dev0
- Python version: 3.7.11
- Operating System: macOS
Description
There is a significant increase in training time per signal between TadGAN implemented in TensorFlow 2.x and TensorFlow 1.x. The main differences between the two environments are the methodology used to compute the gradient penalty loss and the use of `tf.GradientTape()` in place of compiling the model.
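For reference, below is a minimal sketch of the TF 1.x-style gradient penalty, following the common Keras-GAN WGAN-GP pattern rather than Orion's exact code: the gradient is built symbolically with `K.gradients` inside a loss function that is baked into the graph at `model.compile()` time.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def gradient_penalty_loss(y_true, y_pred, averaged_samples):
    # Symbolic gradient of the critic output w.r.t. the interpolated
    # ("averaged") samples; built once when the model is compiled.
    gradients = K.gradients(y_pred, averaged_samples)[0]
    # L2 norm over all non-batch dimensions.
    gradients_sqr = K.square(gradients)
    gradients_sqr_sum = K.sum(
        gradients_sqr, axis=list(range(1, len(gradients_sqr.shape))))
    gradient_l2_norm = K.sqrt(gradients_sqr_sum)
    # Penalize deviation of the gradient norm from 1.
    return K.mean(K.square(1.0 - gradient_l2_norm))
```

In this pattern the extra `averaged_samples` argument is typically bound with `functools.partial` before the loss is passed to `model.compile()`; `K.gradients` only works in graph mode, which is why it cannot be used as-is in an eager TF 2.x train step.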
What I Did
The first model (TF 2.0-1) is based on the Wasserstein GAN (WGAN) with Gradient Penalty (GP) tutorial, which uses `tf.GradientTape()` for the `train_step` and a second-order `tf.GradientTape()` for the gradient penalty loss. The second model (TF 2.0-2) compiles the model similarly to the TensorFlow 1.x version but still uses `tf.GradientTape()` for the gradient penalty loss.
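A minimal sketch of the tape-based penalty shared by both TF 2.x variants, assuming signals shaped `(batch, time_steps, channels)`; `critic` is an illustrative stand-in for the TadGAN critic, not Orion's actual API:

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake):
    batch_size = tf.shape(real)[0]
    # Random interpolation between real and generated samples,
    # assuming rank-3 inputs of shape (batch, time_steps, channels).
    alpha = tf.random.uniform([batch_size, 1, 1], 0.0, 1.0)
    interpolated = real + alpha * (fake - real)

    with tf.GradientTape() as gp_tape:
        gp_tape.watch(interpolated)
        scores = critic(interpolated, training=True)

    # Gradient of the critic scores w.r.t. the interpolated inputs.
    grads = gp_tape.gradient(scores, interpolated)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]))
    return tf.reduce_mean((norm - 1.0) ** 2)
```

In TF 2.0-1 this call runs inside the `train_step`'s outer tape, so differentiating the penalty with respect to the critic weights makes it a second-order gradient; TF 2.0-2 keeps the same tape-based penalty but compiles the rest of the model as in TensorFlow 1.x.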
The following table reports the average training time across all signals (on GPU).
Thank you @lcwong0928 for the analysis!
Would it be possible to run a CPU comparison between TF1 and TF 2.0-2?
Yes, will run a benchmark for the CPU version.
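For what it's worth, a simple timing scaffold along these lines should work for either device; `pipeline.fit(signal)` is a placeholder for however each TadGAN variant is trained, not a specific Orion call:

```python
import time

def average_training_time(pipeline, signals):
    durations = []
    for signal in signals:
        start = time.perf_counter()
        pipeline.fit(signal)  # TF1, TF 2.0-1, or TF 2.0-2 variant
        durations.append(time.perf_counter() - start)
    # Average training time per signal, in seconds.
    return sum(durations) / len(durations)
```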
Quick comparison of memory consumption between the TF1 and TF2 (2.3.4) TadGAN implementations:

| Epoch | TF1 | TF2 w/ GradientTape |
|---|---|---|
| initial | 236444 | 271056 |
| 1 | 3746264 | 5616512 |
| 2 | 4137908 | 6289976 |
| 3 | 4284160 | 6709156 |
| 4 | 4465656 | 6897888 |
| 5 | 4487228 | 7031916 |
| 6 | 4506316 | 7189728 |
| 7 | 4562440 | 7333748 |
| 8 | 4731936 | 7428804 |
| 9 | 4731936 | 7446516 |
| 10 | 4736356 | 7554516 |
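The table does not state the measurement tool or units; assuming the figures are resident set size (RSS) in kilobytes, per-epoch numbers like these could be collected with a small Keras callback using `psutil` (a hypothetical sketch, not the script used above):

```python
import psutil
import tensorflow as tf

class MemoryLogger(tf.keras.callbacks.Callback):
    def __init__(self):
        super().__init__()
        self.process = psutil.Process()
        self.rss_kb = []

    def on_train_begin(self, logs=None):
        # Baseline before the first epoch (the "initial" row).
        self.rss_kb.append(self.process.memory_info().rss // 1024)

    def on_epoch_end(self, epoch, logs=None):
        # RSS in kilobytes after each epoch.
        self.rss_kb.append(self.process.memory_info().rss // 1024)
```

Passing `callbacks=[MemoryLogger()]` to `fit()` would then yield one row per epoch, comparable across the TF1 and TF2 runs.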
PR #281 was merged.