Keras+Tensorflow Benchmark on Synthetic LSTM Dataset
Hi,
I am running the lstm_benchmark.py test on CPU and on a multi-GPU instance (Amazon EC2), and I am not getting the scaling I expected. Details are below:
Instance: p3.8xlarge (Amazon AWS), 4 GPUs
Virtual env: TensorFlow (+ Keras 2) with Python 2 (CUDA 9.0, V9.0.176), activated with source activate tensorflow_p27
Python version: 2.7.14
Tensorflow version: 1.5.0
Keras version: 2.1.4
Deep Learning AMI: Amazon Linux
Modifications:
run_tf_backend.sh: changed models='resnet50_eager' to models='lstm'
models/lstm_benchmark.py: changed self.num_samples = 1000 to self.num_samples = 50000 (sketched below)
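For reference, the second change amounts to something like this (a rough sketch only; the class and attribute layout are assumptions, the num_samples value is the actual edit):

```python
# Sketch of the edit to models/lstm_benchmark.py (class/attribute names are
# assumptions; only the num_samples change is the actual modification).
class LSTMBenchmark(object):
    def __init__(self):
        # was: self.num_samples = 1000
        self.num_samples = 50000  # more samples so each epoch runs long enough to time reliably
        self.batch_size = 128     # unchanged; matches the batch size in the results table below
```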
Commands run:
$ sh run_tf_backend.sh cpu_config
$ sh run_tf_backend.sh gpu_config
$ sh run_tf_backend.sh multi_gpu_config
Results:
| Instance | GPUs | Backend | Batch size | Dataset | Training method | Time/epoch (lower is better) | Unroll type | No. of samples | Memory (MiB) |
|---|---|---|---|---|---|---|---|---|---|
| p3.8xLarge | 0 | Tensorflow | 128 | Synthetic | fit() | 18sec - 363us/step | unroll=False | 50000 | 0 |
| p3.8xLarge | 1 | Tensorflow | 128 | Synthetic | fit() | 18sec - 362us/step | unroll=False | 50000 | 15360 |
| p3.8xLarge | 4 | Tensorflow | 128 | Synthetic | fit() | 33sec - 651us/step | unroll=False | 50000 | 15410 |
The test does not scale across GPUs: I expected the time per epoch to drop by roughly a factor of n, where n is the number of GPUs.
Is this expected behavior, or am I missing something here?
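For context, here is my understanding of the standard Keras multi-GPU path (a minimal sketch using keras.utils.multi_gpu_model from Keras 2.1, not the benchmark's actual code; I am assuming multi_gpu_config does something similar, and the layer sizes and data shapes below are placeholders). multi_gpu_model slices each incoming batch across the GPUs and merges the outputs on the CPU, so with a global batch of 128 each of the 4 GPUs only sees 32 samples per step, and the slice/merge overhead can outweigh the parallel compute. The usual recommendation is to grow the global batch size with the number of GPUs:

```python
# Minimal sketch (not the benchmark's code) of data-parallel training with
# keras.utils.multi_gpu_model; layer sizes and data shapes are placeholders.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.utils import multi_gpu_model

num_gpus = 4
per_gpu_batch = 128                      # the single-GPU batch size from the table
global_batch = per_gpu_batch * num_gpus  # keep each GPU's sub-batch at 128

# Synthetic data in the spirit of the benchmark (shapes are made up).
x = np.random.random((50000, 50, 64)).astype('float32')
y = np.random.random((50000, 1)).astype('float32')

model = Sequential([
    LSTM(256, input_shape=(50, 64), unroll=False),
    Dense(1),
])

# multi_gpu_model splits each batch into `gpus` sub-batches, runs them in
# parallel on the GPUs, and concatenates the results on the CPU. If the global
# batch stays at 128, each GPU only processes 32 samples per step and the
# transfer/merge overhead can dominate.
parallel_model = multi_gpu_model(model, gpus=num_gpus)
parallel_model.compile(loss='mse', optimizer='adam')
parallel_model.fit(x, y, batch_size=global_batch, epochs=1)
```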
Thank you!
/CC @anj-s