dingevin
This is the utilization rate of the GPU; sometimes it was zero. But I have timed the data pipeline and found it to be very fast, about 0.2 s per step when...
That's the TensorFlow timeline profiling: [timeline_3.json](https://share.weiyun.com/5jGchyp)
I once suspected the TensorFlow version and tried many versions of tf-nightly-gpu, but it had no effect. My current env is: **tf_nightly_gpu-1.13.0.dev20181210**, **CUDA-9.0**, **CUDNN: 7.3.0**. is...
It looks like much time is spent before the forward and backward passes, e.g. in the op 'RandomStandardNormal', but I don't know how to fix it.
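The timeline file above is in Chrome trace-event format, so it can be inspected offline to rank ops by total wall time and confirm whether ops like `RandomStandardNormal` really dominate. A minimal sketch (the file path is a placeholder for the downloaded timeline):

```python
import json
from collections import defaultdict

def slowest_ops(trace_path, top_n=5):
    """Sum wall time per op name from a Chrome-trace timeline JSON
    (the format TensorFlow's timeline.Timeline emits) and return the
    top_n ops by total duration in microseconds."""
    with open(trace_path) as f:
        events = json.load(f)["traceEvents"]
    totals = defaultdict(int)
    for ev in events:
        # Complete-duration events have ph == "X" and carry a "dur" field.
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "?")] += ev["dur"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Usage (path is hypothetical):
# for name, us in slowest_ops("timeline_3.json"):
#     print(name, us)
```

If one op accounts for most of the step time, that narrows the search considerably before touching the model code.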
Both sync mode and async mode have been tried:

```
bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=asr.librispeech.Librispeech960Grapheme --logdir=/data/dingzhenyou/speech_data/librispeech/log/ --logtostderr --enable_asserts=false
```

```
bazel-bin/lingvo/trainer --run_locally=gpu --mode=async --model=asr.librispeech.Librispeech960Grapheme --logdir=/data/dingzhenyou/speech_data/librispeech/log/ --logtostderr --enable_asserts=false --job=controller,trainer
```

Am I...
I have updated tf-nightly, and the training speed is still very slow. Here is my env: [tf_env.txt](https://github.com/tensorflow/lingvo/files/3099748/tf_env.txt) and here is the training log; each step still takes 7~8 s: [nohup.txt](https://github.com/tensorflow/lingvo/files/3100102/nohup.txt) that training...
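The "7~8 s per step" figure can be extracted from the log rather than eyeballed. A rough sketch, assuming log lines of a hypothetical form like `step 100 took 7.5s` (the regex would need adapting to the actual nohup.txt format):

```python
import re

def mean_step_seconds(log_text):
    """Average seconds-per-step from log lines of the (assumed) form
    '... step N took 7.8s ...'. Returns None if no matches are found."""
    times = [float(m) for m in re.findall(r"took ([0-9.]+)s", log_text)]
    return sum(times) / len(times) if times else None
```

Averaging over many steps also makes it easy to see whether the slowness is uniform or dominated by occasional stalls (which would point at input-pipeline hiccups rather than compute).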
Thanks, I will try more advanced machines.
@datavizweb I have been training lingvo on a Tesla V100 lately, but the training speed is no faster. @jonathanasdf From my experiments, GPU utilization was very high, but each step still...
I have tested the disk I/O but didn't see any bottleneck; here is the test info: [librispeech.log](https://github.com/tensorflow/lingvo/files/3186944/librispeech.log). I have also placed the training data on a shared server to test network...
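For reference, a quick sanity check of sequential read throughput can be done without external tools. A minimal sketch (note the OS page cache makes this an optimistic upper bound, not a true disk spec; sizes here are illustrative):

```python
import os
import tempfile
import time

def sequential_read_mb_per_s(size_mb=64, chunk_mb=4):
    """Write a size_mb temp file, then time reading it back in
    chunk_mb chunks and return the read throughput in MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        path = f.name
    start = time.perf_counter()
    with open(path, "rb") as f:
        # Read until EOF; an empty bytes object ends the loop.
        while f.read(chunk_mb * 1024 * 1024):
            pass
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size_mb / elapsed
```

If local reads are fast but training from the shared server is slow, that would implicate the network path rather than the disk.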