dingevin

9 comments of dingevin

This is the utilization rate of the GPU; sometimes it drops to zero. But I have recorded the timing of the data pipeline and found it very fast, about 0.2 s per step when...
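A per-step pipeline timing like the one described above can be measured with a minimal sketch such as the following (pure Python, independent of TensorFlow; `load_batch` is a hypothetical stand-in for whatever fetches one batch from the real input pipeline):

```python
import time

def time_pipeline(load_batch, num_steps=10):
    """Measure the wall-clock time of each data-pipeline step."""
    durations = []
    for _ in range(num_steps):
        start = time.perf_counter()
        load_batch()  # hypothetical: fetch one batch from the input pipeline
        durations.append(time.perf_counter() - start)
    return durations

# Example with a dummy loader standing in for the real pipeline:
times = time_pipeline(lambda: sum(range(10000)), num_steps=5)
print(f"mean step time: {sum(times) / len(times):.4f} s")
```

If the mean here is far below the observed training step time, the bottleneck is likely not the input pipeline.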

That's the TensorFlow timeline profiling: [timeline_3.json](https://share.weiyun.com/5jGchyp)

I once suspected the TensorFlow version and tried many versions of tf-nightly-gpu, but it had no effect. My current env is: **tf_nightly_gpu-1.13.0.dev20181210**, **CUDA-9.0**, **CUDNN: 7.3.0**. Is...

It looks like much time is spent before the forward and backward passes, e.g. on the op 'RandomStandardNormal', but I don't know how to fix it. ![image](https://user-images.githubusercontent.com/16797858/56036650-f4b09400-5d5f-11e9-97c5-f37d0b3cf335.png)
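One way to find which ops dominate the step is to aggregate durations per op name from the timeline file itself. A sketch, assuming the JSON is in the standard Chrome trace format that TensorFlow's timeline export emits (complete events carry `"ph": "X"` and a `dur` in microseconds):

```python
import json
from collections import defaultdict

def op_durations(timeline_path, top_n=10):
    """Sum the 'dur' field of complete ('X') events per op name in a Chrome trace."""
    with open(timeline_path) as f:
        trace = json.load(f)
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":  # "X" marks a complete (timed) event
            totals[event["name"]] += event.get("dur", 0)  # microseconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Running `op_durations("timeline_3.json")` would list the most expensive ops, which should make an outlier like 'RandomStandardNormal' easy to spot.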

Both sync mode and async mode have been tried:

```
bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=asr.librispeech.Librispeech960Grapheme --logdir=/data/dingzhenyou/speech_data/librispeech/log/ --logtostderr --enable_asserts=false
```

```
bazel-bin/lingvo/trainer --run_locally=gpu --mode=async --model=asr.librispeech.Librispeech960Grapheme --logdir=/data/dingzhenyou/speech_data/librispeech/log/ --logtostderr --enable_asserts=false --job=controller,trainer
```

Am I...

I have updated tf-nightly, and the training speed is still very slow. That's my env: [tf_env.txt](https://github.com/tensorflow/lingvo/files/3099748/tf_env.txt) and here is the training log; each step still takes 7~8 s: [nohup.txt](https://github.com/tensorflow/lingvo/files/3100102/nohup.txt) that training...

Thanks, I will try more advanced machines.

@datavizweb I have been training lingvo on a Tesla V100 lately, but the training speed is not any faster. @jonathanasdf From my experiments, the GPU utilization was very high, but each step still...

I have tested the disk I/O but didn't see any bottleneck; that's the test info: [librispeech.log](https://github.com/tensorflow/lingvo/files/3186944/librispeech.log), and I have placed the training data on a shared server to test network...
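A disk I/O check like the one mentioned above can be sketched as a simple sequential-read throughput measurement (pure Python; point it at one of the training shards to see whether reads keep up with the ~0.2 s/step pipeline budget):

```python
import time

def read_throughput(path, block_size=1 << 20):
    """Sequentially read a file in 1 MiB blocks and report MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1 << 20) / elapsed if elapsed > 0 else float("inf")
```

If local-disk throughput is high but training is still slow, that is consistent with the bottleneck being elsewhere (e.g. the network path to the shared server, or op placement on the GPU).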