Jonathan Shen

85 comments by Jonathan Shen

Actually, I think that might be the right speed. We get ~1.1s per step on 16 P100s, which would be ~5s per step on 4 P100s, and the P100 is supposedly...
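The estimate above can be checked with simple arithmetic, assuming synchronous data-parallel training scales roughly linearly with GPU count (communication overhead ignored):

```python
# Back-of-the-envelope check of the step-time estimate, assuming
# near-linear scaling with GPU count (an idealization).
step_time_16 = 1.1   # observed seconds per step on 16 P100s
observed_gpus = 16
target_gpus = 4

# The same global batch spread over 4x fewer GPUs means ~4x the
# per-GPU work, so the step time scales by the GPU ratio.
estimated_step_time = step_time_16 * observed_gpus / target_gpus
print(round(estimated_step_time, 1))  # 4.4, i.e. roughly the ~5s quoted
```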

Hm, that's not good news... Unfortunately it's very hard to debug performance issues remotely. Please try out some TensorFlow profiling options to see if the GPUs are being utilized efficiently...

It seems like there is a dip in the GPU graph every 6-8 seconds, which also corresponds to your step time. Could it be that the CPU/disk cannot keep up...
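The dips would be consistent with an input-pipeline bottleneck: the trainer stalls whenever data isn't ready. A toy pure-Python sketch of the idea (the loader, trainer, and timings here are all made up for illustration; in TensorFlow the analogous fix is prefetching in the input pipeline):

```python
import queue
import threading
import time

# Toy producer/consumer model of an input pipeline. If the loader's
# throughput keeps up on average, a bounded prefetch queue hides its
# latency and the "trainer" never stalls mid-run.

def loader(q, n_batches, load_time):
    # Simulates disk reads / CPU preprocessing producing batches.
    for i in range(n_batches):
        time.sleep(load_time)
        q.put(i)
    q.put(None)  # sentinel: no more data

def train(q, step_time):
    steps = 0
    while True:
        batch = q.get()  # blocks (the "GPU dip") if the queue is empty
        if batch is None:
            break
        time.sleep(step_time)  # simulates the GPU step
        steps += 1
    return steps

q = queue.Queue(maxsize=4)  # bounded prefetch buffer
t = threading.Thread(target=loader, args=(q, 10, 0.001))
t.start()
steps = train(q, step_time=0.002)
t.join()
print(steps)  # 10
```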

Sorry, I tried checking with a few people on our side and we didn't have any other hypothesis for why it's slower for you :(

I believe that depends on whether you are running in sync mode (the default) or async mode. In sync mode all GPUs need to complete their computation before the step...
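A small numeric sketch of the difference (the per-replica times are hypothetical): in sync mode the step completes only when the slowest replica finishes, so one straggler gates everyone, while in async mode each replica proceeds at its own pace.

```python
# Hypothetical per-replica compute times for one step, in seconds.
per_gpu_times = [1.0, 1.05, 1.1, 2.3]

# Sync mode: all GPUs must finish before the step completes,
# so the straggler sets the step time.
sync_step_time = max(per_gpu_times)

# Async mode: each replica applies its update independently, so the
# average pace is closer to the mean per-replica time.
async_mean_time = sum(per_gpu_times) / len(per_gpu_times)

print(sync_step_time)  # 2.3
print(round(async_mean_time, 2))
```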

I asked around, and got this: > It improves WSJ / LibriSpeech for LAS as it tells the model explicitly, "end of utterance", in the early days. > > But...

This is a known issue due to the move to Python 3 support in Bazel. We'll wait a few weeks, then change everything to Python 3 by default, and that should fix things.

Sorry, we are aware that run_distributed has some problems, but we don't have the resources to fix it at the moment. If someone is able to create a pull request that...

If you have 3 workers with 8 GPUs each, you should set worker_gpus=8 and worker_replicas=3, and leave the rest at their defaults.

It needs to match your physical cluster setup. worker_replicas is the number of training worker jobs you are running, and worker_gpus is the number of GPUs each training...
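As a concrete sketch of the 3-worker, 8-GPU setup described above (the trainer binary name and any flags other than the two from these comments are assumptions, not confirmed by the source):

```shell
# Hypothetical launch for a cluster of 3 worker jobs with 8 GPUs each;
# worker_gpus/worker_replicas are the flags from the comments above,
# everything else is left at its default.
t2t-trainer \
  --worker_gpus=8 \
  --worker_replicas=3
```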