David Levinthal Ph.D.

22 comments by David Levinthal Ph.D.

This error occurs on all variations of the ds2 applications. This makes it impossible to really know what model is being used and what the size is. Printing out the...

Left this out: this was run in mixed mode with loss scaling (i.e., uncommenting the 2 lines and commenting out the fp32 declaration for dtype).

Do you want me to attach them to the bug as well? On Mon, Mar 18, 2019 at 10:34 AM Boris Ginsburg wrote: > David, can you attach a complete...

[conv_cuda101_75_mpiexec.log](https://github.com/NVIDIA/OpenSeq2Seq/files/2980372/conv_cuda101_75_mpiexec.log) [conv_cuda101_75_mpiexec_2.log](https://github.com/NVIDIA/OpenSeq2Seq/files/2980373/conv_cuda101_75_mpiexec_2.log)

The same thing happens (i.e., it hangs in exactly the same place) when run in FP32 mode on 4 V100s: [conv_cuda101_75_mpiexec_fp322.log](https://github.com/NVIDIA/OpenSeq2Seq/files/2989025/conv_cuda101_75_mpiexec_fp322.log)

I am running with train only; I find train_eval produces too much output. I will run on 1 GPU without Horovod and checkpoint 100K. d On Sat, Mar 23, 2019 at...

Well, that was not a rousing success: TypeError: Failed to convert object of type to Tensor. Contents: {'source_tensors': [, ], 'target_tensors': [, ]}. Consider casting elements to a supported type. On...

It fails in the same way in both mixed and fp32 modes on 1 GPU with Horovod disabled (100K checkpointing). On Mon, Mar 25, 2019 at 9:41 AM Boris Ginsburg...

The same issue occurs on the version patched to run on TF r.17. Note that the following version works on 1 GPU but not on >1 GPU: https://github.com/klintan/bi-att-flow/tree/dev Traceback (most recent call last):...

I will modify the Andreas Klintberg fork of this, as that is the code base that works on top-of-tree TF. Nothing else does, due to the change in the handling...