Sanjoy Das
@dgoldenberg-audiomack Does this problem reproduce in tf-nightly? NVIDIA added a GPU kernel for `ResourceSparseApplyAdagradV2` in January, so it is possible that this just works now.
For such cases, it'd be useful to get a stacktrace of where TF is stuck. You can obtain this using gdb: start gdb, [attach](https://sourceware.org/gdb/onlinedocs/gdb/Attach.html) to the hung TF process, and...
@alanzyt311 I missed that you're running TF 1.14. 1.14 is very old and does not have native support for your GPU (which I believe is Ampere-based). So TensorFlow blocks...
Can you try using [`set_memory_growth`](https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth) to prevent TF from allocating all of the GPU memory on startup? Does that address this issue?
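For reference, here is a minimal sketch of what enabling memory growth could look like. The helper name `enable_memory_growth` is made up for illustration; it assumes TF 2.x, where `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth` are available, and it must run before any op initializes the GPUs (otherwise TF raises a `RuntimeError`).

```python
def enable_memory_growth(tf_config):
    """Turn on memory growth for every visible GPU.

    `tf_config` is expected to be the `tf.config` module (TF 2.x assumed).
    Hypothetical helper, not part of TensorFlow itself.
    Returns the number of GPUs that were configured.
    """
    gpus = tf_config.list_physical_devices("GPU")
    for gpu in gpus:
        # With growth enabled, TF allocates GPU memory on demand
        # instead of reserving nearly all of it at startup.
        tf_config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Call it at the very top of your script, right after `import tensorflow as tf`, as `enable_memory_growth(tf.config)`. If the problem goes away with this in place, that points at the up-front allocation rather than at the kernel itself.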
@tfeher Can you PTAL?
Thanks for the detailed triage, @danfischetti! @bixia1 can you PTAL?
@djoshea this looks like a reasonable request. Will you be able to send a PR for this?
> I could potentially try developing it, but I unfortunately don't know where to start. Is there a guide somewhere to implementing new ops for the GPU? I don't think...
@quintinwang5, just so I understand, are you saying that TF produces substantially different results on CPU vs. GPU? If so, can you demonstrate this with a Python snippet?