Albert Zeyer
> What is the native stacktrace (via GDB)? You should see it hang somewhere inside the native TF lib. Some current observations on that: Ignoring any waiting threads (`__GI___poll`, `__pthread_cond_timedwait`,...
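To get such a native stacktrace, one way (a command sketch; `<PID>` stands for the hanging process ID) is to attach GDB non-interactively and dump the backtraces of all threads:

```shell
# Attach to the running process, print native backtraces of all threads, detach.
gdb -p <PID> -batch -ex "thread apply all bt"
```

Threads sitting in `__GI___poll` or `__pthread_cond_timedwait` are just waiting and can usually be ignored; the interesting thread is the one actually stuck inside the native TF lib.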
Ah, very interesting. Initially you said you could not reproduce it? So what change was relevant now? Having the larger batch size?
Can you report that in the TensorFlow GitHub issues and link it here?
If you take the same script but replace TF by PyTorch, how does that behave?
If you play around with some other things, e.g. `n_feat` or `n_hidden`, does the mem leak still occur?
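To quantify whether the leak still occurs for different settings, a coarse way is to track the process RSS across train steps. A minimal sketch (the names `rss_kb`, `check_leak` and the `step_fn` placeholder are hypothetical, not from the repro script; `ru_maxrss` is a high-water mark, so this only detects growth in peak memory):

```python
import resource


def rss_kb() -> int:
    # Peak resident set size of this process (KiB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def check_leak(step_fn, n_steps: int = 5) -> int:
    """Run step_fn repeatedly and return peak-RSS growth in KiB.

    step_fn stands in for one train step of the repro script,
    e.g. with varied n_feat / n_hidden.
    """
    before = rss_kb()
    for _ in range(n_steps):
        step_fn()
    return rss_kb() - before
```

If the growth per step stays roughly constant while `n_feat` or `n_hidden` change, that would hint the leak is independent of those dims.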
The PR for padding the time dim (https://github.com/rwth-i6/returnn/pull/1468) is merged now. To continue the discussion here: > I compared padding the time dim for the first two layers on the...
One question is also why it hangs at exit. **Edit** Moved that as a separate issue to #1497.
> RuntimeError: CUDA error: unspecified launch failure Maybe related: https://github.com/pytorch/pytorch/issues/74235
I'm closing for now, assuming a hardware issue. Reopen if there is any indication that there is some other problem, or sth we can do about it.
I'm getting this quite frequently now. In most cases in a multi-GPU training setup on Nvidia 1080 GPUs (but probably that's just because that is currently my main setup, and...