Rudolf A. Braun comments

Results: 56 comments of Rudolf A. Braun

Trying to use 2 GPUs results in neverending process that can't be killed without restart

Ran `build/all_reduce_prof` and it worked for me as well as all the others. Should I still try step 3?

Trying to use 2 GPUs results in neverending process that can't be killed without restart

Not really getting much output...
```
seni@seni-MS-7A32:/work/fun/subword-repr$ cuda-memcheck --leak-check full --print-level info py3 km.py data/repr_nums_small
========= CUDA-MEMCHECK
2020-01-30 12:23:43.157 | INFO | __main__:kmeans:11 - Starting, data shape is (67673, 256)....
```

Sometimes batches are created which do not have same number of supervisions and inputs

Cheers for the link, that helps me understand the motivation! Okay give me a bit to think about it and I'll get back to you.

Sometimes batches are created which do not have same number of supervisions and inputs

So I did a hacky fix for this: after the supervision set is [created](https://github.com/lhotse-speech/lhotse/blob/master/lhotse/kaldi.py#L156), I have a loop that adjusts each supervision's duration and deletes it if necessary:
```
to_del =...
```

Sometimes batches are created which do not have same number of supervisions and inputs

I will try that out!
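The hacky fix mentioned in this thread is truncated, so its actual code is not shown. As a rough illustration of the general idea only (clip each supervision to its recording and drop those that fall entirely outside it), here is a minimal sketch in plain Python. The `Supervision` dataclass and `fit_supervisions` function are hypothetical stand-ins, not lhotse classes or APIs.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Supervision:
    """Hypothetical stand-in for a supervision segment (not the lhotse class)."""
    recording_id: str
    start: float      # seconds from the start of the recording
    duration: float   # seconds


def fit_supervisions(
    supervisions: List[Supervision],
    recording_durations: Dict[str, float],
) -> List[Supervision]:
    """Clip supervisions to their recording length; drop those starting past the end."""
    kept = []
    for sup in supervisions:
        rec_dur = recording_durations[sup.recording_id]
        if sup.start >= rec_dur:
            # Nothing of this supervision overlaps the recording: delete it.
            continue
        # Shorten the supervision so it ends no later than the recording does.
        end = min(sup.start + sup.duration, rec_dur)
        kept.append(replace(sup, duration=end - sup.start))
    return kept
```

For a 10-second recording, a supervision starting at 8 s with a 4 s duration would be shortened to 2 s, and one starting at 12 s would be removed.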

[Bug] RuntimeError: No backend type associated with device type cpu

The resolution is not clear to me. I'm getting the message "RuntimeError: No backend type associated with device type cpu". If I was logging 20 things some of them on...

warnings: resuming before epoch end is absolutely normal for long trainings

Does the checkpoint save the number of batches that were seen in the current epoch? I'm thinking about how to resume from a mid-epoch checkpoint, and I think one could just...
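The comment is cut off, so the approach it goes on to suggest is unknown. One common way to resume mid-epoch, sketched here under assumptions, is to store a batch counter in the checkpoint and skip that many batches on restart. The `state` dict, its `"batches_seen"` key, and `run_epoch` are hypothetical names, and the sketch only works if the batch order is reproducible across restarts (for example, the same shuffle seed per epoch).

```python
import itertools
from typing import Any, Callable, Dict, Iterable


def run_epoch(
    batches: Iterable[Any],
    state: Dict[str, int],
    step: Callable[[Any], None],
) -> None:
    """Run one epoch, skipping the batches already counted in state["batches_seen"]."""
    # islice consumes and discards the first `batches_seen` items of the iterable.
    remaining = itertools.islice(batches, state["batches_seen"], None)
    for batch in remaining:
        step(batch)
        # Update the counter after each step so a checkpoint saved at this
        # point records exactly how far into the epoch training got.
        state["batches_seen"] += 1
```

Skipping this way still iterates over the discarded batches, so with an expensive data pipeline the restart itself may take a while.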

convolution sample

same question! :) @ptillet

Impossible to use the tutorials

Randomly (not every time) getting
```
Argument rematerialization not implemented
UNREACHABLE executed at /project/lib/Dialect/TritonGPU/Transforms/TritonGPUConversion.cpp:45!
```
when running a custom fused linear layer (it has activation, dropout and scaling). Edit: this was...

Impossible to use the tutorials

Just to add, I think people are getting this error from running pip install, as that version crashes when doing
```
x = torch.randn(512).cuda()
ln = FusedLayerNorm(512).cuda()
y = ln(x)
l = y.sum()
l.backward()...
```