Samantha Andow

39 comments by Samantha Andow

The error: `One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.` means that the derivative with respect...
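A minimal sketch of when this error appears, assuming a plain `torch.autograd.grad` call (the original thread's code is not shown here, so the tensors `x` and `y` below are hypothetical). It asks for the gradient with respect to a tensor that never contributes to the output, then repeats the call with `allow_unused=True`:

```python
import torch

# Hypothetical inputs for illustration: `y` is never used to compute `out`.
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

out = x ** 2

# Requesting d(out)/dy raises a RuntimeError because y is not in out's graph.
# retain_graph=True keeps the graph alive so we can differentiate again below.
try:
    torch.autograd.grad(out, [x, y], retain_graph=True)
    raised = False
except RuntimeError:
    raised = True

# With allow_unused=True the call succeeds and the unused input gets None.
grad_x, grad_y = torch.autograd.grad(out, [x, y], allow_unused=True)

print(raised)   # whether the first call raised
print(grad_x)   # d(x^2)/dx evaluated at x = 2
print(grad_y)   # gradient for the unused tensor
```

Whether `allow_unused=True` is the right fix depends on whether the missing dependency is intentional; if the tensor was expected to influence the output, the error usually points at a broken graph instead.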

> When replacing jacrev with jacfwd, the following error occurs ...

Ahh sorry that's my fault, I'll put up a patch to fix that today. In the meantime, if you...

To check @AlphaBetaGamma96's intuition that it might just be an OOM issue: I know you're able to compute the forward pass, but are you able to compute just gradients on...

@JoaoLages Sorry for the delayed response. Do you have an E2E repro that you could share? We're trying to understand if it's going to be better to recommend using...

cc @Chillee @anijain2305 Any thoughts? In particular re: why memory_efficient_fusion made the final case slower

After digging into this a little bit, here's what's happening and how we can fix it: (1) In BOTH the cases where `xx_` is captured AND passed in, it's not...

On AWS V100s, I'm seeing 53 ms on 0.1.1, 50 ms on 0.2.1, and 52 ms on 1.13: a ~4% regression from 0.2.1.

> In other words, we cannot get per-batch gradient from the per-sample gradient by mean calculation, which was checked by me with empirical calculation in torch.

Could you try summing...
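The reply above is truncated, so the following is only a sketch of the general relationship it seems to point at, not the thread's actual code. It assumes a toy one-parameter squared-error model (the data `xs`, `ys` and weight `w` are made up) and uses plain Python. The point is that per-sample gradients reproduce the batch gradient only when they are combined the same way the loss is reduced: summed for a sum-reduced loss, averaged for a mean-reduced loss.

```python
# Hypothetical toy data and weight, chosen only for illustration.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 5.0]
w = 0.5

def per_sample_grads(w):
    # Analytic gradient of (w*x - y)^2 with respect to w, one per sample.
    return [2.0 * (w * x - y) * x for x, y in zip(xs, ys)]

def batch_loss(w, reduction):
    # Batch loss with either "sum" or "mean" reduction over samples.
    losses = [(w * x - y) ** 2 for x, y in zip(xs, ys)]
    total = sum(losses)
    return total if reduction == "sum" else total / len(losses)

def numerical_grad(f, w, eps=1e-6):
    # Central finite difference, used as an independent check.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

grads = per_sample_grads(w)
sum_of_grads = sum(grads)
mean_of_grads = sum_of_grads / len(grads)

grad_sum_loss = numerical_grad(lambda v: batch_loss(v, "sum"), w)
grad_mean_loss = numerical_grad(lambda v: batch_loss(v, "mean"), w)

print(sum_of_grads, grad_sum_loss)    # these two should agree
print(mean_of_grads, grad_mean_loss)  # and so should these two
```

If a mean of per-sample gradients does not match the batch gradient, one thing worth checking is whether the batch loss was actually sum-reduced, in which case the per-sample gradients need to be summed instead.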