Mario Lezcano Casado

237 comments by Mario Lezcano Casado

Apparently this fixed a graph break in ASGD and NAdam. I haven't really looked into what happened there, but I guess that's good.

Triggered a benchmark run to see if there's any perf hit from all the changes in this stack.

The optim changes seem to regress a bit. I'll look into how to get the graph-break wins from this PR without the regression. I'm merging the rest of the stack, as...

This `DICT_KEYS` guard should be removed at some point as [BE]. I had a look at this PR though, and the graph breaks it used to fix have already been fixed,...

@rec will take over this PR as part of his onboarding

For the linalg operations, I bet that 99% of those come from testing the gradients on a matrix that's close to being singular rather than the formulas being wrong.
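To illustrate why near-singular inputs make gradient tests unreliable, the sketch below builds two matrices with prescribed singular values (so their condition numbers are known) and runs `torch.autograd.gradcheck` on `torch.linalg.inv`. The helper `matrix_with_singular_values` is hypothetical and written just for this example. The derivative formula for the inverse is the same in both cases; only the conditioning of the input changes.

```python
import torch
from torch.autograd import gradcheck


def matrix_with_singular_values(s: torch.Tensor, seed: int = 0) -> torch.Tensor:
    """Hypothetical helper: A = U diag(s) V^T with random orthogonal U, V,
    so that cond(A) = max(s) / min(s)."""
    g = torch.Generator().manual_seed(seed)
    n = s.numel()
    u, _ = torch.linalg.qr(torch.randn(n, n, dtype=torch.float64, generator=g))
    v, _ = torch.linalg.qr(torch.randn(n, n, dtype=torch.float64, generator=g))
    return u @ torch.diag(s) @ v.T


good = matrix_with_singular_values(torch.tensor([2.0, 1.5, 1.0], dtype=torch.float64))
bad = matrix_with_singular_values(torch.tensor([1.0, 1.0, 1e-8], dtype=torch.float64))

print(torch.linalg.cond(good).item())  # ~2
print(torch.linalg.cond(bad).item())   # ~1e8

# Finite differences (default eps=1e-6) match the analytic Jacobian when A is
# far from singular.
ok_good = gradcheck(torch.linalg.inv, (good.requires_grad_(),))

# Here the perturbation is larger than the smallest singular value, so the
# numerical Jacobian is not meaningful even though the formula is correct.
ok_bad = gradcheck(torch.linalg.inv, (bad.requires_grad_(),), raise_exception=False)
print(ok_good, ok_bad)
```

A failure on `bad` says more about the test input than about the backward formula, which is why generating well-conditioned test matrices is usually the first thing to try.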

@MarktHart can you provide an example of what exactly this breaks?

Could that be fixed by manually updating these models to use the new `weight_norm`?

So, let me spell things out to make sure we're talking about the same thing. The issue here is that 1. https://github.com/pytorch/pytorch/pull/103001 added a `torch.nn.utils.parametrizations.weight_norm` 2. Funnily enough, it didn't...

Just finding the potential source of the issue is already plenty! But yeah, as discussed above, this looks like it may need to be looked at on HF's end. Feel...