Mario Lezcano Casado comments

Results: 237 comments of Mario Lezcano Casado

Remove DICT_KEYS guard

This apparently fixed a graph break in `asgd` and `nadam`. I haven't really looked into what happened there, but I guess that's good.

Remove DICT_KEYS guard

I triggered a benchmark run to check whether any of the changes in this stack cause a perf hit.

Remove DICT_KEYS guard

The optim changes seem to regress perf a bit. I'll look into how to keep the graph-break wins from this PR without the regression. I'm merging the rest of the stack, as...

Remove DICT_KEYS guard

This `DICT_KEYS` guard should be removed at some point as [BE]. I had a look at this PR though, and the graph breaks that it used to fix have already been fixed,...

Add decompositions for copy variants of view ops

@rec will take over this PR as part of his onboarding

Tracker: Slow gradcheck failures possibly indicating incorrect gradients

For the linalg operations, I bet that 99% of those come from testing the gradients on a matrix that's close to being singular, rather than from the formulas being wrong.
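A minimal pure-Python sketch of why near-singular inputs trip up finite-difference checks (this is not PyTorch's `gradcheck` itself). It compares the exact analytic derivative of a 2x2 inverse, from d(A^{-1}) = -A^{-1} dA A^{-1}, against a central difference with a step of 1e-6, which I believe matches `gradcheck`'s default `eps`. The helper names and test matrices are hypothetical, chosen only for illustration.

```python
# Finite-difference gradient check on a 2x2 matrix inverse, in plain Python floats
# (double precision). The analytic formula is exact; only the numerical estimate degrades.

def inv2x2(m):
    """Closed-form inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def analytic_grad_00(m):
    """d(A^{-1})[0][0] / dA[0][0] = -(A^{-1}[0][0])^2, from d(A^{-1}) = -A^{-1} dA A^{-1}."""
    inv = inv2x2(m)
    return -inv[0][0] ** 2


def numeric_grad_00(m, eps=1e-6):
    """Central difference of A^{-1}[0][0] with respect to A[0][0]."""
    (a, b), (c, d) = m
    plus = inv2x2([[a + eps, b], [c, d]])[0][0]
    minus = inv2x2([[a - eps, b], [c, d]])[0][0]
    return (plus - minus) / (2 * eps)


def rel_err(m):
    """Relative error of the numerical estimate against the exact derivative."""
    ana = analytic_grad_00(m)
    return abs(numeric_grad_00(m) - ana) / abs(ana)


well_conditioned = [[2.0, 1.0], [1.0, 3.0]]      # det = 5
near_singular = [[1.0, 1.0], [1.0, 1.0 + 1e-5]]  # det = 1e-5

print(f"well-conditioned: {rel_err(well_conditioned):.2e}")
print(f"near-singular:    {rel_err(near_singular):.2e}")
```

For the well-conditioned matrix the relative error is negligible. For the matrix with determinant 1e-5 it comes out around 1%, which is above `gradcheck`'s default `rtol` of 1e-3, even though the analytic formula is exact. The central difference's truncation error blows up once the step size gets close to the input's distance from singularity.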

Preserve weight_g/weight_v accessors on new weight_norm

@MarktHart can you provide an example of what exactly this breaks?

Preserve weight_g/weight_v accessors on new weight_norm

Could that be fixed by manually updating these models to use the new `weight_norm`?

Preserve weight_g/weight_v accessors on new weight_norm

So, let me spell things out to make sure we're talking about the same thing. The issue here is that:
1. https://github.com/pytorch/pytorch/pull/103001 added a `torch.nn.utils.parametrizations.weight_norm`
2. Funnily enough, it didn't...

Preserve weight_g/weight_v accessors on new weight_norm

Finding the potential source of the issue is already plenty! But yeah, as discussed above, this looks like it may need to be looked at on HF's end. Feel...