Horace He

242 comments by Horace He

So interestingly, it's only if `` is the only type of action performed. `iabc.` works as expected.

@rnewman That's a separate bug, one that I can't replicate at least. Could you open up a new issue, and post any errors you might have in your dev console?...

@ad8e imo, I would either use 6 or 12. MFU was originally intended to *exclude* recomputation flops (from activation checkpointing), so it seems somewhat strange to me to reinclude them here....
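A minimal sketch of the distinction being drawn, assuming the common 6*N flops-per-token approximation (2*N forward + 4*N backward); the function names and numbers below are illustrative, not from this thread:

```
# Illustrative only: MFU counts model flops (excluding recomputation), while a
# "hardware" utilization number would also count the extra forward pass from
# activation checkpointing (~8*N flops per token instead of 6*N).
def mfu(n_params, tokens_per_s, peak_flops_per_s):
    model_flops_per_s = 6 * n_params * tokens_per_s
    return model_flops_per_s / peak_flops_per_s

def hfu(n_params, tokens_per_s, peak_flops_per_s):
    hardware_flops_per_s = 8 * n_params * tokens_per_s
    return hardware_flops_per_s / peak_flops_per_s

# e.g. a 1.3B-parameter model at 10k tokens/s on a 312 TFLOP/s GPU:
print(mfu(1.3e9, 1e4, 312e12))  # ~0.25
print(hfu(1.3e9, 1e4, 312e12))  # ~0.33
```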

If I'm understanding the benchmarks correctly, it seems like we shouldn't enable it right now? Since even for fp32 on cuda, perf is quite a bit worse than the eager...

I think I'll probably add the cache back, since it seems like there's a couple more folks relying on the old behavior than I thought, and many people don't want...

Actually, we plan on moving Dynamo into PyTorch core sooner than expected (this week). When that's done, would it also be possible to replace the call to `memory_efficient_fusion` with `dynamo.optimize`?...
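A minimal sketch of the suggested swap, assuming the current code wraps some function with functorch's `memory_efficient_fusion`; the `"inductor"` backend string and the `fn` below are illustrative, not from the thread:

```
import torch
import torch._dynamo as dynamo

def fn(x):
    # stand-in for whatever function is currently passed to memory_efficient_fusion
    return torch.nn.functional.gelu(x) * x

# before: from functorch.compile import memory_efficient_fusion
#         opt_fn = memory_efficient_fusion(fn)
opt_fn = dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(8, 8))
```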

@ngimel My PR doesn't fix this issue, just the underlying `guard_multiple_of` issue.

@vadimkantorov I think torch.compile does something better than producing in-place code in your cases.

@vadimkantorov For dropout + relu, we do this (run this with `AOT_FX_GRAPHS=1`):

```
import torch
import torch.nn.functional as F

@torch.compile
def f(x):
    return F.dropout(x, 0.5, training=True).relu()

f(torch.randn(20, 20, requires_grad=True, device='cuda'))
```

```
====== Forward graph 0...
```