Thomas Viehmann
> Note that, before passing the trace to the nvFuser executor, prims.copy_to_out_(t0, out=q) is put after the sdpaex operator thanks to functionalization. Ugh. Could it be that `inplace_copy_` is particular...
Closing for now. @crcrpar, please reopen as you see fit.
The checker should also check that all proxies it finds are in the .names set.
Hi @rittik9 , thank you for working on this! So two quick comments and let me know how much you want to go into details or explore yourself: - this...
I like the proposal in general, a couple of details: - for the "do nothing" I wonder if empty lists or tuples would be better, - completely agree with manually...
@AugustDev Thank you, did you want to file this here or with https://github.com/Lightning-AI/pytorch-lightning/issues ?
Let's not. Optimizing without measurable impact is not a habit we want to get into.
TBH, this is a very clear "don't do this, changing the fn is completely unsupported!". That said, we can talk about distributed-after-jit. The obstacles are: - Currently the ddp transformation...
I think the new fsdp/ddp actually do this.
Thank you for pinging @mtasic85 . We're looking into more fp8 support, but we likely want to deliver this through [Thunder](https://github.com/lightning-ai/lightning-thunder/), which will compile models to use optimizations. We do...