Ross Wightman
@AmbiTyga that adds a significant amount of non-trivial code to the base model for a fairly specific feature, considering that there are now vit/deit, pit, tnt, swin, soon cait and...
I should also add that I do have plans to add feature extraction for the vit networks, like I have for the convnets, where activations of internal transformer blocks can...
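The kind of internal-activation extraction described here can be sketched with PyTorch forward hooks. This is a hedged illustration on a toy block stack, not timm's actual feature-extraction API; `TinyViT` and `extract_features` are hypothetical names invented for the example.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a ViT: a stack of "blocks" (not timm's real model)
class TinyViT(nn.Module):
    def __init__(self, dim=32, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())
            for _ in range(depth)
        )

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x

def extract_features(model, x, indices):
    """Capture the outputs of selected internal blocks via forward hooks."""
    feats, hooks = {}, []
    for i in indices:
        def make_hook(idx):
            def hook(mod, inp, out):
                feats[idx] = out.detach()
            return hook
        hooks.append(model.blocks[i].register_forward_hook(make_hook(i)))
    try:
        model(x)
    finally:
        for h in hooks:
            h.remove()  # always clean up hooks
    return feats

model = TinyViT()
feats = extract_features(model, torch.randn(2, 16, 32), [1, 3])
```

Hooks avoid touching the model's forward code at all, which is one way to keep this feature from adding "non-trivial code to the base model".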
@ngimel thanks for demonstrating this... it works, but like my prev impl, it is a hack. I actually just pulled my is_contiguous version out. It was causing too many problems,...
@ngimel a lot of people still use scripting for serving / export, not just performance. 'aot' vs 'torchscript' on 1.12 is interesting, they are still quite different in some cases...
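For context on the serving/export use of scripting mentioned here: `torch.jit.script` compiles a module to TorchScript, which can be saved and loaded without the original Python source. A minimal sketch (the model here is just a placeholder):

```python
import torch
import torch.nn as nn

# Any ordinary module; a trivial stand-in for a real network
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())

# Compile to TorchScript for serialization / deployment outside Python
scripted = torch.jit.script(model)
# scripted.save("model.pt")  # would produce a self-contained, loadable artifact

out = scripted(torch.randn(1, 4))
```

This is why scripting failures matter even when eager-mode performance is fine: a model that won't script can't be exported this way.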
I will add though, per the original torch issue w/ LN + axis... regardless of performance, not having a native norm layer that covers this use case (C-dim), without needing...
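The C-dim use case being described, normalizing an NCHW tensor over its channel axis, is typically worked around by permuting to NHWC so the channel dim is last, then calling the native layer norm. A hedged sketch of that workaround (the class name `LayerNorm2d` is illustrative, not necessarily the exact timm implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNorm2d(nn.LayerNorm):
    """LayerNorm over the channel dim (C) of an NCHW tensor.

    nn.LayerNorm only normalizes trailing dims, so we permute C to the
    last position, normalize, and permute back -- the exact kind of
    hack a native C-dim norm layer would avoid.
    """
    def forward(self, x):
        x = x.permute(0, 2, 3, 1)                       # NCHW -> NHWC
        x = F.layer_norm(x, self.normalized_shape,
                         self.weight, self.bias, self.eps)
        return x.permute(0, 3, 1, 2)                    # NHWC -> NCHW

x = torch.randn(2, 8, 4, 4)
y = LayerNorm2d(8)(x)
```

The permutes are where the contiguity and performance trouble discussed in these threads comes from.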
@csarofeen keep in mind, I'm likely working with something a bit older than you, I was doing some testing on 1.12 release (cuda 11.3) via torchscript and aot-autograd. If you're...
@ngimel @csarofeen I ran a whole lot of benchmark runs on both 3090 and V100. As you can see, it's messy, really messy, without a clear-cut win...
> We explicitly tested on 1.12 release, CC @ptrblck and @kevinstephano in case we were testing something slightly different. Definitely keep us posted, we're highly motivated to get our codegen...
There should probably be another location for 'nvfuser' + timm concerns, but I'll put this observation here for now. A number of torchscript + nvfuser failures are due to handling...
@ngimel @csarofeen I spent a bit more time hacking around with this, as I keep getting frustrated by the lack of performance of non-BN layers w/ PyTorch + GPU...
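The kind of comparison behind these complaints can be sketched with a rough micro-benchmark. This is not the actual harness used in the threads above, just a minimal CPU illustration; `bench` is a hypothetical helper, and GroupNorm(1, C) stands in here as a representative non-BN norm layer.

```python
import time
import torch
import torch.nn as nn

def bench(layer, x, iters=20):
    """Average forward time in seconds (CPU; on GPU you would need
    torch.cuda.synchronize() around the timed region)."""
    for _ in range(3):      # warmup
        layer(x)
    t0 = time.perf_counter()
    for _ in range(iters):
        layer(x)
    return (time.perf_counter() - t0) / iters

x = torch.randn(8, 64, 32, 32)
bn = nn.BatchNorm2d(64).eval()
# num_groups=1 normalizes each sample over all of C,H,W (a LayerNorm-like,
# non-BN alternative for NCHW inputs)
gn = nn.GroupNorm(1, 64)

with torch.no_grad():
    bn_t = bench(bn, x)
    gn_t = bench(gn, x)
```

Actual numbers depend heavily on device, dtype, and whether a fuser (nvfuser, etc.) kicks in, which is exactly why the results above were "messy".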