eqy
cu11 and cu12 wheels for cuDNN ~~9.0.0.312~~ 9.1.0.70 have been uploaded, so trying this out... CC @Skylion007 @malfet cc @csarofeen @ptrblck @xwang233
The flag basically does nothing following #95722. Let's see if the quantization tests break. CC @malfet @atalman cc @csarofeen @ptrblck @xwang233 @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel
Doesn't affect current behavior by default; this is for #126544. I'm not sure what the exact mechanism is here, but CUDA errors appear to already be thrown in the main process, meaning...
Somehow the original PR was missing the `CUDA_KERNEL_LOOP_TYPE` change??? Thanks @johnc-keen @Chillee for the great repro! (#129785) cc @ptrblck @msaroufim @mikaylagawarecki
Newer versions of cuDNN can dispatch to a Winograd kernel here on A100, which affects numerics a bit. cc @csarofeen @ptrblck @xwang233 @zou3519 @Chillee @samdow @kshitij12345 @janeyx99
Seems to have been removed following #99699?
Same `char` dtype issue causing device index `0` to be interpreted as a null terminator; see also #123984. cc @ptrblck @msaroufim
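A minimal illustration of this bug class (not the actual PyTorch code; the identifiers below are made up): appending a `char`-typed device index directly to a `std::string` inserts the raw byte, so index `0` becomes an embedded `'\0'` rather than the digit `0`, and anything that later treats the buffer as a C string stops there.

```cpp
// Illustrative only: a char-typed device index of 0 turns into an embedded
// null byte instead of the digit "0".
#include <iostream>
#include <string>

int main() {
  char device_index = 0;  // device 0 stored in a char-typed index

  std::string buggy = "cuda:";
  buggy += device_index;  // appends the byte 0x00, not the character '0'

  std::string fixed = "cuda:" + std::to_string(device_index);  // "cuda:0"

  std::cout << "buggy size: " << buggy.size()                   // 6, last byte is '\0'
            << ", buggy as C-string: \"" << buggy.c_str() << "\""  // prints just "cuda:"
            << ", fixed: \"" << fixed << "\"\n";
  return 0;
}
```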
Fix for the PyTorch build. CC @ptrblck @nWEIdia
In the spirit of warming up for JIT compilation, add a warmup iteration in case the very last batch has a different size that may unwittingly trigger recompilation.
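A rough sketch of the reasoning (the `run_step` below is a hypothetical stand-in, not a PyTorch API): if a step is compiled per input shape, the smaller final batch triggers a recompile unless its size is also covered during warmup.

```cpp
// Hypothetical stand-in for a step that "compiles" the first time it sees a
// given batch size, mimicking shape-specialized JIT compilation.
#include <chrono>
#include <iostream>
#include <set>
#include <thread>
#include <vector>

void run_step(int batch_size) {
  static std::set<int> compiled_sizes;
  if (compiled_sizes.insert(batch_size).second) {
    // Simulate one-time compilation cost for this shape.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  }
}

int main() {
  std::vector<int> batch_sizes(10, 32);
  batch_sizes.push_back(7);  // the last batch is smaller

  // Warmup: hit every distinct batch size once, including the odd-sized final
  // batch, so no compilation happens inside the measured loop.
  for (int bs : std::set<int>(batch_sizes.begin(), batch_sizes.end())) {
    run_step(bs);
  }

  auto t0 = std::chrono::steady_clock::now();
  for (int bs : batch_sizes) {
    run_step(bs);  // timed region: already warm for every size
  }
  auto t1 = std::chrono::steady_clock::now();
  std::cout << "timed loop: "
            << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
            << " us\n";
  return 0;
}
```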
Calling `getenv` on side threads is dangerous, as it can potentially segfault if the main thread is in the middle of setting environment variables (https://github.com/pytorch/pytorch/issues/134596). This PR only calls `getenv`...
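A minimal sketch of the safer pattern, assuming the fix amounts to confining `getenv` to startup (the variable name and helpers here are hypothetical, not the actual PyTorch change): read the environment once on the main thread before any worker threads exist, and have the workers consume the cached value.

```cpp
// Sketch only: cache the environment variable on the main thread before any
// worker threads start, so side threads never race getenv() against setenv().
#include <cstdlib>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Filled in once, on the main thread, before threads are spawned.
static std::string g_cached_flag;

void init_env_cache() {
  const char* v = std::getenv("MY_HYPOTHETICAL_FLAG");  // main thread only
  g_cached_flag = v ? v : "";
}

void worker(int id) {
  // Safe: reads the cached copy instead of calling getenv() on a side thread.
  std::cout << "worker " << id << " sees flag=\"" << g_cached_flag << "\"\n";
}

int main() {
  init_env_cache();  // must run before any worker thread exists

  std::vector<std::thread> workers;
  for (int i = 0; i < 4; ++i) {
    workers.emplace_back(worker, i);
  }
  for (auto& t : workers) {
    t.join();
  }
  return 0;
}
```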