Danial Javady
@janeyx99 Please let me know if I need to do anything else.
@mikaylagawarecki @eqy I see that the pipeline is failing because of a linting issue. I ran lintrunner, and I do not believe I touched this line myself....
@eqy Anything that needs to be done on my end?
@eqy @mikaylagawarecki Hi folks, since I'm new to PyTorch, I'm curious what the procedure is now. Will this be reviewed by a core maintainer/contributor by way of triage?...
> @pytorchbot revert -m "windows build failure is real, https://github.com/pytorch/pytorch/actions/runs/8910674030/job/24470387612#step:11:11236 is the correct failure line, ignore the statement saying build passed, batch is errorcodes arent propagating again" -c ignoredsignal @clee2000...
On my 3080

**BEFORE**

line: `int n = npq_offset / (p_ * q_);` translates to [before_first_line_sass.txt](https://github.com/NVIDIA/cutlass/files/14826990/before_first_line_sass.txt)

line: `int residual = npq_offset % (p_ * q_);` translates to [before_second_line_sass.txt](https://github.com/NVIDIA/cutlass/files/14826999/before_second_line_sass.txt)

(I'll omit...
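For readers unfamiliar with why the `/` and `%` lines above are worth replacing, here is a minimal host-side sketch of the general idea behind a precomputed "fast divmod": when the divisor (here `p_ * q_`) is fixed for many dividends, the division can be turned into a multiply and a shift. This is an illustration under my own assumptions, not CUTLASS's `FastDivmod` implementation; the struct name `FastDivmodSketch` and its members are hypothetical, and it handles unsigned 32-bit values with a nonzero divisor only.

```cpp
#include <cstdint>

// Hypothetical sketch: division by a fixed divisor via a precomputed
// multiplier. Not the CUTLASS FastDivmod class; names are illustrative.
struct FastDivmodSketch {
    uint32_t divisor;  // must be >= 1
    uint64_t magic;    // ceil(2^64 / divisor), only used when divisor >= 2

    explicit FastDivmodSketch(uint32_t d)
        : divisor(d), magic(d > 1 ? UINT64_MAX / d + 1 : 0) {}

    // Quotient n / divisor, computed as the high 64 bits of magic * n.
    uint32_t div(uint32_t n) const {
        if (divisor == 1) return n;  // ceil(2^64 / 1) does not fit in 64 bits
        // 64x32-bit multiply split into two halves so no 128-bit type is needed.
        uint64_t hi = (magic >> 32) * n;
        uint64_t lo = (magic & 0xFFFFFFFFu) * n;
        return static_cast<uint32_t>((hi + (lo >> 32)) >> 32);
    }

    // Quotient and remainder together; the remainder reuses the quotient,
    // so only one "division" is performed for both results.
    void divmod(uint32_t& quotient, uint32_t& remainder, uint32_t n) const {
        quotient = div(n);
        remainder = n - quotient * divisor;
    }
};
```

In the terms of the snippet above, one would construct the helper once from `p_ * q_` and then call `divmod(n, residual, npq_offset)` per element. Whether this actually beats the compiler-generated SASS depends on the hardware and the surrounding code, which is exactly what the measurements in this thread are about.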
As @manishucsd stated, I am consistently seeing these changes perform worse.

edit: If I refactor `store_with_byte_offset` to also use `FastDivmod`, the performance (GFLOPS) improves.

```
original...
```
@bdice

```
Benchmark                                                                         Time        CPU   Iterations
----------------------------------------------------------------------------------------------------------------------------------------------
ConditionalJoin/conditional_left_anti_join_32bit/100000/100000/manual_time      311 ms     312 ms            2
ConditionalJoin/conditional_left_anti_join_32bit/100000/400000/manual_time     1126 ms    1126 ms            1
ConditionalJoin/conditional_left_anti_join_32bit/100000/1000000/manual_time    2748 ms    2748 ms            1
ConditionalJoin/conditional_left_anti_join_64bit/100000/100000/manual_time      318 ms     318 ms...
```
Can I do this one?