Michael Abbott
Thanks for looking into all this. That's certainly not a graph which will impress anyone! I'd call it disappointing but unsurprising. Tullio generates the most naive possible KernelAbstractions code, and...
Re the Float32 graph, in round numbers my CPU gets to about half a teraflop from the low-hundreds, which appears to be faster than Tullio's GPU matmul in the range...
Oh that's interesting. `thread_halves` looks at the bit-size of number types as part of a heuristic about how finely to divide up the work. And apparently I never considered that...
Thanks, this is a bug. I believe this ought to be an error. Tullio doesn't work the way you are imagining here. For each value of the indices on the...
Thanks for the issue. Sadly I think it gets this wrong at present, and may divide `i` up among threads. There will be other examples too, like `C[mod(i),k] += ...`...
Late to the party, but this seems a bit odd. If I understand right what's happening is this: ``` julia> using ChainRulesCore, ForwardDiff julia> p = ProjectTo(1.0) # Float64 in...
> Right now it has some promotion stuff in there. If `Float32(Dual(...))` means anything other than conversion to `Float32`, then surely it means some kind of implicit broadcast over the...
> Not in the sense that Complex are. They are things to let operator overloading AD happen. Yes. My example above was Vec for SIMD, which is similarly a way...
I think the difficulty with allowing this is that it will cause any other rule which has captured `x` to give wrong answers: ``` julia> Zygote.gradient([1,2,3]) do x y =...
It's harder for me to picture `rand!` going wrong in the wild, but it does have the same problem. Seems to be from #252, without discussion.