Chris Elrod
On the local versions (I plan to push by the end of the day):
```julia
julia> using LoopVectorization, ForwardDiff, BenchmarkTools
julia> using ForwardDiff: Dual, Partials
julia> using LoopVectorization: SVec
julia>...
```
> And, here's what happens with `inv` on my computer (an i7-8700), `@code_native` differs at length 8 but not length 4:

Interesting. I see the same thing you do: ```julia...
It's insidious! It's one of the first things I look for when I run into surprising amounts of allocations.

> And thanks, deleting that unused `where Z` fixes this example...
Okay. You're welcome to make PRs there yourself. It may be a couple months before I work on it myself.
In terms of data layout, I think it is likely that a "struct of arrays" representation would work best. Preferably, rather than creating three separate arrays, the last axis of...
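To make the layout idea concrete, here is a minimal sketch in plain Julia. It assumes the color channel is placed on the last axis of a single array; the `RGB` struct and the `to_channel_last` helper are hypothetical names made up for this illustration and are not taken from any package discussed here.

```julia
# Hypothetical pixel type for illustration only.
struct RGB
    r::Float32
    g::Float32
    b::Float32
end

# Array-of-structs: each pixel stores r, g, b next to each other in memory.
aos = [RGB(i, 2i, 3i) for i in 1:4, j in 1:3]   # a 4×3 "image"

# "Struct of arrays" held in one array: the channel index is the last axis,
# so every channel is its own contiguous 4×3 slab.
function to_channel_last(img::Matrix{RGB})
    out = Array{Float32}(undef, size(img)..., 3)
    for I in CartesianIndices(img)
        p = img[I]
        out[I, 1] = p.r
        out[I, 2] = p.g
        out[I, 3] = p.b
    end
    return out
end

soa = to_channel_last(aos)
```

With this layout, a loop over `soa[:, :, c]` for a fixed channel `c` touches contiguous memory, which is the access pattern SIMD loads tend to prefer.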
> Interesting. We got away from this layout because I had the impression it would be better to have all the values localized, but it would be easy to get...
> This prompts me to realize the above should probably be 4 layers of nesting: we probably need something to say to the compiler that dimension 3 of `rgbchannels` has...
> Instead of specializing for RGB colors, how about a general interface for any `isbits` type convertible to `NTuple{N, Union{Float32, Float64}}`? That covers a lot of use cases (complex numbers,...
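As a rough illustration of what "convertible to an `NTuple`" can mean for an `isbits` type, the sketch below uses only Base Julia's `reinterpret` on complex numbers. It is not the interface being proposed, just one way to view the same memory; `reinterpret(reshape, ...)` assumes Julia 1.6 or newer.

```julia
z = [1.0 + 2.0im, 3.0 - 4.0im]

# Complex{Float64} is an isbits type made of two Float64 fields.
elementwise_isbits = isbitstype(eltype(z))

# View each element as a 2-tuple of floats without copying.
as_tuples = reinterpret(NTuple{2,Float64}, z)

# View the same memory as a 2×N Float64 matrix (row 1: real, row 2: imag).
as_matrix = reinterpret(reshape, Float64, z)
```

Both views alias the original array, so a vectorized kernel could in principle operate on the `Float64` data directly and leave the wrapper type untouched.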
I demoed rewriting the IR into loops performing a reverse pass last year. I think reviving that effort and adding support for some AD system would be another great use...
> I'm wondering, would you consider generating the loop IR from inferred Julia SSA IR? This would give you type information to let you handle composite types.

LoopVectorization already has...