Chris Elrod
AutoGrad errored for me, but switching to the master branch fixed the problems. I have about the same results: ```julia julia> @btime agrad_loss∇($((w, b)), $x, $y) 466.866 μs (816 allocations:...
> Tullio's recursive threads-then-blocks algorithm? An additional consideration is that I haven't implemented anything like this in `LoopVectorization` yet, so Tullio's current implementation will get better performance beyond a certain...
> The number of LLVM instructions in the end didn't really seem to change. The llvm for `rhs!` seems to have changed a lot, but it isn't much shorter (435...
The difference is large when the chunk size is not a power of 2. Try using `Chunk(7)` instead of `Chunk(8)`, like in the example.
You can test by using this in the script:
```julia
cfg = ForwardDiff.JacobianConfig(f!, du, u0, ForwardDiff.Chunk(5));
@time ForwardDiff.jacobian!(J, f!, du, u0, cfg);
@btime ForwardDiff.jacobian!($J, $f!, $du, $u0, $cfg);
```
I...
Sorry, I apparently switched ForwardDiff commits in between my comments from 5 and 1 hour ago. Now that I've checked out this commit again, I see a roughly >2x performance...
I assume you've looked at these: https://github.com/mcabbott/Tullio.jl/tree/master/benchmarks 01 includes some broadcasting, and 02 includes matmuls and permutedims.
Okay, here is an updated script: ```c++ #include #include #include "Highs.h" using std::cout; using std::endl; int main() { // Create and populate a HighsModel instance for the LP // //...
> p.s. what is this for???

I need/want to remove redundant constraints while performing Fourier-Motzkin elimination. This can prevent the exponential explosion in number of constraints as we reduce the...
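To make the constraint growth concrete, here is a minimal sketch of a single Fourier-Motzkin elimination step. It is not the code from this thread: the `Constraint` struct and the `eliminate` function are hypothetical names, coefficients are assumed to be integers, and every constraint is assumed to have the form `a·x <= b` over the same number of variables. It does no redundancy removal; it only shows where the extra constraints come from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation of one inequality:
//   coeffs[0]*x0 + ... + coeffs[n-1]*x_{n-1} <= rhs
struct Constraint {
    std::vector<long long> coeffs;
    long long rhs;
};

// One Fourier-Motzkin step: returns a system that no longer involves
// variable `var` and has the same projection onto the other variables.
std::vector<Constraint> eliminate(const std::vector<Constraint>& system,
                                  std::size_t var) {
    std::vector<Constraint> upper, lower, result;
    for (const Constraint& c : system) {
        assert(var < c.coeffs.size());
        if (c.coeffs[var] > 0) {
            upper.push_back(c);   // bounds x_var from above
        } else if (c.coeffs[var] < 0) {
            lower.push_back(c);   // bounds x_var from below
        } else {
            result.push_back(c);  // does not involve x_var; keep as-is
        }
    }
    // Pair every upper bound with every lower bound. Both multipliers are
    // positive, so the inequality direction is preserved, and they are
    // chosen so that the coefficient of x_var cancels.
    for (const Constraint& u : upper) {
        for (const Constraint& l : lower) {
            assert(u.coeffs.size() == l.coeffs.size());
            const long long su = -l.coeffs[var];
            const long long sl = u.coeffs[var];
            Constraint combined;
            combined.coeffs.resize(u.coeffs.size());
            for (std::size_t i = 0; i < u.coeffs.size(); ++i) {
                combined.coeffs[i] = su * u.coeffs[i] + sl * l.coeffs[i];
            }
            combined.rhs = su * u.rhs + sl * l.rhs;
            result.push_back(combined);
        }
    }
    return result;
}
```

With `p` upper bounds and `q` lower bounds on the eliminated variable, this step emits `p * q` combined constraints on top of the untouched ones. Repeating it without pruning redundant constraints is what drives the blow-up.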
Any comments on the approach toward `passModel`? If the C API of `passModel` fills out internal data structures, it'd be faster to just fill them directly, e.g. like in my first...