Jonas S.

Results: 10 comments by Jonas S.

Was this script run as well? https://github.com/FluxML/FluxMLBenchmarks.jl/blob/main/benchmark/benchmark/nnlib/conv.jl I'm not sure, but it seems that only [flux] [mlp] was run (or is conv part of this group as well?).

Just updated `/test/ext_loopvectorizationruntests.jl` to run some benchmarks (on CI, probably with just one thread), to back up some of the test results with actual numbers. These are...

Does anyone have an idea why the results are correct on some CI devices but totally wrong on others? Some weird LV behavior (the version should be the same on all devices)?...

@chriselrod Thank you for the quick reply! Just added a minimal example in the `runtests.jl` script: https://github.com/jonas208/NNlib.jl/blob/lv-ext2/test/ext_loopvectorization/minimal_test.jl To be really sure that the implementation is not just unequal to NNlib...

Some logs: Worked on ``` | Brand | Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz | | Vendor | :Intel | | Architecture | :Broadwell | ``` Worked on ```...

Oh right, thank you, I had apparently forgotten about that. Thanks for linking the article!

After a lot of benchmarking and lots of smaller attempts to raise the performance a bit more, I mainly found out that: - for inference with a batch size of...

@ToucheSir Sure, got [these results](https://github.com/jonas208/NNlib.jl/blob/lv-ext2/benchmark_result_julia_BLAS.set_num_threads(1).csv).

Just had a quick look at the results on Buildkite on the AMD GPU server. I saw massive (very unusual) differences between LV (75 ms) and im2col (15+ seconds) (never occurred otherwise,...

@mcabbott (and maybe @chriselrod if you're interested) Sorry for the late reply! I just saw it again now. > Shouldn't it restrict T to the list of types which will...