Jonas S. comments

Results: 10 comments by Jonas S.

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

Was this script run as well? https://github.com/FluxML/FluxMLBenchmarks.jl/blob/main/benchmark/benchmark/nnlib/conv.jl I'm not sure, but it seems that only [flux] [mlp] was run (or is conv part of this group as well?).

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

Just updated `/test/ext_loopvectorizationruntests.jl` to run some benchmarks (on CI, probably with just one thread) so that some of the test results are backed by actual numbers. These are...

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

Does anyone have an idea why the results on some CI devices are correct but sometimes totally wrong? Is it some weird LV behavior (the version should be the same on all devices)?...

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

@chriselrod Thank you for the quick reply! Just added a minimal example in the `runtests.jl` script: https://github.com/jonas208/NNlib.jl/blob/lv-ext2/test/ext_loopvectorization/minimal_test.jl To be really sure that the implementation is not just unequal to NNlib...

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

Oh right, thank you, I had apparently forgotten about that. Thanks for linking the article!

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

After a lot of benchmarking and many smaller attempts to raise the performance a bit more, I mainly found out that:
- for inference with a batch size of...

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

@ToucheSir Sure, got [these results](https://github.com/jonas208/NNlib.jl/blob/lv-ext2/benchmark_result_julia_BLAS.set_num_threads(1).csv).

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

Just had a quick look at the results on Buildkite on the AMD GPU server. I saw massive (very unusual) differences between LV (75 ms) and im2col (15+ seconds) (never occurred otherwise,...

Possible way to implement a LoopVectorization extension for conv2d & meanpool2d & activations

@mcabbott (and maybe @chriselrod if you're interested) Sorry for the late reply! I just saw it again now. > Shouldn't it restrict T to the list of types which will...