NNlib.jl
Reduce BLAS threads while parallelizing over GEMM
Can someone add the benchmark label?
Judge result
Benchmark Report for /home/runner/work/FluxMLBenchmarks.jl/FluxMLBenchmarks.jl/benchmark/script/..
Job Properties
- Time of benchmarks:
    - Target: 30 Apr 2024 - 04:30
    - Baseline: 30 Apr 2024 - 04:30
- Package commits:
    - Target: non gi
    - Baseline: non gi
- Julia commits:
    - Target: bd47ec
    - Baseline: bd47ec
- Julia command flags:
    - Target: None
    - Baseline: None
- Environment variables:
    - Target:
        - FLUXML_BENCHMARK_FLUX_MLP => true
        - FLUXML_BENCHMARK_FLUX => true
        - JULIA_NUM_THREADS => 1
    - Baseline:
        - FLUXML_BENCHMARK_FLUX_MLP => true
        - FLUXML_BENCHMARK_FLUX => true
        - JULIA_NUM_THREADS => 1
Results
A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds).
ID | time ratio | memory ratio |
---|---|---|
["flux", "mlp", "Float32"] | 1.33 (5%) ❌ | 1.00 (1%) |
["nnlib", "attention", "Float16", "score", "q(8, (64, 64, 16))-k(8, (64, 64, 16))-bias((64, 64))-nheads(4)"] | 1.07 (5%) ❌ | 1.00 (1%) |
["nnlib", "attention", "Float16", "score", "q(8, (8, 6, 1))-k(8, (8, 10, 1))-bias(nothing)-nheads(1)"] | 23372.86 (5%) ❌ | 542.62 (1%) ❌ |
["nnlib", "attention", "Float64", "attention", "q((8, 6, 1))-k((8, 10, 1))-v((4, 10, 1))-bias(nothing)-nheads(1)"] | 1.08 (5%) ❌ | 1.00 (1%) |
["nnlib", "attention", "Float64", "score", "q(8, (64, 64, 16))-k(8, (64, 64, 16))-bias((64, 64))-nheads(4)"] | 1.21 (5%) ❌ | 1.00 (1%) |
["nnlib", "attention", "Float64", "score", "q(8, (8, 6, 1))-k(8, (8, 10, 1))-bias(nothing)-nheads(1)"] | 1.47 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] | 0.91 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] | 0.83 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] | 0.83 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] | 0.87 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] | 0.87 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] | 0.77 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] | 1.06 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] | 0.94 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] | 0.75 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] | 1.08 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] | 0.80 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] | 0.77 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] | 0.78 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] | 0.90 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] | 0.90 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] | 0.82 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] | 0.78 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] | 0.72 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] | 0.75 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] | 0.86 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] | 0.68 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] | 0.90 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] | 1.23 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] | 1.10 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] | 1.07 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] | 1.22 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] | 1.08 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] | 0.91 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] | 1.08 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] | 1.10 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] | 1.09 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] | 0.94 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] | 0.82 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] | 0.93 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] | 0.86 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] | 0.88 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] | 0.88 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] | 0.83 (5%) ✅ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] | 1.15 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] | 1.06 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] | 1.10 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] | 1.21 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] | 235.78 (5%) ❌ | 3700.24 (1%) ❌ |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] | 1.05 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] | 1.06 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] | 1.10 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] | 1.09 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] | 1.33 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] | 1.05 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] | 1.28 (5%) ❌ | 1.00 (1%) |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] | 1.11 (5%) ❌ | 1.00 (1%) |
["nnlib", "dropout", "3-N(128)", "dropout", "with-dim"] | 0.64 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "3-N(256)", "dropout!", "with-dim"] | 0.39 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "3-N(256)", "dropout", "with-colon"] | 0.56 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "3-N(256)", "dropout", "with-dim"] | 0.49 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "3-N(512)", "dropout!", "with-dim"] | 0.78 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "3-N(512)", "dropout", "with-colon"] | 0.81 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "3-N(512)", "dropout", "with-dim"] | 0.64 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "4-N(128)", "dropout", "with-colon"] | 1.20 (5%) ❌ | 1.00 (1%) |
["nnlib", "dropout", "4-N(128)", "dropout", "with-dim"] | 0.93 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "4-N(256)", "dropout", "with-dim"] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "dropout", "4-N(512)", "dropout!", "with-dim"] | 0.94 (5%) ✅ | 1.00 (1%) |
["nnlib", "gemm", "Float32", "batched_gemm!", "trans(N,N)-M(512)-N(512)-K(128)-alpha(0.5)-beta(1.0)"] | 0.81 (5%) ✅ | 1.00 (1%) |
["nnlib", "gemm", "Float64", "batched_gemm!", "trans(N,N)-M(512)-N(512)-K(128)-alpha(0.5)-beta(1.0)"] | 0.88 (5%) ✅ | 1.00 (1%) |
["nnlib", "gemm", "Float64", "batched_gemm!", "trans(N,N)-M(80)-N(40)-K(100)-alpha(1.0)-beta(0.0)"] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "maxpool1d-direct", "data"] | 0.94 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "maxpool2d-direct", "pool"] | 0.69 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct", "data"] | 1.09 (5%) ❌ | 1.00 (1%) |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct", "pool"] | 0.67 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct", "data"] | 0.91 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct", "pool"] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "meanpool2d-direct", "pool"] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "lpnormpool2d-direct", "pool"] | 1.09 (5%) ❌ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct", "data"] | 0.95 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct", "pool"] | 1.06 (5%) ❌ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "maxpool2d-direct", "data"] | 0.91 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "meanpool2d-direct", "data"] | 0.95 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "meanpool2d-direct", "data"] | 1.22 (5%) ❌ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "lpnormpool2d-direct", "data"] | 0.94 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "maxpool2d-direct", "pool"] | 1.31 (5%) ❌ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct", "data"] | 1.07 (5%) ❌ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct", "pool"] | 1.22 (5%) ❌ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "lpnormpool2d-direct", "data"] | 0.91 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "maxpool2d-direct", "data"] | 0.91 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "meanpool2d-direct", "pool"] | 0.93 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "lpnormpool3d-direct", "data"] | 0.95 (5%) ✅ | 1.00 (1%) |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "maxpool3d-direct", "data"] | 0.95 (5%) ✅ | 1.00 (1%) |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (1024, 2048, 4)] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (12288, 2048, 1)] | 0.93 (5%) ✅ | 1.00 (1%) |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (128, 384, 8)] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (2048, 2048, 2)] | 0.93 (5%) ✅ | 1.00 (1%) |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (4096, 2048, 2)] | 0.94 (5%) ✅ | 1.00 (1%) |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (512, 784, 8)] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (768, 1024, 4)] | 0.92 (5%) ✅ | 1.00 (1%) |
["nnlib", "softmax", "softmax", "Float32", "bw", (4096, 4096, 2)] | 1.09 (5%) ❌ | 1.00 (1%) |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32", "bw"] | 1.15 (5%) ❌ | 1.00 (1%) |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32", "fw"] | 1.18 (5%) ❌ | 1.00 (1%) |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float16", "fw"] | 1.03 (5%) | 1.05 (1%) ❌ |
["nnlib", "upsample", "nearest", "3-N(128)", "Float64"] | 0.93 (5%) ✅ | 1.00 (1%) |
["nnlib", "upsample", "nearest", "3-N(64)", "Float16"] | 0.90 (5%) ✅ | 1.00 (1%) |
["nnlib", "upsample", "nearest", "3-N(64)", "Float32"] | 0.89 (5%) ✅ | 1.00 (1%) |
["nnlib", "upsample", "nearest", "3-N(64)", "Float64"] | 0.94 (5%) ✅ | 1.00 (1%) |
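The marks in the table above can be reproduced with a simple threshold rule. The following is a minimal sketch, assuming a ratio counts as significant once it leaves the band `1 ± tolerance`; `classify_ratio` is a hypothetical helper written for illustration, not a function from the benchmark tooling. Because the table prints rounded ratios, a borderline entry such as `1.05 (5%)` can still carry a mark even though the rounded value sits exactly on the threshold.

```julia
# Hypothetical helper: classify a target/baseline ratio against a noise tolerance.
# Assumption: values outside 1 ± tolerance are the "significant" ones.
function classify_ratio(ratio::Real, tolerance::Real)
    if ratio > 1 + tolerance
        return :regression      # shown as ❌ in the report
    elseif ratio < 1 - tolerance
        return :improvement     # shown as ✅ in the report
    else
        return :invariant       # no mark
    end
end

# Ratios copied from rows of the table above.
println(classify_ratio(1.33, 0.05))  # time ratio of ["flux", "mlp", "Float32"]
println(classify_ratio(0.91, 0.05))  # time ratio of a conv_direct row
println(classify_ratio(1.03, 0.05))  # time ratio of the upsample Float16 "fw" row
println(classify_ratio(1.05, 0.01))  # memory ratio of the same upsample row
```

A row appears in the table as soon as either its time or its memory ratio is significant, which is why the upsample `Float16` `fw` row is listed with an unmarked time ratio.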
Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
- ["flux", "mlp"]
- ["nnlib", "activations", "Float16"]
- ["nnlib", "activations", "Float32"]
- ["nnlib", "activations", "Float64"]
- ["nnlib", "attention", "Float16", "attention"]
- ["nnlib", "attention", "Float16", "score"]
- ["nnlib", "attention", "Float64", "attention"]
- ["nnlib", "attention", "Float64", "score"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "dropout", "3-N(128)", "dropout!"]
- ["nnlib", "dropout", "3-N(128)", "dropout"]
- ["nnlib", "dropout", "3-N(256)", "dropout!"]
- ["nnlib", "dropout", "3-N(256)", "dropout"]
- ["nnlib", "dropout", "3-N(512)", "dropout!"]
- ["nnlib", "dropout", "3-N(512)", "dropout"]
- ["nnlib", "dropout", "4-N(128)", "dropout!"]
- ["nnlib", "dropout", "4-N(128)", "dropout"]
- ["nnlib", "dropout", "4-N(256)", "dropout!"]
- ["nnlib", "dropout", "4-N(256)", "dropout"]
- ["nnlib", "dropout", "4-N(512)", "dropout!"]
- ["nnlib", "dropout", "4-N(512)", "dropout"]
- ["nnlib", "dropout", "5-N(128)", "dropout!"]
- ["nnlib", "dropout", "5-N(128)", "dropout"]
- ["nnlib", "dropout", "5-N(256)", "dropout!"]
- ["nnlib", "dropout", "5-N(256)", "dropout"]
- ["nnlib", "dropout", "5-N(512)", "dropout!"]
- ["nnlib", "dropout", "5-N(512)", "dropout"]
- ["nnlib", "gemm", "Float32", "batched_gemm!"]
- ["nnlib", "gemm", "Float64", "batched_gemm!"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "lpnormpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "maxpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "meanpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "lpnormpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "maxpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "meanpool3d-direct"]
- ["nnlib", "softmax", "logsoftmax", "Float16", "bw"]
- ["nnlib", "softmax", "logsoftmax", "Float16", "fw"]
- ["nnlib", "softmax", "logsoftmax", "Float32", "bw"]
- ["nnlib", "softmax", "logsoftmax", "Float32", "fw"]
- ["nnlib", "softmax", "softmax", "Float16", "bw"]
- ["nnlib", "softmax", "softmax", "Float16", "fw"]
- ["nnlib", "softmax", "softmax", "Float32", "bw"]
- ["nnlib", "softmax", "softmax", "Float32", "fw"]
- ["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float32"]
- ["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32"]
- ["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float32"]
- ["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float16"]
- ["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float32"]
- ["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float16"]
- ["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float32"]
- ["nnlib", "upsample", "nearest", "3-N(128)"]
- ["nnlib", "upsample", "nearest", "3-N(64)"]
- ["nnlib", "upsample", "nearest", "4-N(128)"]
- ["nnlib", "upsample", "nearest", "4-N(32)"]
- ["nnlib", "upsample", "nearest", "5-N(32)"]
- ["nnlib", "upsample", "nearest", "5-N(64)"]
Julia versioninfo
Target
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
Ubuntu 22.04.4 LTS
uname: Linux 6.5.0-1018-azure #19~22.04.2-Ubuntu SMP Thu Mar 21 16:45:46 UTC 2024 x86_64 x86_64
CPU: AMD EPYC 7763 64-Core Processor:
speed user nice sys idle irq
#1 2585 MHz 276 s 0 s 88 s 2928 s 0 s
#2 3086 MHz 342 s 0 s 113 s 2844 s 0 s
#3 3242 MHz 555 s 0 s 86 s 2638 s 0 s
#4 3217 MHz 262 s 0 s 77 s 2951 s 0 s
Memory: 15.606494903564453 GB (13991.60546875 MB free)
Uptime: 331.65 sec
Load Avg: 1.0 0.44 0.18
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Baseline
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
Ubuntu 22.04.4 LTS
uname: Linux 6.5.0-1018-azure #19~22.04.2-Ubuntu SMP Thu Mar 21 16:45:46 UTC 2024 x86_64 x86_64
CPU: AMD EPYC 7763 64-Core Processor:
speed user nice sys idle irq
#1 3242 MHz 145 s 0 s 70 s 2627 s 0 s
#2 3245 MHz 313 s 0 s 89 s 2446 s 0 s
#3 3215 MHz 340 s 0 s 65 s 2423 s 0 s
#4 3114 MHz 211 s 0 s 58 s 2570 s 0 s
Memory: 15.606494903564453 GB (14153.55859375 MB free)
Uptime: 286.52 sec
Load Avg: 0.83 0.33 0.13
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Target result
Benchmark Report for /home/runner/work/FluxMLBenchmarks.jl/FluxMLBenchmarks.jl/benchmark/script/..
Job Properties
- Time of benchmark: 30 Apr 2024 - 4:30
- Package commit: non gi
- Julia commit: bd47ec
- Julia command flags: None
- Environment variables:
    - FLUXML_BENCHMARK_FLUX_MLP => true
    - FLUXML_BENCHMARK_FLUX => true
    - JULIA_NUM_THREADS => 1
Results
Below is a table of this job's results, obtained by running the benchmarks. The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.
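As an illustration of the two ideas above (ID-based lookup and noise tolerances), here is a minimal sketch that uses a plain nested `Dict` in place of a real benchmark suite. The names `suite`, `lookup` and `tolerance_band` are hypothetical, and the symmetric band is an assumed reading of "within this percentage of the reported value". The time is copied from the `["flux", "mlp", "Float32"]` row of the results table.

```julia
# Stand-in for a benchmark suite: nested Dicts keyed by group name (hypothetical layout).
# The leaf holds the reported time in microseconds.
suite = Dict("flux" => Dict("mlp" => Dict("Float32" => 442.806)))

# Walk an ID of the form [parent_group, child_group, ..., key] down the nesting.
lookup(suite, id) = foldl((group, key) -> group[key], id; init = suite)

# Interval implied by a relative noise tolerance around a reported value
# (assumption: the tolerance is applied symmetrically).
tolerance_band(value, tol) = (value * (1 - tol), value * (1 + tol))

t = lookup(suite, ["flux", "mlp", "Float32"])
lo, hi = tolerance_band(t, 0.05)
println("reported $t μs, expected within [$lo, $hi] μs")
```

With a real suite the lookup would return a benchmark or trial object rather than a number, but the parent-to-key traversal is the same.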
ID | time | GC time | memory | allocations |
---|---|---|---|---|
["flux", "mlp", "Float16"] | 349.251 μs (5%) | | 23.30 KiB (1%) | 8 |
["flux", "mlp", "Float32"] | 442.806 μs (5%) | | 3.25 KiB (1%) | 6 |
["flux", "mlp", "Float64"] | 349.602 μs (5%) | | 23.30 KiB (1%) | 8 |
["nnlib", "activations", "Float16", "celu"] | 27.381 ms (5%) | | | |
["nnlib", "activations", "Float16", "elu"] | 27.376 ms (5%) | | | |
["nnlib", "activations", "Float16", "gelu"] | 78.918 ms (5%) | | | |
["nnlib", "activations", "Float16", "hardswish"] | 611.295 μs (5%) | | | |
["nnlib", "activations", "Float16", "hardtanh"] | 153.768 μs (5%) | | | |
["nnlib", "activations", "Float16", "hardσ"] | 395.610 μs (5%) | | | |
["nnlib", "activations", "Float16", "leakyrelu"] | 155.781 μs (5%) | | | |
["nnlib", "activations", "Float16", "lisht"] | 9.204 ms (5%) | | | |
["nnlib", "activations", "Float16", "logcosh"] | 36.531 ms (5%) | | | |
["nnlib", "activations", "Float16", "logσ"] | 31.914 ms (5%) | | | |
["nnlib", "activations", "Float16", "mish"] | 98.831 ms (5%) | | | |
["nnlib", "activations", "Float16", "relu"] | 150.482 μs (5%) | | | |
["nnlib", "activations", "Float16", "relu6"] | 159.128 μs (5%) | | | |
["nnlib", "activations", "Float16", "rrelu"] | 4.182 ms (5%) | | | |
["nnlib", "activations", "Float16", "selu"] | 37.319 ms (5%) | | | |
["nnlib", "activations", "Float16", "sigmoid_fast"] | 14.182 ms (5%) | | | |
["nnlib", "activations", "Float16", "softplus"] | 27.365 ms (5%) | | | |
["nnlib", "activations", "Float16", "softshrink"] | 299.481 μs (5%) | | | |
["nnlib", "activations", "Float16", "softsign"] | 1.559 ms (5%) | | | |
["nnlib", "activations", "Float16", "swish"] | 15.884 ms (5%) | | | |
["nnlib", "activations", "Float16", "tanh_fast"] | 8.548 ms (5%) | | | |
["nnlib", "activations", "Float16", "tanhshrink"] | 9.123 ms (5%) | | | |
["nnlib", "activations", "Float16", "trelu"] | 149.430 μs (5%) | | | |
["nnlib", "activations", "Float16", "σ"] | 14.159 ms (5%) | | | |
["nnlib", "activations", "Float32", "celu"] | 6.097 ms (5%) | | | |
["nnlib", "activations", "Float32", "elu"] | 5.227 ms (5%) | | | |
["nnlib", "activations", "Float32", "gelu"] | 9.403 ms (5%) | | | |
["nnlib", "activations", "Float32", "hardswish"] | 289.312 μs (5%) | | | |
["nnlib", "activations", "Float32", "hardtanh"] | 294.453 μs (5%) | | | |
["nnlib", "activations", "Float32", "hardσ"] | 273.963 μs (5%) | | | |
["nnlib", "activations", "Float32", "leakyrelu"] | 293.038 μs (5%) | | | |
["nnlib", "activations", "Float32", "lisht"] | 393.888 μs (5%) | | | |
["nnlib", "activations", "Float32", "logcosh"] | 17.226 ms (5%) | | | |
["nnlib", "activations", "Float32", "logσ"] | 16.732 ms (5%) | | | |
["nnlib", "activations", "Float32", "mish"] | 35.471 ms (5%) | | | |
["nnlib", "activations", "Float32", "relu"] | 321.062 μs (5%) | | | |
["nnlib", "activations", "Float32", "relu6"] | 286.454 μs (5%) | | | |
["nnlib", "activations", "Float32", "rrelu"] | 1.336 ms (5%) | | | |
["nnlib", "activations", "Float32", "selu"] | 5.639 ms (5%) | | | |
["nnlib", "activations", "Float32", "sigmoid_fast"] | 6.553 ms (5%) | | | |
["nnlib", "activations", "Float32", "softplus"] | 16.491 ms (5%) | | | |
["nnlib", "activations", "Float32", "softshrink"] | 285.985 μs (5%) | | | |
["nnlib", "activations", "Float32", "softsign"] | 296.495 μs (5%) | | | |
["nnlib", "activations", "Float32", "swish"] | 6.890 ms (5%) | | | |
["nnlib", "activations", "Float32", "tanh_fast"] | 374.672 μs (5%) | | | |
["nnlib", "activations", "Float32", "tanhshrink"] | 378.109 μs (5%) | | | |
["nnlib", "activations", "Float32", "trelu"] | 303.719 μs (5%) | | | |
["nnlib", "activations", "Float32", "σ"] | 7.042 ms (5%) | | | |
["nnlib", "activations", "Float64", "celu"] | 4.893 ms (5%) | | | |
["nnlib", "activations", "Float64", "elu"] | 5.388 ms (5%) | | | |
["nnlib", "activations", "Float64", "gelu"] | 9.569 ms (5%) | | | |
["nnlib", "activations", "Float64", "hardswish"] | 549.468 μs (5%) | | | |
["nnlib", "activations", "Float64", "hardtanh"] | 546.051 μs (5%) | | | |
["nnlib", "activations", "Float64", "hardσ"] | 540.721 μs (5%) | | | |
["nnlib", "activations", "Float64", "leakyrelu"] | 570.828 μs (5%) | | | |
["nnlib", "activations", "Float64", "lisht"] | 9.228 ms (5%) | | | |
["nnlib", "activations", "Float64", "logcosh"] | 17.815 ms (5%) | | | |
["nnlib", "activations", "Float64", "logσ"] | 16.913 ms (5%) | | | |
["nnlib", "activations", "Float64", "mish"] | 35.081 ms (5%) | | | |
["nnlib", "activations", "Float64", "relu"] | 567.762 μs (5%) | | | |
["nnlib", "activations", "Float64", "relu6"] | 553.035 μs (5%) | | | |
["nnlib", "activations", "Float64", "rrelu"] | 1.343 ms (5%) | | | |
["nnlib", "activations", "Float64", "selu"] | 5.965 ms (5%) | | | |
["nnlib", "activations", "Float64", "sigmoid_fast"] | 6.744 ms (5%) | | | |
["nnlib", "activations", "Float64", "softplus"] | 16.722 ms (5%) | | | |
["nnlib", "activations", "Float64", "softshrink"] | 558.184 μs (5%) | | | |
["nnlib", "activations", "Float64", "softsign"] | 551.061 μs (5%) | | | |
["nnlib", "activations", "Float64", "swish"] | 7.163 ms (5%) | | | |
["nnlib", "activations", "Float64", "tanh_fast"] | 9.127 ms (5%) | | | |
["nnlib", "activations", "Float64", "tanhshrink"] | 9.113 ms (5%) | | | |
["nnlib", "activations", "Float64", "trelu"] | 558.386 μs (5%) | | | |
["nnlib", "activations", "Float64", "σ"] | 5.970 ms (5%) | | | |
["nnlib", "attention", "Float16", "attention", "q((16, 128, 8))-k((16, 512, 8))-v((32, 512, 8))-bias((512, 128))-nheads(4)"] | 151.839 ms (5%) | | 14.59 MiB (1%) | 881 |
["nnlib", "attention", "Float16", "attention", "q((64, 64, 16))-k((64, 64, 16))-v((64, 64, 16))-bias((64, 64))-nheads(4)"] | 40.701 ms (5%) | | 4.97 MiB (1%) | 1585 |
["nnlib", "attention", "Float16", "attention", "q((8, 6, 1))-k((8, 10, 1))-v((4, 10, 1))-bias(nothing)-nheads(1)"] | 39.113 μs (5%) | | 46.12 KiB (1%) | 62 |
["nnlib", "attention", "Float16", "score", "q(8, (16, 128, 8))-k(8, (16, 512, 8))-bias((512, 128))-nheads(4)"] | 406.917 ms (5%) | 370.895 μs | 69.56 MiB (1%) | 1693 |
["nnlib", "attention", "Float16", "score", "q(8, (64, 64, 16))-k(8, (64, 64, 16))-bias((64, 64))-nheads(4)"] | 170.142 ms (5%) | 4.271 ms | 56.78 MiB (1%) | 12317 |
["nnlib", "attention", "Float16", "score", "q(8, (8, 6, 1))-k(8, (8, 10, 1))-bias(nothing)-nheads(1)"] | 1.428 s (5%) | 21.796 ms | 94.94 MiB (1%) | 1539383 |
["nnlib", "attention", "Float64", "attention", "q((16, 128, 8))-k((16, 512, 8))-v((32, 512, 8))-bias((512, 128))-nheads(4)"] |
24.345 ms (5%) | 50.28 MiB (1%) | 50 | |
["nnlib", "attention", "Float64", "attention", "q((64, 64, 16))-k((64, 64, 16))-v((64, 64, 16))-bias((64, 64))-nheads(4)"] |
2.994 ms (5%) | 9.03 MiB (1%) | 50 | |
["nnlib", "attention", "Float64", "attention", "q((8, 6, 1))-k((8, 10, 1))-v((4, 10, 1))-bias(nothing)-nheads(1)"] |
8.382 μs (5%) | 5.30 KiB (1%) | 38 | |
["nnlib", "attention", "Float64", "score", "q(8, (16, 128, 8))-k(8, (16, 512, 8))-bias((512, 128))-nheads(4)"] |
137.221 ms (5%) | 1.280 ms | 262.13 MiB (1%) | 29 |
["nnlib", "attention", "Float64", "score", "q(8, (64, 64, 16))-k(8, (64, 64, 16))-bias((64, 64))-nheads(4)"] |
58.529 ms (5%) | 7.320 ms | 140.50 MiB (1%) | 29 |
["nnlib", "attention", "Float64", "score", "q(8, (8, 6, 1))-k(8, (8, 10, 1))-bias(nothing)-nheads(1)"] |
45.736 μs (5%) | 20.16 KiB (1%) | 17 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] |
43.051 μs (5%) | 2.75 KiB (1%) | 47 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] |
43.893 μs (5%) | 3.03 KiB (1%) | 51 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] |
43.812 μs (5%) | 6.23 KiB (1%) | 53 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] |
38.713 μs (5%) | 2.77 KiB (1%) | 47 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] |
43.542 μs (5%) | 3.05 KiB (1%) | 51 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] |
44.724 μs (5%) | 9.28 KiB (1%) | 53 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] |
76.313 μs (5%) | 6.67 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] |
78.918 μs (5%) | 6.67 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] |
40.627 μs (5%) | 5.52 KiB (1%) | 42 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] |
82.404 μs (5%) | 9.69 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] |
83.015 μs (5%) | 9.69 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] |
44.513 μs (5%) | 8.52 KiB (1%) | 42 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] |
36.608 μs (5%) | 2.34 KiB (1%) | 41 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] |
43.341 μs (5%) | 3.44 KiB (1%) | 57 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] |
54.622 μs (5%) | 6.55 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] |
41.057 μs (5%) | 2.34 KiB (1%) | 41 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] |
49.804 μs (5%) | 3.45 KiB (1%) | 57 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] |
59.411 μs (5%) | 9.59 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] |
65.593 μs (5%) | 6.80 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] |
66.304 μs (5%) | 6.62 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] |
40.486 μs (5%) | 5.47 KiB (1%) | 42 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] |
63.539 μs (5%) | 9.80 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] |
74.970 μs (5%) | 9.62 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] |
42.519 μs (5%) | 8.47 KiB (1%) | 42 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] |
48.901 μs (5%) | 2.78 KiB (1%) | 49 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] |
53.531 μs (5%) | 3.06 KiB (1%) | 53 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] |
57.687 μs (5%) | 9.30 KiB (1%) | 55 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] |
46.797 μs (5%) | 2.80 KiB (1%) | 49 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] |
53.059 μs (5%) | 3.08 KiB (1%) | 53 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] |
60.182 μs (5%) | 15.31 KiB (1%) | 55 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] |
99.126 μs (5%) | 9.70 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] |
99.847 μs (5%) | 9.70 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] |
54.652 μs (5%) | 8.55 KiB (1%) | 44 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] |
86.502 μs (5%) | 15.72 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] |
104.786 μs (5%) | 15.72 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] |
57.368 μs (5%) | 14.55 KiB (1%) | 44 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] |
54.211 μs (5%) | 2.38 KiB (1%) | 43 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] |
53.139 μs (5%) | 3.47 KiB (1%) | 59 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] |
66.074 μs (5%) | 9.61 KiB (1%) | 60 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] |
53.570 μs (5%) | 2.38 KiB (1%) | 43 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] |
54.302 μs (5%) | 3.48 KiB (1%) | 59 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] |
73.617 μs (5%) | 15.62 KiB (1%) | 60 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] |
76.694 μs (5%) | 9.83 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] |
83.106 μs (5%) | 9.66 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] |
54.312 μs (5%) | 8.50 KiB (1%) | 44 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] |
86.532 μs (5%) | 15.83 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] |
95.278 μs (5%) | 15.66 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] |
61.835 μs (5%) | 14.50 KiB (1%) | 44 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] |
671.245 μs (5%) | 752 bytes (1%) | 12 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] |
765.742 μs (5%) | 1.05 KiB (1%) | 16 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] |
1.049 ms (5%) | 773.17 KiB (1%) | 21 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] |
676.345 μs (5%) | 768 bytes (1%) | 12 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] |
726.930 μs (5%) | 1.12 KiB (1%) | 16 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] |
1.090 ms (5%) | 1.51 MiB (1%) | 21 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] |
840.342 μs (5%) | 2.29 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] |
3.250 ms (5%) | 2.29 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] |
999.188 μs (5%) | 2.29 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] |
1.004 ms (5%) | 4.57 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] |
3.839 ms (5%) | 4.57 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] |
1.136 ms (5%) | 4.57 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] |
1.984 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] |
650.807 μs (5%) | 1.52 KiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] |
1.702 ms (5%) | 773.56 KiB (1%) | 26 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] |
2.063 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] |
649.765 μs (5%) | 1.62 KiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] |
1.728 ms (5%) | 1.51 MiB (1%) | 26 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] |
952.462 μs (5%) | 2.29 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] |
3.340 ms (5%) | 2.29 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] |
1.023 ms (5%) | 2.29 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] |
1.215 ms (5%) | 4.57 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] |
3.847 ms (5%) | 4.57 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] |
1.275 ms (5%) | 4.57 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] |
2.624 ms (5%) | 752 bytes (1%) | 12 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] |
2.996 ms (5%) | 1.05 KiB (1%) | 16 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] |
4.156 ms (5%) | 3.01 MiB (1%) | 21 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] |
2.655 ms (5%) | 768 bytes (1%) | 12 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] |
2.935 ms (5%) | 1.12 KiB (1%) | 16 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] |
4.336 ms (5%) | 6.02 MiB (1%) | 21 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] |
2.917 ms (5%) | 9.07 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] |
13.072 ms (5%) | 9.07 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] |
3.205 ms (5%) | 9.07 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] |
3.608 ms (5%) | 18.14 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] |
15.115 ms (5%) | 18.14 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] |
4.060 ms (5%) | 18.14 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] |
8.218 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] |
648.695 ms (5%) | 5.48 MiB (1%) | 87447 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] |
7.139 ms (5%) | 3.01 MiB (1%) | 26 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] |
7.704 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] |
2.797 ms (5%) | 1.62 KiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] |
7.235 ms (5%) | 6.02 MiB (1%) | 26 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] |
2.713 ms (5%) | 9.07 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] |
12.697 ms (5%) | 9.07 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] |
3.176 ms (5%) | 9.07 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] |
3.719 ms (5%) | 18.14 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] |
14.887 ms (5%) | 18.14 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] |
3.808 ms (5%) | 18.14 MiB (1%) | 8 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] |
368.160 ms (5%) | 368 bytes (1%) | 6 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] |
389.001 ms (5%) | 848 bytes (1%) | 10 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] |
890.578 ms (5%) | 190.51 MiB (1%) | 15 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] |
344.816 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] |
386.459 ms (5%) | 1.03 KiB (1%) | 10 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] |
920.980 ms (5%) | 381.02 MiB (1%) | 15 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] |
593.853 ms (5%) | 100.297 μs | 1.65 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] |
2.545 s (5%) | 2.012 ms | 1.65 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] |
472.871 ms (5%) | 97.953 μs | 1.65 GiB (1%) | 2 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] |
911.482 ms (5%) | 2.440 ms | 3.30 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] |
3.063 s (5%) | 2.110 ms | 3.30 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] |
782.223 ms (5%) | 2.464 ms | 3.30 GiB (1%) | 2 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] |
885.856 ms (5%) | |||
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] |
418.879 ms (5%) | 1.38 KiB (1%) | 16 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] |
1.055 s (5%) | 190.51 MiB (1%) | 20 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] |
861.739 ms (5%) | |||
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] |
432.791 ms (5%) | 1.67 KiB (1%) | 16 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] |
1.080 s (5%) | 381.02 MiB (1%) | 20 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] |
447.592 ms (5%) | 2.089 ms | 1.65 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] |
2.377 s (5%) | 1.997 ms | 1.65 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] |
476.442 ms (5%) | 150.091 μs | 1.65 GiB (1%) | 2 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] |
727.336 ms (5%) | 122.469 μs | 3.30 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] |
2.786 s (5%) | 2.214 ms | 3.30 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] |
749.129 ms (5%) | 103.945 μs | 3.30 GiB (1%) | 2 |
["nnlib", "dropout", "3-N(128)", "dropout!", "with-colon"] |
178.755 ns (5%) | |||
["nnlib", "dropout", "3-N(128)", "dropout!", "with-dim"] |
166.728 ns (5%) | 576 bytes (1%) | 1 | |
["nnlib", "dropout", "3-N(128)", "dropout", "with-colon"] |
247.154 ns (5%) | 576 bytes (1%) | 1 | |
["nnlib", "dropout", "3-N(128)", "dropout", "with-dim"] |
247.449 ns (5%) | 1.12 KiB (1%) | 2 | |
["nnlib", "dropout", "3-N(256)", "dropout!", "with-colon"] |
284.631 ns (5%) | |||
["nnlib", "dropout", "3-N(256)", "dropout!", "with-dim"] |
215.434 ns (5%) | 1.06 KiB (1%) | 1 | |
["nnlib", "dropout", "3-N(256)", "dropout", "with-colon"] |
371.547 ns (5%) | 1.06 KiB (1%) | 1 | |
["nnlib", "dropout", "3-N(256)", "dropout", "with-dim"] |
465.959 ns (5%) | 2.12 KiB (1%) | 2 | |
["nnlib", "dropout", "3-N(512)", "dropout!", "with-colon"] |
505.474 ns (5%) | |||
["nnlib", "dropout", "3-N(512)", "dropout!", "with-dim"] |
957.800 ns (5%) | 2.12 KiB (1%) | 1 | |
["nnlib", "dropout", "3-N(512)", "dropout", "with-colon"] |
1.241 μs (5%) | 2.12 KiB (1%) | 1 | |
["nnlib", "dropout", "3-N(512)", "dropout", "with-dim"] |
1.188 μs (5%) | 4.25 KiB (1%) | 2 | |
["nnlib", "dropout", "4-N(128)", "dropout!", "with-colon"] |
5.973 μs (5%) | |||
["nnlib", "dropout", "4-N(128)", "dropout!", "with-dim"] |
3.039 μs (5%) | 672 bytes (1%) | 2 | |
["nnlib", "dropout", "4-N(128)", "dropout", "with-colon"] |
13.666 μs (5%) | 64.11 KiB (1%) | 3 | |
["nnlib", "dropout", "4-N(128)", "dropout", "with-dim"] |
6.893 μs (5%) | 64.77 KiB (1%) | 5 | |
["nnlib", "dropout", "4-N(256)", "dropout!", "with-colon"] |
30.487 μs (5%) | |||
["nnlib", "dropout", "4-N(256)", "dropout!", "with-dim"] |
23.283 μs (5%) | 1.19 KiB (1%) | 2 | |
["nnlib", "dropout", "4-N(256)", "dropout", "with-colon"] |
32.221 μs (5%) | 256.11 KiB (1%) | 3 | |
["nnlib", "dropout", "4-N(256)", "dropout", "with-dim"] |
23.254 μs (5%) | 257.30 KiB (1%) | 5 | |
["nnlib", "dropout", "4-N(512)", "dropout!", "with-colon"] |
112.050 μs (5%) | |||
["nnlib", "dropout", "4-N(512)", "dropout!", "with-dim"] |
84.408 μs (5%) | 2.17 KiB (1%) | 2 | |
["nnlib", "dropout", "4-N(512)", "dropout", "with-colon"] |
118.633 μs (5%) | 1.00 MiB (1%) | 3 | |
["nnlib", "dropout", "4-N(512)", "dropout", "with-dim"] |
86.713 μs (5%) | 1.00 MiB (1%) | 5 | |
["nnlib", "dropout", "5-N(128)", "dropout!", "with-colon"] |
2.040 ms (5%) | |||
["nnlib", "dropout", "5-N(128)", "dropout!", "with-dim"] |
577.685 μs (5%) | 672 bytes (1%) | 2 | |
["nnlib", "dropout", "5-N(128)", "dropout", "with-colon"] |
2.023 ms (5%) | 8.00 MiB (1%) | 3 | |
["nnlib", "dropout", "5-N(128)", "dropout", "with-dim"] |
570.732 μs (5%) | 8.00 MiB (1%) | 5 | |
["nnlib", "dropout", "5-N(256)", "dropout!", "with-colon"] |
16.167 ms (5%) | |||
["nnlib", "dropout", "5-N(256)", "dropout!", "with-dim"] |
4.402 ms (5%) | 1.19 KiB (1%) | 2 | |
["nnlib", "dropout", "5-N(256)", "dropout", "with-colon"] |
21.120 ms (5%) | 64.00 MiB (1%) | 3 | |
["nnlib", "dropout", "5-N(256)", "dropout", "with-dim"] |
9.012 ms (5%) | 64.00 MiB (1%) | 5 | |
["nnlib", "dropout", "5-N(512)", "dropout!", "with-colon"] |
117.895 ms (5%) | |||
["nnlib", "dropout", "5-N(512)", "dropout!", "with-dim"] |
34.812 ms (5%) | 2.17 KiB (1%) | 2 | |
["nnlib", "dropout", "5-N(512)", "dropout", "with-colon"] |
160.371 ms (5%) | 512.00 MiB (1%) | 3 | |
["nnlib", "dropout", "5-N(512)", "dropout", "with-dim"] |
64.434 ms (5%) | 512.00 MiB (1%) | 5 | |
["nnlib", "gemm", "Float32", "batched_gemm!", "trans(N,N)-M(1024)-N(1024)-K(1024)-alpha(0.5)-beta(0.0)"] |
12.143 ms (5%) | |||
["nnlib", "gemm", "Float32", "batched_gemm!", "trans(N,N)-M(512)-N(512)-K(128)-alpha(0.5)-beta(1.0)"] |
734.707 μs (5%) | |||
["nnlib", "gemm", "Float32", "batched_gemm!", "trans(N,N)-M(80)-N(40)-K(100)-alpha(1.0)-beta(0.0)"] |
86.703 μs (5%) | |||
["nnlib", "gemm", "Float64", "batched_gemm!", "trans(N,N)-M(1024)-N(1024)-K(1024)-alpha(0.5)-beta(0.0)"] |
24.669 ms (5%) | |||
["nnlib", "gemm", "Float64", "batched_gemm!", "trans(N,N)-M(512)-N(512)-K(128)-alpha(0.5)-beta(1.0)"] |
1.235 ms (5%) | |||
["nnlib", "gemm", "Float64", "batched_gemm!", "trans(N,N)-M(80)-N(40)-K(100)-alpha(1.0)-beta(0.0)"] |
101.690 μs (5%) | |||
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "lpnormpool1d-direct", "data"] |
36.438 μs (5%) | 2.86 KiB (1%) | 52 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "lpnormpool1d-direct", "pool"] |
37.309 μs (5%) | 2.50 KiB (1%) | 47 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "maxpool1d-direct", "data"] |
34.304 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "maxpool1d-direct", "pool"] |
36.939 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "meanpool1d-direct", "data"] |
35.086 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "meanpool1d-direct", "pool"] |
38.130 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "lpnormpool1d-direct", "data"] |
38.222 μs (5%) | 2.86 KiB (1%) | 52 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "lpnormpool1d-direct", "pool"] |
37.961 μs (5%) | 2.50 KiB (1%) | 47 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "maxpool1d-direct", "data"] |
34.814 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "maxpool1d-direct", "pool"] |
36.458 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "meanpool1d-direct", "data"] |
33.463 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "meanpool1d-direct", "pool"] |
36.708 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "lpnormpool1d-direct", "data"] |
37.229 μs (5%) | 2.86 KiB (1%) | 52 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "lpnormpool1d-direct", "pool"] |
38.201 μs (5%) | 2.50 KiB (1%) | 47 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "maxpool1d-direct", "data"] |
35.195 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "maxpool1d-direct", "pool"] |
37.239 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "meanpool1d-direct", "data"] |
35.726 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "meanpool1d-direct", "pool"] |
36.739 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "lpnormpool1d-direct", "data"] |
39.223 μs (5%) | 2.89 KiB (1%) | 54 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "lpnormpool1d-direct", "pool"] |
38.362 μs (5%) | 2.53 KiB (1%) | 49 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "maxpool1d-direct", "data"] |
35.806 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "maxpool1d-direct", "pool"] |
36.167 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "meanpool1d-direct", "data"] |
34.544 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "meanpool1d-direct", "pool"] |
38.191 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "lpnormpool1d-direct", "data"] |
44.803 μs (5%) | 2.89 KiB (1%) | 54 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "lpnormpool1d-direct", "pool"] |
40.715 μs (5%) | 2.53 KiB (1%) | 49 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "maxpool1d-direct", "data"] |
35.997 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "maxpool1d-direct", "pool"] |
36.308 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "meanpool1d-direct", "data"] |
35.405 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "meanpool1d-direct", "pool"] |
37.260 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "lpnormpool1d-direct", "data"] |
40.545 μs (5%) | 2.89 KiB (1%) | 54 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "lpnormpool1d-direct", "pool"] |
39.794 μs (5%) | 2.53 KiB (1%) | 49 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "maxpool1d-direct", "data"] |
34.635 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "maxpool1d-direct", "pool"] |
36.658 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "meanpool1d-direct", "data"] |
34.584 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "meanpool1d-direct", "pool"] |
38.131 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "lpnormpool2d-direct", "data"] |
1.758 ms (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "lpnormpool2d-direct", "pool"] |
819.936 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "maxpool2d-direct", "data"] |
377.512 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "maxpool2d-direct", "pool"] |
32.119 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct", "data"] |
116.817 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct", "pool"] |
31.148 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "lpnormpool2d-direct", "data"] |
515.959 μs (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "lpnormpool2d-direct", "pool"] |
229.227 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct", "data"] |
116.317 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct", "pool"] |
34.454 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "meanpool2d-direct", "data"] |
61.314 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "meanpool2d-direct", "pool"] |
31.579 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "lpnormpool2d-direct", "data"] |
464.363 μs (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "lpnormpool2d-direct", "pool"] |
210.511 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "maxpool2d-direct", "data"] |
59.511 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "maxpool2d-direct", "pool"] |
59.110 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "meanpool2d-direct", "data"] |
71.944 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "meanpool2d-direct", "pool"] |
44.964 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct", "data"] |
512.132 μs (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct", "pool"] |
253.231 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "maxpool2d-direct", "data"] |
131.355 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "maxpool2d-direct", "pool"] |
61.615 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "meanpool2d-direct", "data"] |
79.749 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "meanpool2d-direct", "pool"] |
62.316 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "lpnormpool2d-direct", "data"] |
29.732 ms (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "lpnormpool2d-direct", "pool"] |
11.972 ms (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "maxpool2d-direct", "data"] |
2.169 ms (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "maxpool2d-direct", "pool"] |
149.949 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "meanpool2d-direct", "data"] |
3.201 ms (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "meanpool2d-direct", "pool"] |
149.658 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "lpnormpool2d-direct", "data"] |
7.263 ms (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "lpnormpool2d-direct", "pool"] |
2.856 ms (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "maxpool2d-direct", "data"] |
665.037 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "maxpool2d-direct", "pool"] |
147.184 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct", "data"] |
882.270 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct", "pool"] |
144.469 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "lpnormpool2d-direct", "data"] |
1.927 ms (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "lpnormpool2d-direct", "pool"] |
736.329 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "maxpool2d-direct", "data"] |
182.840 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "maxpool2d-direct", "pool"] |
133.709 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "meanpool2d-direct", "data"] |
265.303 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "meanpool2d-direct", "pool"] |
114.924 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "lpnormpool3d-direct", "data"] |
868.800 ms (5%) | 400 bytes (1%) | 6 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "lpnormpool3d-direct", "pool"] |
349.953 ms (5%) | 544 bytes (1%) | 9 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "maxpool3d-direct", "data"] |
102.212 ms (5%) | 352 bytes (1%) | 3 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "maxpool3d-direct", "pool"] |
5.762 ms (5%) | 512 bytes (1%) | 7 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "meanpool3d-direct", "data"] |
79.186 ms (5%) | 352 bytes (1%) | 3 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "meanpool3d-direct", "pool"] |
5.593 ms (5%) | 512 bytes (1%) | 7 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "lpnormpool3d-direct", "data"] |
125.993 ms (5%) | 400 bytes (1%) | 6 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "lpnormpool3d-direct", "pool"] |
60.234 ms (5%) | 544 bytes (1%) | 9 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "maxpool3d-direct", "data"] |
4.267 ms (5%) | 352 bytes (1%) | 3 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "maxpool3d-direct", "pool"] |
5.603 ms (5%) | 512 bytes (1%) | 7 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "meanpool3d-direct", "data"] |
11.585 ms (5%) | 352 bytes (1%) | 3 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "meanpool3d-direct", "pool"] |
7.759 ms (5%) | 512 bytes (1%) | 7 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (1024, 2048, 4)] |
262.508 ms (5%) | 16.02 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (12288, 2048, 1)] |
792.987 ms (5%) | 205.713 μs | 48.00 MiB (1%) | 3 |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (128, 384, 8)] |
12.499 ms (5%) | 774.19 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (2048, 2048, 2)] |
262.576 ms (5%) | 16.01 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (4096, 2048, 2)] |
528.188 ms (5%) | 32.01 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (4096, 4096, 2)] |
1.057 s (5%) | 230.108 μs | 64.02 MiB (1%) | 3 |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (512, 784, 8)] |
100.805 ms (5%) | 6.14 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (768, 1024, 4)] |
98.549 ms (5%) | 6.01 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (1024, 2048, 4)] |
266.182 ms (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (12288, 2048, 1)] |
805.923 ms (5%) | 12.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (128, 384, 8)] |
13.656 ms (5%) | 18.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (2048, 2048, 2)] |
266.144 ms (5%) | 24.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (4096, 2048, 2)] |
532.318 ms (5%) | 24.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (4096, 4096, 2)] |
1.064 s (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (512, 784, 8)] |
104.491 ms (5%) | 37.12 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (768, 1024, 4)] |
100.771 ms (5%) | 24.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (1024, 2048, 4)] |
59.410 ms (5%) | 32.03 MiB (1%) | 4 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (12288, 2048, 1)] |
179.337 ms (5%) | 266.957 μs | 96.01 MiB (1%) | 3 |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (128, 384, 8)] |
2.700 ms (5%) | 1.51 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (2048, 2048, 2)] |
59.673 ms (5%) | 32.02 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (4096, 2048, 2)] |
120.601 ms (5%) | 250.756 μs | 64.02 MiB (1%) | 3 |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (4096, 4096, 2)] |
249.134 ms (5%) | 244.515 μs | 128.03 MiB (1%) | 4 |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (512, 784, 8)] |
21.639 ms (5%) | 12.27 MiB (1%) | 4 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (768, 1024, 4)] |
21.225 ms (5%) | 12.02 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (1024, 2048, 4)] |
61.012 ms (5%) | 96.19 KiB (1%) | 6 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (12288, 2048, 1)] |
183.197 ms (5%) | 24.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (128, 384, 8)] |
2.907 ms (5%) | 36.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (2048, 2048, 2)] |
61.012 ms (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (4096, 2048, 2)] |
121.911 ms (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (4096, 4096, 2)] |
244.278 ms (5%) | 96.19 KiB (1%) | 6 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (512, 784, 8)] |
23.326 ms (5%) | 73.69 KiB (1%) | 6 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (768, 1024, 4)] |
22.853 ms (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (1024, 2048, 4)] |
22.997 ms (5%) | 16.02 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (12288, 2048, 1)] |
73.120 ms (5%) | 85.279 μs | 48.00 MiB (1%) | 3 |
["nnlib", "softmax", "softmax", "Float16", "bw", (128, 384, 8)] |
1.276 ms (5%) | 774.19 KiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (2048, 2048, 2)] |
23.017 ms (5%) | 16.01 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (4096, 2048, 2)] |
48.449 ms (5%) | 32.01 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (4096, 4096, 2)] |
99.915 ms (5%) | 3.130 ms | 64.02 MiB (1%) | 3 |
["nnlib", "softmax", "softmax", "Float16", "bw", (512, 784, 8)] |
9.106 ms (5%) | 6.14 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (768, 1024, 4)] |
8.728 ms (5%) | 6.01 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (1024, 2048, 4)] |
100.746 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (12288, 2048, 1)] |
309.862 ms (5%) | 4.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (128, 384, 8)] |
6.099 ms (5%) | 6.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (2048, 2048, 2)] |
100.590 ms (5%) | 8.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (4096, 2048, 2)] |
201.196 ms (5%) | 8.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (4096, 4096, 2)] |
408.681 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (512, 784, 8)] |
41.489 ms (5%) | 12.38 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (768, 1024, 4)] |
38.855 ms (5%) | 8.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (1024, 2048, 4)] |
10.662 ms (5%) | 32.03 MiB (1%) | 4 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (12288, 2048, 1)] |
32.031 ms (5%) | 83.456 μs | 96.01 MiB (1%) | 3 |
["nnlib", "softmax", "softmax", "Float32", "bw", (128, 384, 8)] |
444.407 μs (5%) | 1.51 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (2048, 2048, 2)] |
10.614 ms (5%) | 32.02 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (4096, 2048, 2)] |
20.886 ms (5%) | 85.128 μs | 64.02 MiB (1%) | 3 |
["nnlib", "softmax", "softmax", "Float32", "bw", (4096, 4096, 2)] |
44.805 ms (5%) | 2.927 ms | 128.03 MiB (1%) | 4 |
["nnlib", "softmax", "softmax", "Float32", "bw", (512, 784, 8)] |
3.331 ms (5%) | 12.27 MiB (1%) | 4 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (768, 1024, 4)] |
3.252 ms (5%) | 12.02 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (1024, 2048, 4)] |
52.829 ms (5%) | 32.06 KiB (1%) | 2 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (12288, 2048, 1)] |
160.571 ms (5%) | 8.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (128, 384, 8)] |
2.531 ms (5%) | 12.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (2048, 2048, 2)] |
52.884 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (4096, 2048, 2)] |
106.255 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (4096, 4096, 2)] |
224.705 ms (5%) | 32.06 KiB (1%) | 2 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (512, 784, 8)] |
20.294 ms (5%) | 24.56 KiB (1%) | 2 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (768, 1024, 4)] |
19.852 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float16", "bw"] |
643.462 μs (5%) | 32.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float16", "fw"] |
596.464 μs (5%) | 32.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float32", "bw"] |
104.247 μs (5%) | 64.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float32", "fw"] |
84.830 μs (5%) | 64.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float16", "bw"] |
47.520 μs (5%) | 8.59 KiB (1%) | 16 | |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float16", "fw"] |
116.961 μs (5%) | 8.50 KiB (1%) | 13 | |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32", "bw"] |
27.111 μs (5%) | 16.59 KiB (1%) | 16 | |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32", "fw"] |
28.894 μs (5%) | 16.50 KiB (1%) | 13 | |
["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float16", "bw"] |
181.823 μs (5%) | 128.53 KiB (1%) | 17 | |
["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float16", "fw"] |
2.325 ms (5%) | 128.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float32", "bw"] |
56.086 μs (5%) | 256.53 KiB (1%) | 17 | |
["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float32", "fw"] |
294.314 μs (5%) | 256.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float16", "bw"] |
2.170 ms (5%) | 128.47 KiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float16", "fw"] |
3.848 ms (5%) | 128.47 KiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float32", "bw"] |
327.697 μs (5%) | 256.47 KiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float32", "fw"] |
453.365 μs (5%) | 256.47 KiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float16", "bw"] |
43.374 ms (5%) | 256.00 MiB (1%) | 18 | |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float16", "fw"] |
5.849 s (5%) | 269.58 MiB (1%) | 210683 | |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float32", "bw"] |
63.982 ms (5%) | 512.00 MiB (1%) | 18 | |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float32", "fw"] |
702.815 ms (5%) | 512.00 MiB (1%) | 15 | |
["nnlib", "upsample", "nearest", "3-N(128)", "Float16"] |
5.184 μs (5%) | 5.67 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(128)", "Float32"] |
5.201 μs (5%) | 5.67 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(128)", "Float64"] |
4.776 μs (5%) | 5.67 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(64)", "Float16"] |
3.296 μs (5%) | 3.17 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(64)", "Float32"] |
3.276 μs (5%) | 3.17 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(64)", "Float64"] |
3.409 μs (5%) | 3.17 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "4-N(128)", "Float16"] |
2.648 ms (5%) | 6.25 MiB (1%) | 14 | |
["nnlib", "upsample", "nearest", "4-N(128)", "Float32"] |
2.649 ms (5%) | 6.25 MiB (1%) | 14 | |
["nnlib", "upsample", "nearest", "4-N(128)", "Float64"] |
2.654 ms (5%) | 6.25 MiB (1%) | 14 | |
["nnlib", "upsample", "nearest", "4-N(32)", "Float16"] |
177.675 μs (5%) | 400.77 KiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "4-N(32)", "Float32"] |
177.925 μs (5%) | 400.77 KiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "4-N(32)", "Float64"] |
177.735 μs (5%) | 400.77 KiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "5-N(32)", "Float16"] |
82.929 ms (5%) | 125.00 MiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "5-N(32)", "Float32"] |
82.539 ms (5%) | 125.00 MiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "5-N(32)", "Float64"] |
82.715 ms (5%) | 125.00 MiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "5-N(64)", "Float16"] |
655.608 ms (5%) | 1000.00 MiB (1%) | 15 | |
["nnlib", "upsample", "nearest", "5-N(64)", "Float32"] |
655.763 ms (5%) | 1000.00 MiB (1%) | 15 | |
["nnlib", "upsample", "nearest", "5-N(64)", "Float64"] |
656.167 ms (5%) | 1000.00 MiB (1%) | 15 |
Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
- ["flux", "mlp"]
- ["nnlib", "activations", "Float16"]
- ["nnlib", "activations", "Float32"]
- ["nnlib", "activations", "Float64"]
- ["nnlib", "attention", "Float16", "attention"]
- ["nnlib", "attention", "Float16", "score"]
- ["nnlib", "attention", "Float64", "attention"]
- ["nnlib", "attention", "Float64", "score"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "dropout", "3-N(128)", "dropout!"]
- ["nnlib", "dropout", "3-N(128)", "dropout"]
- ["nnlib", "dropout", "3-N(256)", "dropout!"]
- ["nnlib", "dropout", "3-N(256)", "dropout"]
- ["nnlib", "dropout", "3-N(512)", "dropout!"]
- ["nnlib", "dropout", "3-N(512)", "dropout"]
- ["nnlib", "dropout", "4-N(128)", "dropout!"]
- ["nnlib", "dropout", "4-N(128)", "dropout"]
- ["nnlib", "dropout", "4-N(256)", "dropout!"]
- ["nnlib", "dropout", "4-N(256)", "dropout"]
- ["nnlib", "dropout", "4-N(512)", "dropout!"]
- ["nnlib", "dropout", "4-N(512)", "dropout"]
- ["nnlib", "dropout", "5-N(128)", "dropout!"]
- ["nnlib", "dropout", "5-N(128)", "dropout"]
- ["nnlib", "dropout", "5-N(256)", "dropout!"]
- ["nnlib", "dropout", "5-N(256)", "dropout"]
- ["nnlib", "dropout", "5-N(512)", "dropout!"]
- ["nnlib", "dropout", "5-N(512)", "dropout"]
- ["nnlib", "gemm", "Float32", "batched_gemm!"]
- ["nnlib", "gemm", "Float64", "batched_gemm!"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "lpnormpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "maxpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "meanpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "lpnormpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "maxpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "meanpool3d-direct"]
- ["nnlib", "softmax", "logsoftmax", "Float16", "bw"]
- ["nnlib", "softmax", "logsoftmax", "Float16", "fw"]
- ["nnlib", "softmax", "logsoftmax", "Float32", "bw"]
- ["nnlib", "softmax", "logsoftmax", "Float32", "fw"]
- ["nnlib", "softmax", "softmax", "Float16", "bw"]
- ["nnlib", "softmax", "softmax", "Float16", "fw"]
- ["nnlib", "softmax", "softmax", "Float32", "bw"]
- ["nnlib", "softmax", "softmax", "Float32", "fw"]
- ["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float32"]
- ["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32"]
- ["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float32"]
- ["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float16"]
- ["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float32"]
- ["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float16"]
- ["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float32"]
- ["nnlib", "upsample", "nearest", "3-N(128)"]
- ["nnlib", "upsample", "nearest", "3-N(64)"]
- ["nnlib", "upsample", "nearest", "4-N(128)"]
- ["nnlib", "upsample", "nearest", "4-N(32)"]
- ["nnlib", "upsample", "nearest", "5-N(32)"]
- ["nnlib", "upsample", "nearest", "5-N(64)"]
Julia versioninfo
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
Ubuntu 22.04.4 LTS
uname: Linux 6.5.0-1018-azure #19~22.04.2-Ubuntu SMP Thu Mar 21 16:45:46 UTC 2024 x86_64 x86_64
CPU: AMD EPYC 7763 64-Core Processor:
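core | speed | user | nice | sys | idle | irq |
---|---|---|---|---|---|---|
#1 | 2585 MHz | 276 s | 0 s | 88 s | 2928 s | 0 s |
#2 | 3086 MHz | 342 s | 0 s | 113 s | 2844 s | 0 s |
#3 | 3242 MHz | 555 s | 0 s | 86 s | 2638 s | 0 s |
#4 | 3217 MHz | 262 s | 0 s | 77 s | 2951 s | 0 s |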
Memory: 15.606494903564453 GB (13991.60546875 MB free)
Uptime: 331.65 sec
Load Avg: 1.0 0.44 0.18
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Baseline result
Benchmark Report for /home/runner/work/FluxMLBenchmarks.jl/FluxMLBenchmarks.jl/benchmark/script/..
Job Properties
- Time of benchmark: 30 Apr 2024 - 4:30
- Package commit: non gi
- Julia commit: bd47ec
- Julia command flags: None
- Environment variables:
FLUXML_BENCHMARK_FLUX_MLP => true
FLUXML_BENCHMARK_FLUX => true
JULIA_NUM_THREADS => 1
Results
Below is a table of this job's results, obtained by running the benchmarks. The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the table below are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.
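To make the noise tolerance concrete, here is a minimal sketch in Python. It is not part of the benchmark tooling, and the helper name `tolerance_interval` and the cell-parsing regex are assumptions made for illustration. It takes a time cell as printed in the table and returns the interval in which the "true" value is expected to fall.

```python
import re

# Unit factors for converting a reported time to seconds.
# The unit spellings are copied from the report's own cells.
TIME_UNITS = {"ns": 1e-9, "μs": 1e-6, "ms": 1e-3, "s": 1.0}

# Matches cells such as "334.153 μs (5%)": value, unit, tolerance percent.
CELL = re.compile(r"^\s*([\d.]+)\s*(ns|μs|ms|s)\s*\((\d+(?:\.\d+)?)%\)\s*$")


def tolerance_interval(cell):
    """Return (low, high) in seconds for a time cell like '334.153 μs (5%)'.

    Hypothetical helper: it only applies the stated reading that the true
    value lies within the given percentage of the reported value.
    """
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized time cell: {cell!r}")
    value = float(match.group(1)) * TIME_UNITS[match.group(2)]
    tol = float(match.group(3)) / 100.0
    return value * (1.0 - tol), value * (1.0 + tol)


# The ["flux", "mlp", "Float32"] baseline row reports 334.153 μs (5%).
low, high = tolerance_interval("334.153 μs (5%)")
print(f"{low * 1e6:.3f} μs .. {high * 1e6:.3f} μs")  # 317.445 μs .. 350.861 μs
```

Under this reading, two runs of the same benchmark whose intervals overlap would not by themselves indicate a real change.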
ID | time | GC time | memory | allocations |
---|---|---|---|---|
["flux", "mlp", "Float16"] |
354.751 μs (5%) | 23.30 KiB (1%) | 8 | |
["flux", "mlp", "Float32"] |
334.153 μs (5%) | 3.25 KiB (1%) | 6 | |
["flux", "mlp", "Float64"] |
344.062 μs (5%) | 23.30 KiB (1%) | 8 | |
["nnlib", "activations", "Float16", "celu"] |
27.444 ms (5%) | |||
["nnlib", "activations", "Float16", "elu"] |
27.441 ms (5%) | |||
["nnlib", "activations", "Float16", "gelu"] |
78.923 ms (5%) | |||
["nnlib", "activations", "Float16", "hardswish"] |
610.774 μs (5%) | |||
["nnlib", "activations", "Float16", "hardtanh"] |
159.298 μs (5%) | |||
["nnlib", "activations", "Float16", "hardσ"] |
393.508 μs (5%) | |||
["nnlib", "activations", "Float16", "leakyrelu"] |
160.952 μs (5%) | |||
["nnlib", "activations", "Float16", "lisht"] |
9.189 ms (5%) | |||
["nnlib", "activations", "Float16", "logcosh"] |
36.243 ms (5%) | |||
["nnlib", "activations", "Float16", "logσ"] |
32.048 ms (5%) | |||
["nnlib", "activations", "Float16", "mish"] |
98.842 ms (5%) | |||
["nnlib", "activations", "Float16", "relu"] |
153.358 μs (5%) | |||
["nnlib", "activations", "Float16", "relu6"] |
161.873 μs (5%) | |||
["nnlib", "activations", "Float16", "rrelu"] |
4.178 ms (5%) | |||
["nnlib", "activations", "Float16", "selu"] |
37.225 ms (5%) | |||
["nnlib", "activations", "Float16", "sigmoid_fast"] |
14.166 ms (5%) | |||
["nnlib", "activations", "Float16", "softplus"] |
27.819 ms (5%) | |||
["nnlib", "activations", "Float16", "softshrink"] |
295.664 μs (5%) | |||
["nnlib", "activations", "Float16", "softsign"] |
1.559 ms (5%) | |||
["nnlib", "activations", "Float16", "swish"] |
15.890 ms (5%) | |||
["nnlib", "activations", "Float16", "tanh_fast"] |
8.519 ms (5%) | |||
["nnlib", "activations", "Float16", "tanhshrink"] |
9.127 ms (5%) | |||
["nnlib", "activations", "Float16", "trelu"] |
153.948 μs (5%) | |||
["nnlib", "activations", "Float16", "σ"] |
14.199 ms (5%) | |||
["nnlib", "activations", "Float32", "celu"] |
6.074 ms (5%) | |||
["nnlib", "activations", "Float32", "elu"] |
5.222 ms (5%) | |||
["nnlib", "activations", "Float32", "gelu"] |
9.418 ms (5%) | |||
["nnlib", "activations", "Float32", "hardswish"] |
282.790 μs (5%) | |||
["nnlib", "activations", "Float32", "hardtanh"] |
292.168 μs (5%) | |||
["nnlib", "activations", "Float32", "hardσ"] |
287.769 μs (5%) | |||
["nnlib", "activations", "Float32", "leakyrelu"] |
287.359 μs (5%) | |||
["nnlib", "activations", "Float32", "lisht"] |
395.982 μs (5%) | |||
["nnlib", "activations", "Float32", "logcosh"] |
17.206 ms (5%) | |||
["nnlib", "activations", "Float32", "logσ"] |
16.892 ms (5%) | |||
["nnlib", "activations", "Float32", "mish"] |
35.842 ms (5%) | |||
["nnlib", "activations", "Float32", "relu"] |
322.625 μs (5%) | |||
["nnlib", "activations", "Float32", "relu6"] |
288.981 μs (5%) | |||
["nnlib", "activations", "Float32", "rrelu"] |
1.312 ms (5%) | |||
["nnlib", "activations", "Float32", "selu"] |
5.616 ms (5%) | |||
["nnlib", "activations", "Float32", "sigmoid_fast"] |
6.252 ms (5%) | |||
["nnlib", "activations", "Float32", "softplus"] |
16.619 ms (5%) | |||
["nnlib", "activations", "Float32", "softshrink"] |
286.658 μs (5%) | |||
["nnlib", "activations", "Float32", "softsign"] |
288.050 μs (5%) | |||
["nnlib", "activations", "Float32", "swish"] |
6.865 ms (5%) | |||
["nnlib", "activations", "Float32", "tanh_fast"] |
375.774 μs (5%) | |||
["nnlib", "activations", "Float32", "tanhshrink"] |
375.433 μs (5%) | |||
["nnlib", "activations", "Float32", "trelu"] |
313.127 μs (5%) | |||
["nnlib", "activations", "Float32", "σ"] |
6.853 ms (5%) | |||
["nnlib", "activations", "Float64", "celu"] |
5.027 ms (5%) | |||
["nnlib", "activations", "Float64", "elu"] |
5.388 ms (5%) | |||
["nnlib", "activations", "Float64", "gelu"] |
9.553 ms (5%) | |||
["nnlib", "activations", "Float64", "hardswish"] |
546.565 μs (5%) | |||
["nnlib", "activations", "Float64", "hardtanh"] |
558.587 μs (5%) | |||
["nnlib", "activations", "Float64", "hardσ"] |
529.422 μs (5%) | |||
["nnlib", "activations", "Float64", "leakyrelu"] |
547.596 μs (5%) | |||
["nnlib", "activations", "Float64", "lisht"] |
9.332 ms (5%) | |||
["nnlib", "activations", "Float64", "logcosh"] |
17.789 ms (5%) | |||
["nnlib", "activations", "Float64", "logσ"] |
17.232 ms (5%) | |||
["nnlib", "activations", "Float64", "mish"] |
35.338 ms (5%) | |||
["nnlib", "activations", "Float64", "relu"] |
583.553 μs (5%) | |||
["nnlib", "activations", "Float64", "relu6"] |
553.928 μs (5%) | |||
["nnlib", "activations", "Float64", "rrelu"] |
1.338 ms (5%) | |||
["nnlib", "activations", "Float64", "selu"] |
6.000 ms (5%) | |||
["nnlib", "activations", "Float64", "sigmoid_fast"] |
6.742 ms (5%) | |||
["nnlib", "activations", "Float64", "softplus"] |
16.666 ms (5%) | |||
["nnlib", "activations", "Float64", "softshrink"] |
542.436 μs (5%) | |||
["nnlib", "activations", "Float64", "softsign"] |
541.094 μs (5%) | |||
["nnlib", "activations", "Float64", "swish"] |
7.169 ms (5%) | |||
["nnlib", "activations", "Float64", "tanh_fast"] |
9.038 ms (5%) | |||
["nnlib", "activations", "Float64", "tanhshrink"] |
9.207 ms (5%) | |||
["nnlib", "activations", "Float64", "trelu"] |
555.872 μs (5%) | |||
["nnlib", "activations", "Float64", "σ"] |
5.925 ms (5%) | |||
["nnlib", "attention", "Float16", "attention", "q((16, 128, 8))-k((16, 512, 8))-v((32, 512, 8))-bias((512, 128))-nheads(4)"] |
151.710 ms (5%) | 14.59 MiB (1%) | 881 | |
["nnlib", "attention", "Float16", "attention", "q((64, 64, 16))-k((64, 64, 16))-v((64, 64, 16))-bias((64, 64))-nheads(4)"] |
40.733 ms (5%) | 4.97 MiB (1%) | 1585 | |
["nnlib", "attention", "Float16", "attention", "q((8, 6, 1))-k((8, 10, 1))-v((4, 10, 1))-bias(nothing)-nheads(1)"] |
39.123 μs (5%) | 46.12 KiB (1%) | 62 | |
["nnlib", "attention", "Float16", "score", "q(8, (16, 128, 8))-k(8, (16, 512, 8))-bias((512, 128))-nheads(4)"] |
405.344 ms (5%) | 318.056 μs | 69.56 MiB (1%) | 1693 |
["nnlib", "attention", "Float16", "score", "q(8, (64, 64, 16))-k(8, (64, 64, 16))-bias((64, 64))-nheads(4)"] |
159.655 ms (5%) | 56.78 MiB (1%) | 12317 | |
["nnlib", "attention", "Float16", "score", "q(8, (8, 6, 1))-k(8, (8, 10, 1))-bias(nothing)-nheads(1)"] |
61.115 μs (5%) | 179.16 KiB (1%) | 113 | |
["nnlib", "attention", "Float64", "attention", "q((16, 128, 8))-k((16, 512, 8))-v((32, 512, 8))-bias((512, 128))-nheads(4)"] |
24.168 ms (5%) | 50.28 MiB (1%) | 50 | |
["nnlib", "attention", "Float64", "attention", "q((64, 64, 16))-k((64, 64, 16))-v((64, 64, 16))-bias((64, 64))-nheads(4)"] |
3.032 ms (5%) | 9.03 MiB (1%) | 50 | |
["nnlib", "attention", "Float64", "attention", "q((8, 6, 1))-k((8, 10, 1))-v((4, 10, 1))-bias(nothing)-nheads(1)"] |
7.783 μs (5%) | 5.30 KiB (1%) | 38 | |
["nnlib", "attention", "Float64", "score", "q(8, (16, 128, 8))-k(8, (16, 512, 8))-bias((512, 128))-nheads(4)"] |
135.536 ms (5%) | 1.335 ms | 262.13 MiB (1%) | 29 |
["nnlib", "attention", "Float64", "score", "q(8, (64, 64, 16))-k(8, (64, 64, 16))-bias((64, 64))-nheads(4)"] |
48.413 ms (5%) | 1.216 ms | 140.50 MiB (1%) | 29 |
["nnlib", "attention", "Float64", "score", "q(8, (8, 6, 1))-k(8, (8, 10, 1))-bias(nothing)-nheads(1)"] |
31.079 μs (5%) | 20.16 KiB (1%) | 17 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] |
47.208 μs (5%) | 2.75 KiB (1%) | 47 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] |
53.009 μs (5%) | 3.03 KiB (1%) | 51 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] |
52.940 μs (5%) | 6.23 KiB (1%) | 53 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] |
44.323 μs (5%) | 2.77 KiB (1%) | 47 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] |
50.184 μs (5%) | 3.05 KiB (1%) | 51 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] |
57.998 μs (5%) | 9.28 KiB (1%) | 53 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] |
71.894 μs (5%) | 6.67 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] |
83.837 μs (5%) | 6.67 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] |
54.201 μs (5%) | 5.52 KiB (1%) | 42 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] |
76.353 μs (5%) | 9.69 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] |
84.157 μs (5%) | 9.69 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] |
55.814 μs (5%) | 8.52 KiB (1%) | 42 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] |
47.629 μs (5%) | 2.34 KiB (1%) | 41 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] |
55.584 μs (5%) | 3.44 KiB (1%) | 57 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] |
60.414 μs (5%) | 6.55 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] |
45.756 μs (5%) | 2.34 KiB (1%) | 41 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] |
54.242 μs (5%) | 3.45 KiB (1%) | 57 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] |
64.811 μs (5%) | 9.59 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] |
80.250 μs (5%) | 6.80 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] |
84.529 μs (5%) | 6.62 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] |
56.005 μs (5%) | 5.47 KiB (1%) | 42 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] |
84.969 μs (5%) | 9.80 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] |
87.484 μs (5%) | 9.62 KiB (1%) | 56 | |
["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] |
62.477 μs (5%) | 8.47 KiB (1%) | 42 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] |
49.333 μs (5%) | 2.78 KiB (1%) | 49 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] |
52.959 μs (5%) | 3.06 KiB (1%) | 53 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] |
60.093 μs (5%) | 9.30 KiB (1%) | 55 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] |
50.604 μs (5%) | 2.80 KiB (1%) | 49 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] |
53.270 μs (5%) | 3.08 KiB (1%) | 53 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] |
66.905 μs (5%) | 15.31 KiB (1%) | 55 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] |
80.291 μs (5%) | 9.70 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] |
90.810 μs (5%) | 9.70 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] |
55.324 μs (5%) | 8.55 KiB (1%) | 44 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] |
80.691 μs (5%) | 15.72 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] |
86.072 μs (5%) | 15.72 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] |
53.150 μs (5%) | 14.55 KiB (1%) | 44 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] |
53.150 μs (5%) | 2.38 KiB (1%) | 43 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] |
54.703 μs (5%) | 3.47 KiB (1%) | 59 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] |
66.385 μs (5%) | 9.61 KiB (1%) | 60 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] |
52.959 μs (5%) | 2.38 KiB (1%) | 43 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] |
56.486 μs (5%) | 3.48 KiB (1%) | 59 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] |
72.065 μs (5%) | 15.62 KiB (1%) | 60 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] |
83.837 μs (5%) | 9.83 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] |
85.470 μs (5%) | 9.66 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] |
57.077 μs (5%) | 8.50 KiB (1%) | 44 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] |
79.930 μs (5%) | 15.83 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] |
86.995 μs (5%) | 15.66 KiB (1%) | 58 | |
["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] |
56.717 μs (5%) | 14.50 KiB (1%) | 44 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] |
694.140 μs (5%) | 752 bytes (1%) | 12 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] |
778.138 μs (5%) | 1.05 KiB (1%) | 16 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] |
1.082 ms (5%) | 773.17 KiB (1%) | 21 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] |
693.810 μs (5%) | 768 bytes (1%) | 12 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] |
775.723 μs (5%) | 1.12 KiB (1%) | 16 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] |
1.103 ms (5%) | 1.51 MiB (1%) | 21 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] |
834.112 μs (5%) | 2.29 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] |
3.303 ms (5%) | 2.29 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] |
1.047 ms (5%) | 2.29 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] |
1.230 ms (5%) | 4.57 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] |
3.870 ms (5%) | 4.57 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] |
1.226 ms (5%) | 4.57 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] |
2.312 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] |
740.988 μs (5%) | 1.52 KiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] |
1.740 ms (5%) | 773.56 KiB (1%) | 26 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] |
2.339 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] |
783.578 μs (5%) | 1.62 KiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] |
1.749 ms (5%) | 1.51 MiB (1%) | 26 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] |
828.212 μs (5%) | 2.29 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] |
3.344 ms (5%) | 2.29 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] |
1.050 ms (5%) | 2.29 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] |
1.163 ms (5%) | 4.57 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] |
3.904 ms (5%) | 4.57 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] |
1.205 ms (5%) | 4.57 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "conv"] |
2.719 ms (5%) | 752 bytes (1%) | 12 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "data"] |
2.972 ms (5%) | 1.05 KiB (1%) | 16 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32", "filter"] |
4.269 ms (5%) | 3.01 MiB (1%) | 21 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "conv"] |
2.696 ms (5%) | 768 bytes (1%) | 12 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "data"] |
3.061 ms (5%) | 1.12 KiB (1%) | 16 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64", "filter"] |
4.415 ms (5%) | 6.02 MiB (1%) | 21 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "conv"] |
2.642 ms (5%) | 9.07 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "data"] |
12.750 ms (5%) | 9.07 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32", "filter"] |
3.189 ms (5%) | 9.07 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "conv"] |
2.990 ms (5%) | 18.14 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "data"] |
14.600 ms (5%) | 18.14 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64", "filter"] |
3.871 ms (5%) | 18.14 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "conv"] |
8.041 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "data"] |
2.751 ms (5%) | 1.52 KiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32", "filter"] |
6.785 ms (5%) | 3.01 MiB (1%) | 26 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "conv"] |
7.963 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "data"] |
2.673 ms (5%) | 1.62 KiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64", "filter"] |
6.848 ms (5%) | 6.02 MiB (1%) | 26 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "conv"] |
2.469 ms (5%) | 9.07 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "data"] |
12.876 ms (5%) | 9.07 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32", "filter"] |
3.167 ms (5%) | 9.07 MiB (1%) | 8 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "conv"] |
3.417 ms (5%) | 18.14 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "data"] |
14.698 ms (5%) | 18.14 MiB (1%) | 22 | |
["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64", "filter"] |
3.834 ms (5%) | 18.14 MiB (1%) | 8 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "conv"] |
371.146 ms (5%) | 368 bytes (1%) | 6 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "data"] |
391.440 ms (5%) | 848 bytes (1%) | 10 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32", "filter"] |
891.161 ms (5%) | 190.51 MiB (1%) | 15 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "conv"] |
349.546 ms (5%) | 384 bytes (1%) | 6 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "data"] |
389.363 ms (5%) | 1.03 KiB (1%) | 10 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64", "filter"] |
924.378 ms (5%) | 381.02 MiB (1%) | 15 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "conv"] |
447.758 ms (5%) | 92.312 μs | 1.65 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "data"] |
2.413 s (5%) | 120.015 μs | 1.65 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32", "filter"] |
483.597 ms (5%) | 94.397 μs | 1.65 GiB (1%) | 2 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "conv"] |
713.125 ms (5%) | 96.641 μs | 3.30 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "data"] |
2.756 s (5%) | 94.748 μs | 3.30 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64", "filter"] |
754.267 ms (5%) | 101.641 μs | 3.30 GiB (1%) | 2 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "conv"] |
885.789 ms (5%) | |||
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "data"] |
427.243 ms (5%) | 1.38 KiB (1%) | 16 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32", "filter"] |
1.054 s (5%) | 190.51 MiB (1%) | 20 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "conv"] |
863.230 ms (5%) | |||
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "data"] |
431.781 ms (5%) | 1.67 KiB (1%) | 16 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64", "filter"] |
1.080 s (5%) | 381.02 MiB (1%) | 20 | |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "conv"] |
457.099 ms (5%) | 100.097 μs | 1.65 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "data"] |
2.414 s (5%) | 100.538 μs | 1.65 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32", "filter"] |
482.964 ms (5%) | 93.766 μs | 1.65 GiB (1%) | 2 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "conv"] |
715.143 ms (5%) | 131.526 μs | 3.30 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "data"] |
2.774 s (5%) | 123.271 μs | 3.30 GiB (1%) | 16 |
["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64", "filter"] |
743.701 ms (5%) | 102.011 μs | 3.30 GiB (1%) | 2 |
["nnlib", "dropout", "3-N(128)", "dropout!", "with-colon"] |
186.526 ns (5%) | |||
["nnlib", "dropout", "3-N(128)", "dropout!", "with-dim"] |
171.462 ns (5%) | 576 bytes (1%) | 1 | |
["nnlib", "dropout", "3-N(128)", "dropout", "with-colon"] |
239.148 ns (5%) | 576 bytes (1%) | 1 | |
["nnlib", "dropout", "3-N(128)", "dropout", "with-dim"] |
388.808 ns (5%) | 1.12 KiB (1%) | 2 | |
["nnlib", "dropout", "3-N(256)", "dropout!", "with-colon"] |
291.915 ns (5%) | |||
["nnlib", "dropout", "3-N(256)", "dropout!", "with-dim"] |
555.447 ns (5%) | 1.06 KiB (1%) | 1 | |
["nnlib", "dropout", "3-N(256)", "dropout", "with-colon"] |
660.104 ns (5%) | 1.06 KiB (1%) | 1 | |
["nnlib", "dropout", "3-N(256)", "dropout", "with-dim"] |
945.265 ns (5%) | 2.12 KiB (1%) | 2 | |
["nnlib", "dropout", "3-N(512)", "dropout!", "with-colon"] |
510.221 ns (5%) | |||
["nnlib", "dropout", "3-N(512)", "dropout!", "with-dim"] |
1.234 μs (5%) | 2.12 KiB (1%) | 1 | |
["nnlib", "dropout", "3-N(512)", "dropout", "with-colon"] |
1.535 μs (5%) | 2.12 KiB (1%) | 1 | |
["nnlib", "dropout", "3-N(512)", "dropout", "with-dim"] |
1.853 μs (5%) | 4.25 KiB (1%) | 2 | |
["nnlib", "dropout", "4-N(128)", "dropout!", "with-colon"] |
5.914 μs (5%) | |||
["nnlib", "dropout", "4-N(128)", "dropout!", "with-dim"] |
3.160 μs (5%) | 672 bytes (1%) | 2 | |
["nnlib", "dropout", "4-N(128)", "dropout", "with-colon"] |
11.351 μs (5%) | 64.11 KiB (1%) | 3 | |
["nnlib", "dropout", "4-N(128)", "dropout", "with-dim"] |
7.407 μs (5%) | 64.77 KiB (1%) | 5 | |
["nnlib", "dropout", "4-N(256)", "dropout!", "with-colon"] |
29.355 μs (5%) | |||
["nnlib", "dropout", "4-N(256)", "dropout!", "with-dim"] |
22.432 μs (5%) | 1.19 KiB (1%) | 2 | |
["nnlib", "dropout", "4-N(256)", "dropout", "with-colon"] |
33.102 μs (5%) | 256.11 KiB (1%) | 3 | |
["nnlib", "dropout", "4-N(256)", "dropout", "with-dim"] |
25.338 μs (5%) | 257.30 KiB (1%) | 5 | |
["nnlib", "dropout", "4-N(512)", "dropout!", "with-colon"] |
112.742 μs (5%) | |||
["nnlib", "dropout", "4-N(512)", "dropout!", "with-dim"] |
89.408 μs (5%) | 2.17 KiB (1%) | 2 | |
["nnlib", "dropout", "4-N(512)", "dropout", "with-colon"] |
116.449 μs (5%) | 1.00 MiB (1%) | 3 | |
["nnlib", "dropout", "4-N(512)", "dropout", "with-dim"] |
90.931 μs (5%) | 1.00 MiB (1%) | 5 | |
["nnlib", "dropout", "5-N(128)", "dropout!", "with-colon"] |
2.084 ms (5%) | |||
["nnlib", "dropout", "5-N(128)", "dropout!", "with-dim"] |
557.839 μs (5%) | 672 bytes (1%) | 2 | |
["nnlib", "dropout", "5-N(128)", "dropout", "with-colon"] |
2.119 ms (5%) | 8.00 MiB (1%) | 3 | |
["nnlib", "dropout", "5-N(128)", "dropout", "with-dim"] |
571.845 μs (5%) | 8.00 MiB (1%) | 5 | |
["nnlib", "dropout", "5-N(256)", "dropout!", "with-colon"] |
15.530 ms (5%) | |||
["nnlib", "dropout", "5-N(256)", "dropout!", "with-dim"] |
4.372 ms (5%) | 1.19 KiB (1%) | 2 | |
["nnlib", "dropout", "5-N(256)", "dropout", "with-colon"] |
21.160 ms (5%) | 64.00 MiB (1%) | 3 | |
["nnlib", "dropout", "5-N(256)", "dropout", "with-dim"] |
9.017 ms (5%) | 64.00 MiB (1%) | 5 | |
["nnlib", "dropout", "5-N(512)", "dropout!", "with-colon"] |
119.347 ms (5%) | |||
["nnlib", "dropout", "5-N(512)", "dropout!", "with-dim"] |
34.429 ms (5%) | 2.17 KiB (1%) | 2 | |
["nnlib", "dropout", "5-N(512)", "dropout", "with-colon"] |
163.244 ms (5%) | 512.00 MiB (1%) | 3 | |
["nnlib", "dropout", "5-N(512)", "dropout", "with-dim"] |
66.339 ms (5%) | 512.00 MiB (1%) | 5 | |
["nnlib", "gemm", "Float32", "batched_gemm!", "trans(N,N)-M(1024)-N(1024)-K(1024)-alpha(0.5)-beta(0.0)"] |
12.270 ms (5%) | |||
["nnlib", "gemm", "Float32", "batched_gemm!", "trans(N,N)-M(512)-N(512)-K(128)-alpha(0.5)-beta(1.0)"] |
907.440 μs (5%) | |||
["nnlib", "gemm", "Float32", "batched_gemm!", "trans(N,N)-M(80)-N(40)-K(100)-alpha(1.0)-beta(0.0)"] |
86.162 μs (5%) | |||
["nnlib", "gemm", "Float64", "batched_gemm!", "trans(N,N)-M(1024)-N(1024)-K(1024)-alpha(0.5)-beta(0.0)"] |
25.005 ms (5%) | |||
["nnlib", "gemm", "Float64", "batched_gemm!", "trans(N,N)-M(512)-N(512)-K(128)-alpha(0.5)-beta(1.0)"] |
1.404 ms (5%) | |||
["nnlib", "gemm", "Float64", "batched_gemm!", "trans(N,N)-M(80)-N(40)-K(100)-alpha(1.0)-beta(0.0)"] |
110.627 μs (5%) | |||
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "lpnormpool1d-direct", "data"] |
37.400 μs (5%) | 2.86 KiB (1%) | 52 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "lpnormpool1d-direct", "pool"] |
39.053 μs (5%) | 2.50 KiB (1%) | 47 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "maxpool1d-direct", "data"] |
36.588 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "maxpool1d-direct", "pool"] |
37.640 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "meanpool1d-direct", "data"] |
35.837 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "meanpool1d-direct", "pool"] |
38.050 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "lpnormpool1d-direct", "data"] |
38.852 μs (5%) | 2.86 KiB (1%) | 52 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "lpnormpool1d-direct", "pool"] |
38.732 μs (5%) | 2.50 KiB (1%) | 47 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "maxpool1d-direct", "data"] |
34.775 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "maxpool1d-direct", "pool"] |
34.985 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "meanpool1d-direct", "data"] |
34.845 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "meanpool1d-direct", "pool"] |
36.719 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "lpnormpool1d-direct", "data"] |
38.472 μs (5%) | 2.86 KiB (1%) | 52 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "lpnormpool1d-direct", "pool"] |
37.910 μs (5%) | 2.50 KiB (1%) | 47 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "maxpool1d-direct", "data"] |
35.335 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "maxpool1d-direct", "pool"] |
35.546 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "meanpool1d-direct", "data"] |
35.345 μs (5%) | 2.80 KiB (1%) | 48 | |
["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "meanpool1d-direct", "pool"] |
36.918 μs (5%) | 2.45 KiB (1%) | 44 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "lpnormpool1d-direct", "data"] |
37.840 μs (5%) | 2.89 KiB (1%) | 54 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "lpnormpool1d-direct", "pool"] |
39.763 μs (5%) | 2.53 KiB (1%) | 49 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "maxpool1d-direct", "data"] |
35.807 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "maxpool1d-direct", "pool"] |
37.229 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "meanpool1d-direct", "data"] |
33.933 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "meanpool1d-direct", "pool"] |
37.550 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "lpnormpool1d-direct", "data"] |
45.725 μs (5%) | 2.89 KiB (1%) | 54 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "lpnormpool1d-direct", "pool"] |
40.786 μs (5%) | 2.53 KiB (1%) | 49 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "maxpool1d-direct", "data"] |
37.029 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "maxpool1d-direct", "pool"] |
36.588 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "meanpool1d-direct", "data"] |
35.697 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "meanpool1d-direct", "pool"] |
36.017 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "lpnormpool1d-direct", "data"] |
39.894 μs (5%) | 2.89 KiB (1%) | 54 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "lpnormpool1d-direct", "pool"] |
39.464 μs (5%) | 2.53 KiB (1%) | 49 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "maxpool1d-direct", "data"] |
35.807 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "maxpool1d-direct", "pool"] |
35.727 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "meanpool1d-direct", "data"] |
35.787 μs (5%) | 2.83 KiB (1%) | 50 | |
["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "meanpool1d-direct", "pool"] |
36.608 μs (5%) | 2.48 KiB (1%) | 46 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "lpnormpool2d-direct", "data"] |
1.738 ms (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "lpnormpool2d-direct", "pool"] |
808.714 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "maxpool2d-direct", "data"] |
370.720 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "maxpool2d-direct", "pool"] |
46.807 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct", "data"] |
107.600 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct", "pool"] |
46.306 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "lpnormpool2d-direct", "data"] |
517.793 μs (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "lpnormpool2d-direct", "pool"] |
224.778 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct", "data"] |
127.808 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct", "pool"] |
37.359 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "meanpool2d-direct", "data"] |
63.709 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "meanpool2d-direct", "pool"] |
34.424 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "lpnormpool2d-direct", "data"] |
469.974 μs (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "lpnormpool2d-direct", "pool"] |
193.250 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "maxpool2d-direct", "data"] |
58.428 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "maxpool2d-direct", "pool"] |
57.106 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "meanpool2d-direct", "data"] |
72.085 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "meanpool2d-direct", "pool"] |
45.806 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct", "data"] |
541.657 μs (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct", "pool"] |
239.285 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "maxpool2d-direct", "data"] |
144.658 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "maxpool2d-direct", "pool"] |
61.364 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "meanpool2d-direct", "data"] |
84.046 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "meanpool2d-direct", "pool"] |
63.999 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "lpnormpool2d-direct", "data"] |
29.509 ms (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "lpnormpool2d-direct", "pool"] |
11.522 ms (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "maxpool2d-direct", "data"] |
2.223 ms (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "maxpool2d-direct", "pool"] |
153.736 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "meanpool2d-direct", "data"] |
2.620 ms (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "meanpool2d-direct", "pool"] |
152.804 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "lpnormpool2d-direct", "data"] |
7.758 ms (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "lpnormpool2d-direct", "pool"] |
2.828 ms (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "maxpool2d-direct", "data"] |
674.755 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "maxpool2d-direct", "pool"] |
112.159 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct", "data"] |
828.000 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct", "pool"] |
118.671 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "lpnormpool2d-direct", "data"] |
2.127 ms (5%) | 864 bytes (1%) | 14 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "lpnormpool2d-direct", "pool"] |
733.223 μs (5%) | 752 bytes (1%) | 13 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "maxpool2d-direct", "data"] |
201.695 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "maxpool2d-direct", "pool"] |
137.736 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "meanpool2d-direct", "data"] |
261.617 μs (5%) | 816 bytes (1%) | 11 | |
["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "meanpool2d-direct", "pool"] |
123.730 μs (5%) | 720 bytes (1%) | 11 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "lpnormpool3d-direct", "data"] |
869.184 ms (5%) | 400 bytes (1%) | 6 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "lpnormpool3d-direct", "pool"] |
342.451 ms (5%) | 544 bytes (1%) | 9 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "maxpool3d-direct", "data"] |
101.848 ms (5%) | 352 bytes (1%) | 3 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "maxpool3d-direct", "pool"] |
5.757 ms (5%) | 512 bytes (1%) | 7 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "meanpool3d-direct", "data"] |
81.848 ms (5%) | 352 bytes (1%) | 3 | |
["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "meanpool3d-direct", "pool"] |
5.573 ms (5%) | 512 bytes (1%) | 7 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "lpnormpool3d-direct", "data"] |
133.246 ms (5%) | 400 bytes (1%) | 6 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "lpnormpool3d-direct", "pool"] |
60.141 ms (5%) | 544 bytes (1%) | 9 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "maxpool3d-direct", "data"] |
4.497 ms (5%) | 352 bytes (1%) | 3 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "maxpool3d-direct", "pool"] |
5.632 ms (5%) | 512 bytes (1%) | 7 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "meanpool3d-direct", "data"] |
11.270 ms (5%) | 352 bytes (1%) | 3 | |
["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "meanpool3d-direct", "pool"] |
7.663 ms (5%) | 512 bytes (1%) | 7 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (1024, 2048, 4)] |
262.519 ms (5%) | 16.02 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (12288, 2048, 1)] |
793.305 ms (5%) | 231.650 μs | 48.00 MiB (1%) | 3 |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (128, 384, 8)] |
12.500 ms (5%) | 774.19 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (2048, 2048, 2)] |
262.489 ms (5%) | 16.01 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (4096, 2048, 2)] |
528.382 ms (5%) | 32.01 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (4096, 4096, 2)] |
1.056 s (5%) | 179.263 μs | 64.02 MiB (1%) | 3 |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (512, 784, 8)] |
100.820 ms (5%) | 6.14 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "bw", (768, 1024, 4)] |
98.562 ms (5%) | 6.01 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (1024, 2048, 4)] |
266.208 ms (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (12288, 2048, 1)] |
805.607 ms (5%) | 12.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (128, 384, 8)] |
13.631 ms (5%) | 18.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (2048, 2048, 2)] |
266.262 ms (5%) | 24.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (4096, 2048, 2)] |
532.426 ms (5%) | 24.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (4096, 4096, 2)] |
1.064 s (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (512, 784, 8)] |
104.557 ms (5%) | 37.12 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float16", "fw", (768, 1024, 4)] |
100.779 ms (5%) | 24.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (1024, 2048, 4)] |
64.423 ms (5%) | 32.03 MiB (1%) | 4 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (12288, 2048, 1)] |
192.992 ms (5%) | 243.133 μs | 96.01 MiB (1%) | 3 |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (128, 384, 8)] |
2.924 ms (5%) | 1.51 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (2048, 2048, 2)] |
64.298 ms (5%) | 32.02 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (4096, 2048, 2)] |
128.889 ms (5%) | 235.027 μs | 64.02 MiB (1%) | 3 |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (4096, 4096, 2)] |
260.497 ms (5%) | 178.382 μs | 128.03 MiB (1%) | 4 |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (512, 784, 8)] |
23.436 ms (5%) | 12.27 MiB (1%) | 4 | |
["nnlib", "softmax", "logsoftmax", "Float32", "bw", (768, 1024, 4)] |
22.953 ms (5%) | 12.02 MiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (1024, 2048, 4)] |
61.181 ms (5%) | 96.19 KiB (1%) | 6 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (12288, 2048, 1)] |
183.619 ms (5%) | 24.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (128, 384, 8)] |
2.919 ms (5%) | 36.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (2048, 2048, 2)] |
61.046 ms (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (4096, 2048, 2)] |
121.825 ms (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (4096, 4096, 2)] |
244.062 ms (5%) | 96.19 KiB (1%) | 6 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (512, 784, 8)] |
23.356 ms (5%) | 73.69 KiB (1%) | 6 | |
["nnlib", "softmax", "logsoftmax", "Float32", "fw", (768, 1024, 4)] |
22.895 ms (5%) | 48.38 KiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (1024, 2048, 4)] |
22.930 ms (5%) | 16.02 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (12288, 2048, 1)] |
73.175 ms (5%) | 138.598 μs | 48.00 MiB (1%) | 3 |
["nnlib", "softmax", "softmax", "Float16", "bw", (128, 384, 8)] |
1.271 ms (5%) | 774.19 KiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (2048, 2048, 2)] |
22.930 ms (5%) | 16.01 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (4096, 2048, 2)] |
48.527 ms (5%) | 32.01 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (4096, 4096, 2)] |
96.255 ms (5%) | 141.042 μs | 64.02 MiB (1%) | 3 |
["nnlib", "softmax", "softmax", "Float16", "bw", (512, 784, 8)] |
9.100 ms (5%) | 6.14 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "bw", (768, 1024, 4)] |
8.711 ms (5%) | 6.01 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (1024, 2048, 4)] |
100.688 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (12288, 2048, 1)] |
309.871 ms (5%) | 4.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (128, 384, 8)] |
6.076 ms (5%) | 6.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (2048, 2048, 2)] |
100.572 ms (5%) | 8.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (4096, 2048, 2)] |
200.720 ms (5%) | 8.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (4096, 4096, 2)] |
402.617 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (512, 784, 8)] |
41.431 ms (5%) | 12.38 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float16", "fw", (768, 1024, 4)] |
38.819 ms (5%) | 8.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (1024, 2048, 4)] |
10.859 ms (5%) | 32.03 MiB (1%) | 4 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (12288, 2048, 1)] |
31.889 ms (5%) | 87.102 μs | 96.01 MiB (1%) | 3 |
["nnlib", "softmax", "softmax", "Float32", "bw", (128, 384, 8)] |
454.716 μs (5%) | 1.51 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (2048, 2048, 2)] |
10.712 ms (5%) | 32.02 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (4096, 2048, 2)] |
21.233 ms (5%) | 101.158 μs | 64.02 MiB (1%) | 3 |
["nnlib", "softmax", "softmax", "Float32", "bw", (4096, 4096, 2)] |
41.081 ms (5%) | 87.463 μs | 128.03 MiB (1%) | 4 |
["nnlib", "softmax", "softmax", "Float32", "bw", (512, 784, 8)] |
3.263 ms (5%) | 12.27 MiB (1%) | 4 | |
["nnlib", "softmax", "softmax", "Float32", "bw", (768, 1024, 4)] |
3.237 ms (5%) | 12.02 MiB (1%) | 3 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (1024, 2048, 4)] |
52.962 ms (5%) | 32.06 KiB (1%) | 2 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (12288, 2048, 1)] |
160.797 ms (5%) | 8.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (128, 384, 8)] |
2.554 ms (5%) | 12.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (2048, 2048, 2)] |
53.737 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (4096, 2048, 2)] |
107.216 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (4096, 4096, 2)] |
224.597 ms (5%) | 32.06 KiB (1%) | 2 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (512, 784, 8)] |
20.298 ms (5%) | 24.56 KiB (1%) | 2 | |
["nnlib", "softmax", "softmax", "Float32", "fw", (768, 1024, 4)] |
19.901 ms (5%) | 16.12 KiB (1%) | 1 | |
["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float16", "bw"] |
641.104 μs (5%) | 32.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float16", "fw"] |
595.178 μs (5%) | 32.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float32", "bw"] |
105.338 μs (5%) | 64.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float32", "fw"] |
84.779 μs (5%) | 64.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float16", "bw"] |
48.281 μs (5%) | 8.59 KiB (1%) | 16 | |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float16", "fw"] |
117.641 μs (5%) | 8.50 KiB (1%) | 13 | |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32", "bw"] |
23.504 μs (5%) | 16.59 KiB (1%) | 16 | |
["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32", "fw"] |
24.486 μs (5%) | 16.50 KiB (1%) | 13 | |
["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float16", "bw"] |
182.063 μs (5%) | 128.53 KiB (1%) | 17 | |
["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float16", "fw"] |
2.326 ms (5%) | 128.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float32", "bw"] |
56.576 μs (5%) | 256.53 KiB (1%) | 17 | |
["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float32", "fw"] |
294.203 μs (5%) | 256.44 KiB (1%) | 14 | |
["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float16", "bw"] |
2.171 ms (5%) | 128.47 KiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float16", "fw"] |
3.833 ms (5%) | 128.47 KiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float32", "bw"] |
329.429 μs (5%) | 256.47 KiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float32", "fw"] |
453.292 μs (5%) | 256.47 KiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float16", "bw"] |
43.463 ms (5%) | 256.00 MiB (1%) | 18 | |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float16", "fw"] |
5.696 s (5%) | 256.00 MiB (1%) | 15 | |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float32", "bw"] |
64.077 ms (5%) | 512.00 MiB (1%) | 18 | |
["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float32", "fw"] |
702.800 ms (5%) | 512.00 MiB (1%) | 15 | |
["nnlib", "upsample", "nearest", "3-N(128)", "Float16"] |
5.137 μs (5%) | 5.67 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(128)", "Float32"] |
5.097 μs (5%) | 5.67 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(128)", "Float64"] |
5.144 μs (5%) | 5.67 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(64)", "Float16"] |
3.659 μs (5%) | 3.17 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(64)", "Float32"] |
3.676 μs (5%) | 3.17 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "3-N(64)", "Float64"] |
3.641 μs (5%) | 3.17 KiB (1%) | 11 | |
["nnlib", "upsample", "nearest", "4-N(128)", "Float16"] |
2.647 ms (5%) | 6.25 MiB (1%) | 14 | |
["nnlib", "upsample", "nearest", "4-N(128)", "Float32"] |
2.634 ms (5%) | 6.25 MiB (1%) | 14 | |
["nnlib", "upsample", "nearest", "4-N(128)", "Float64"] |
2.641 ms (5%) | 6.25 MiB (1%) | 14 | |
["nnlib", "upsample", "nearest", "4-N(32)", "Float16"] |
177.644 μs (5%) | 400.77 KiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "4-N(32)", "Float32"] |
178.235 μs (5%) | 400.77 KiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "4-N(32)", "Float64"] |
179.107 μs (5%) | 400.77 KiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "5-N(32)", "Float16"] |
82.230 ms (5%) | 125.00 MiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "5-N(32)", "Float32"] |
82.318 ms (5%) | 125.00 MiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "5-N(32)", "Float64"] |
82.108 ms (5%) | 125.00 MiB (1%) | 12 | |
["nnlib", "upsample", "nearest", "5-N(64)", "Float16"] |
655.682 ms (5%) | 1000.00 MiB (1%) | 15 | |
["nnlib", "upsample", "nearest", "5-N(64)", "Float32"] |
654.974 ms (5%) | 1000.00 MiB (1%) | 15 | |
["nnlib", "upsample", "nearest", "5-N(64)", "Float64"] |
654.964 ms (5%) | 1000.00 MiB (1%) | 15 |
Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
- ["flux", "mlp"]
- ["nnlib", "activations", "Float16"]
- ["nnlib", "activations", "Float32"]
- ["nnlib", "activations", "Float64"]
- ["nnlib", "attention", "Float16", "attention"]
- ["nnlib", "attention", "Float16", "score"]
- ["nnlib", "attention", "Float64", "attention"]
- ["nnlib", "attention", "Float64", "score"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "3-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_direct", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "4-N(512)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(2)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_direct", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "conv_im2col", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_direct", "Float64"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float32"]
- ["nnlib", "conv", "5-N(256)-K(3)-in(1)-out(1)-stride(1)-dilation(1)-padding(0)", "depthwiseconv_im2col", "Float64"]
- ["nnlib", "dropout", "3-N(128)", "dropout!"]
- ["nnlib", "dropout", "3-N(128)", "dropout"]
- ["nnlib", "dropout", "3-N(256)", "dropout!"]
- ["nnlib", "dropout", "3-N(256)", "dropout"]
- ["nnlib", "dropout", "3-N(512)", "dropout!"]
- ["nnlib", "dropout", "3-N(512)", "dropout"]
- ["nnlib", "dropout", "4-N(128)", "dropout!"]
- ["nnlib", "dropout", "4-N(128)", "dropout"]
- ["nnlib", "dropout", "4-N(256)", "dropout!"]
- ["nnlib", "dropout", "4-N(256)", "dropout"]
- ["nnlib", "dropout", "4-N(512)", "dropout!"]
- ["nnlib", "dropout", "4-N(512)", "dropout"]
- ["nnlib", "dropout", "5-N(128)", "dropout!"]
- ["nnlib", "dropout", "5-N(128)", "dropout"]
- ["nnlib", "dropout", "5-N(256)", "dropout!"]
- ["nnlib", "dropout", "5-N(256)", "dropout"]
- ["nnlib", "dropout", "5-N(512)", "dropout!"]
- ["nnlib", "dropout", "5-N(512)", "dropout"]
- ["nnlib", "gemm", "Float32", "batched_gemm!"]
- ["nnlib", "gemm", "Float64", "batched_gemm!"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(2)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(2)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(256)-K(4)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(2)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(2)", "meanpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "lpnormpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "maxpool1d-direct"]
- ["nnlib", "pooling", "3-N(512)-K(4)-stride(4)", "meanpool1d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(1)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(2)-stride(2)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(256)-K(4)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(2)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(1)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(2)", "meanpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "lpnormpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "maxpool2d-direct"]
- ["nnlib", "pooling", "4-N(512)-K(4)-stride(4)", "meanpool2d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "lpnormpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "maxpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(2)-stride(1)", "meanpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "lpnormpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "maxpool3d-direct"]
- ["nnlib", "pooling", "5-N(256)-K(4)-stride(4)", "meanpool3d-direct"]
- ["nnlib", "softmax", "logsoftmax", "Float16", "bw"]
- ["nnlib", "softmax", "logsoftmax", "Float16", "fw"]
- ["nnlib", "softmax", "logsoftmax", "Float32", "bw"]
- ["nnlib", "softmax", "logsoftmax", "Float32", "fw"]
- ["nnlib", "softmax", "softmax", "Float16", "bw"]
- ["nnlib", "softmax", "softmax", "Float16", "fw"]
- ["nnlib", "softmax", "softmax", "Float32", "bw"]
- ["nnlib", "softmax", "softmax", "Float32", "fw"]
- ["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(128)-scale((0.5, 2))", "Float32"]
- ["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(32)-scale(2)", "Float32"]
- ["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float16"]
- ["nnlib", "upsample", "linear", "4-N(64)-scale(4)", "Float32"]
- ["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float16"]
- ["nnlib", "upsample", "linear", "5-N(32)-scale((1, 2, 1))", "Float32"]
- ["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float16"]
- ["nnlib", "upsample", "linear", "5-N(64)-scale(8)", "Float32"]
- ["nnlib", "upsample", "nearest", "3-N(128)"]
- ["nnlib", "upsample", "nearest", "3-N(64)"]
- ["nnlib", "upsample", "nearest", "4-N(128)"]
- ["nnlib", "upsample", "nearest", "4-N(32)"]
- ["nnlib", "upsample", "nearest", "5-N(32)"]
- ["nnlib", "upsample", "nearest", "5-N(64)"]
Julia versioninfo
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
Ubuntu 22.04.4 LTS
uname: Linux 6.5.0-1018-azure #19~22.04.2-Ubuntu SMP Thu Mar 21 16:45:46 UTC 2024 x86_64 x86_64
CPU: AMD EPYC 7763 64-Core Processor:
speed user nice sys idle irq
#1 3242 MHz 145 s 0 s 70 s 2627 s 0 s
#2 3245 MHz 313 s 0 s 89 s 2446 s 0 s
#3 3215 MHz 340 s 0 s 65 s 2423 s 0 s
#4 3114 MHz 211 s 0 s 58 s 2570 s 0 s
Memory: 15.606494903564453 GB (14153.55859375 MB free)
Uptime: 286.52 sec
Load Avg: 0.83 0.33 0.13
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Runtime information
Runtime Info | |
---|---|
BLAS #threads | 2 |
BLAS.vendor() | lbt |
Sys.CPU_THREADS | 4 |
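For anyone wanting to compare their own machine against the runtime table above, the same values can be queried from a Julia session. This is a minimal sketch, assuming Julia 1.7 or newer (where `BLAS.get_config()` is available); the variable names are illustrative only and are not taken from the benchmark tooling.

```julia
using LinearAlgebra

# Query the runtime values the report lists (names here are illustrative).
blas_threads  = BLAS.get_num_threads()   # threads used by the BLAS backend
julia_threads = Threads.nthreads()       # Julia threads (JULIA_NUM_THREADS)
cpu_threads   = Sys.CPU_THREADS          # logical CPUs visible to Julia

# Libraries currently loaded through libblastrampoline (lbt).
blas_libs = [basename(lib.libname) for lib in BLAS.get_config().loaded_libs]

@info "Runtime" blas_threads julia_threads cpu_threads blas_libs
```

Note that the BLAS thread count is independent of `JULIA_NUM_THREADS`, which is why the report shows 2 BLAS threads alongside a single Julia thread.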
lscpu output:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7763 64-Core Processor
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 1
BogoMIPS: 4890.85
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr rdpru arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm
Virtualization: AMD-V
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 64 KiB (2 instances)
L1i cache: 64 KiB (2 instances)
L2 cache: 1 MiB (2 instances)
L3 cache: 32 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Cpu Property | Value |
---|---|
Brand | AMD EPYC 7763 64-Core Processor |
Vendor | :AMD |
Architecture | :Unknown |
Model | Family: 0xaf, Model: 0x01, Stepping: 0x01, Type: 0x00 |
Cores | 16 physical cores, 16 logical cores (on executing CPU); no Hyperthreading hardware capability detected |
Clock Frequencies | Not supported by CPU |
Data Cache | Level 1:3 : (32, 512, 32768) kbytes; 64 byte cache line size |
Address Size | 48 bits virtual, 48 bits physical |
SIMD | 256 bit = 32 byte max. SIMD vector size |
Time Stamp Counter | TSC is accessible via rdtsc; TSC runs at constant rate (invariant from clock frequency) |
Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
Hypervisor | Yes, Microsoft |
This kind of reporting is way too lengthy. @avik-pal what do you think, should we merge?
This seems like the opposite of what I expected: it slows down the _im2col variants. This is strange because I saw speedups in https://lux.csail.mit.edu/benchmarks/ (see the conv ones for Lux against Flux), where Lux effectively reduces BLAS threads before calling conv.
We've thought about doing this before even if it doesn't result in perf gains, just to make multithreading less unwieldy with Flux. The main challenge has always been that BLAS.set_num_threads affects all active code in the process. This means that people who are running Flux models in parallel with other code that uses BLAS may experience spooky action at a distance.
Is there really no way to disable multithreading for just the context of a conv routine? Perhaps some lower-level call we can make which is guaranteed to run single-threaded?
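To make the concern concrete, here is a minimal sketch of a scoped helper. `with_blas_threads` is a hypothetical name, not an existing NNlib or LinearAlgebra function. It restores the previous thread count on exit, but the setting is still global while `f` runs, so any other task calling BLAS during that window sees the reduced count.

```julia
using LinearAlgebra

# Hypothetical helper: run `f` with `n` BLAS threads, then restore the
# previous count. The change is process-wide, not scoped to this task.
function with_blas_threads(f, n::Integer)
    old = BLAS.get_num_threads()
    BLAS.set_num_threads(n)
    try
        return f()
    finally
        BLAS.set_num_threads(old)
    end
end
```

With `do` syntax this would be used as `with_blas_threads(1) do ... end` around the conv call. It limits how long the setting is changed, but it does not remove the cross-task interference described above.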
Is there a specific graph we should look at? Just looking for apples-to-apples, I think Conv((3, 3), 1 => 1) might be the closest, and I'm not seeing a difference there with the curves.
Since we haven't used this benchmark tool recently, I did just double check that it checked out and ran the right versions of the baseline and target. It seems to all be correct (you can check the raw action log here: https://github.com/FluxML/FluxMLBenchmarks.jl/actions/runs/8889259577/job/24407415613).
Try this script.
using Lux, Random
import Flux, Metalhead
using Zygote
using BenchmarkTools    # provides @belapsed
using InteractiveUtils  # provides versioninfo() outside the REPL
using UnicodePlots
using LinearAlgebra
using ThreadPinning

pinthreads(:cores)
BLAS.set_num_threads(min(4, Threads.nthreads()))
@info "BLAS Threads: $(BLAS.get_num_threads())"
threadinfo()
versioninfo()

flux_model = Metalhead.VGG(19)
lux_model = FromFluxAdaptor()(flux_model.layers);
ps, st = Lux.setup(Xoshiro(), lux_model);
st_test = Lux.testmode(st);

# Forward pass: time both models over a range of batch sizes.
bsizes = 2 .^ (0:8)
lux_timings = zeros(Float64, length(bsizes))
flux_timings = zeros(Float64, length(bsizes))
for (i, bsize) in enumerate(bsizes)
    x_input = rand(Float32, 224, 224, 3, bsize)
    lux_timings[i] = @belapsed $lux_model($x_input, $ps, $st_test)
    flux_timings[i] = @belapsed $flux_model($x_input)
    @info "Batch size: $bsize" Lux=lux_timings[i] Flux=flux_timings[i] ratio=(lux_timings[i] / flux_timings[i])
end
display(lineplot(bsizes, hcat(lux_timings, flux_timings); name=["Lux" "Flux"], color=[:blue :red]))

# Backward pass: time Zygote gradients of a scalar loss for both models.
bsizes = 2 .^ (0:8)
lux_backward_timings = zeros(Float64, length(bsizes))
flux_backward_timings = zeros(Float64, length(bsizes))
f1 = (m, x) -> sum(abs2, m(x))
f2 = (m, x, ps, st) -> sum(abs2, first(m(x, ps, st)))
for (i, bsize) in enumerate(bsizes)
    x_input = rand(Float32, 224, 224, 3, bsize)
    lux_backward_timings[i] = @belapsed Zygote.gradient(
        f2, $lux_model, $x_input, $ps, $st)
    flux_backward_timings[i] = @belapsed Zygote.gradient(
        f1, $flux_model, $x_input)
    @info "Batch size: $bsize" Lux=lux_backward_timings[i] Flux=flux_backward_timings[i] ratio=(lux_backward_timings[i] / flux_backward_timings[i])
end
display(lineplot(bsizes, hcat(lux_backward_timings, flux_backward_timings);
    name=["Lux" "Flux"], color=[:blue :red]))
[ Info: BLAS Threads: 4
System: 64 cores (2-way SMT), 2 sockets, 2 NUMA domains
| 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,
64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95 |
| 32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,
48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,
96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127 |
# = Julia thread, # = HT, # = Julia thread on HT, | = Socket seperator
Julia threads: 16
├ Occupied CPU-threads: 16
└ Mapping (Thread => CPUID): 1 => 0, 2 => 1, 3 => 2, 4 => 3, 5 => 4, ...
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 128 × AMD EPYC 7502 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
Threads: 16 default, 0 interactive, 8 GC (on 128 virtual cores)
For the Forward Pass
For the Backward Pass