XNNPACK
Build XNNPACK WebAssembly library with CMake
This PR adds support for building a WebAssembly (WASM) version of XNNPACK with emcmake.
- New build script build-wasm.sh to build XNNPACK with Emscripten and emcmake
- Options enable_tests, enable_benchmarks, enable_simd, and enable_threads to customize builds
- The tests and benchmarks can be run in any supported browser or with emrun, using the following command:
emrun --browser <browser> <test.html>/<benchmark.html>
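As a rough illustration of how the four options could map onto a CMake configure step, here is a hypothetical wrapper script. It is not the build-wasm.sh from this PR. The 1/0 argument convention is an assumption, and the XNNPACK_WASM_SIMD and XNNPACK_WASM_THREADS cache variables are made-up placeholders. XNNPACK_BUILD_TESTS and XNNPACK_BUILD_BENCHMARKS are existing XNNPACK CMake options.

```shell
#!/bin/sh
# Hypothetical sketch of a build-wasm.sh-style wrapper; not the script from this PR.
# Usage: ./build-wasm.sh <enable_tests> <enable_benchmarks> <enable_simd> <enable_threads>
# where each argument is 1 (on) or 0 (off).
set -eu

# Translate 1/0 into a CMake boolean.
to_cmake_bool() {
  if [ "$1" = "1" ]; then echo ON; else echo OFF; fi
}

# Print one -D cache argument per line for the four build options.
# XNNPACK_BUILD_TESTS and XNNPACK_BUILD_BENCHMARKS are existing XNNPACK options;
# XNNPACK_WASM_SIMD and XNNPACK_WASM_THREADS are hypothetical placeholders.
wasm_cmake_args() {
  printf '%s\n' \
    "-DXNNPACK_BUILD_TESTS=$(to_cmake_bool "$1")" \
    "-DXNNPACK_BUILD_BENCHMARKS=$(to_cmake_bool "$2")" \
    "-DXNNPACK_WASM_SIMD=$(to_cmake_bool "$3")" \
    "-DXNNPACK_WASM_THREADS=$(to_cmake_bool "$4")"
}

# Configure with emcmake (which points CMake at Emscripten's toolchain file),
# then build. Requires an active Emscripten SDK environment.
build_wasm() {
  build_dir=build/wasm
  # Unquoted on purpose: split the printed flags into separate arguments
  # (safe because none of them contains whitespace).
  # shellcheck disable=SC2046
  emcmake cmake -S . -B "$build_dir" -DCMAKE_BUILD_TYPE=Release $(wasm_cmake_args "$@")
  cmake --build "$build_dir"
}

# Build only when all four options are passed on the command line.
if [ "$#" -eq 4 ]; then
  build_wasm "$@"
fi
```

For example, ./build-wasm.sh 1 1 0 0 would configure a build with tests and benchmarks enabled and the two placeholder switches off. The resulting test or benchmark page can then be opened with the emrun command above.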
Example test result
[==========] Running 131 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 92 tests from CONVOLUTION_NCHW_F32
[ RUN ] CONVOLUTION_NCHW_F32.1x1
[ OK ] CONVOLUTION_NCHW_F32.1x1 (60 ms)
[ RUN ] CONVOLUTION_NCHW_F32.1x1_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.1x1_zero_weights (113 ms)
[ RUN ] CONVOLUTION_NCHW_F32.1x1_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.1x1_varying_input_height (232 ms)
[ RUN ] CONVOLUTION_NCHW_F32.1x1_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.1x1_varying_input_width (166 ms)
[ RUN ] CONVOLUTION_NCHW_F32.1x1_varying_input_channels
[ OK ] CONVOLUTION_NCHW_F32.1x1_varying_input_channels (63 ms)
[ RUN ] CONVOLUTION_NCHW_F32.1x1_varying_output_channels
[ OK ] CONVOLUTION_NCHW_F32.1x1_varying_output_channels (30 ms)
[ RUN ] CONVOLUTION_NCHW_F32.1x1_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.1x1_with_qmin (77 ms)
[ RUN ] CONVOLUTION_NCHW_F32.1x1_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.1x1_with_qmax (79 ms)
[ RUN ] CONVOLUTION_NCHW_F32.1x1_without_bias
[ OK ] CONVOLUTION_NCHW_F32.1x1_without_bias (60 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1 (227 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_zero_weights (193 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_height (335 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_width (302 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_channels
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_channels (94 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_varying_output_channels
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_varying_output_channels (44 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_with_input_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_with_input_stride (97 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_with_output_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_with_output_stride (89 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_with_qmin (147 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_with_qmax (200 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_1x1_without_bias
[ OK ] CONVOLUTION_NCHW_F32.batched_1x1_without_bias (126 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3 (106 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_zero_weights (56 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_input_height (227 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_input_width (258 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_channels
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_channels (34 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_with_qmin (77 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_with_qmax (114 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3_without_bias
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_without_bias (71 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3 (138 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_zero_weights (220 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_input_height (445 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_input_width (375 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_channels
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_channels (34 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_input_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_input_stride (125 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_output_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_output_stride (109 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_qmin (83 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_qmax (62 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_without_bias
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_without_bias (117 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3s2
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2 (9 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_zero_weights (7 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_input_height (21 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_input_width (27 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_channels
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_channels (2 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_with_qmin (6 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_with_qmax (9 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_without_bias
[ OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_without_bias (7 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2 (15 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_zero_weights (13 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_input_height (34 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_input_width (35 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_channels
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_channels (21 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_input_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_input_stride (13 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_output_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_output_stride (14 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_qmin (76 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_qmax (93 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_without_bias
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_without_bias (21 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5 (74 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_zero_weights (118 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_input_height (192 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_input_width (195 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_channels
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_channels (23 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_with_qmin (56 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_with_qmax (69 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5_without_bias
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_without_bias (73 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5 (138 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_zero_weights (128 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_input_height (319 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_input_width (709 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_channels
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_channels (122 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_input_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_input_stride (313 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_output_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_output_stride (316 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_qmin (257 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_qmax (230 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_without_bias
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_without_bias (297 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5s2
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2 (41 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_zero_weights (30 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_input_height (153 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_input_width (151 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_channels
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_channels (5 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_with_qmin (44 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_with_qmax (35 ms)
[ RUN ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_without_bias
[ OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_without_bias (43 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2 (158 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_zero_weights
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_zero_weights (127 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_input_height
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_input_height (238 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_input_width
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_input_width (298 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_channels
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_channels (60 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_input_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_input_stride (136 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_output_stride
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_output_stride (135 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_qmin
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_qmin (101 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_qmax
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_qmax (85 ms)
[ RUN ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_without_bias
[ OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_without_bias (90 ms)
[----------] 92 tests from CONVOLUTION_NCHW_F32 (14084 ms total)
[----------] 15 tests from CONVOLUTION_NHWC2NCHW_OP_F32
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2 (23 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_input_height
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_input_height (33 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_input_width
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_input_width (42 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_output_channels
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_output_channels (4 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_with_qmin
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_with_qmin (12 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_with_qmax
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_with_qmax (19 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_without_bias
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_without_bias (22 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2 (27 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_input_height
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_input_height (64 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_input_width
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_input_width (164 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_output_channels
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_output_channels (23 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_output_stride
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_output_stride (30 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_qmin
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_qmin (15 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_qmax
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_qmax (25 ms)
[ RUN ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_without_bias
[ OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_without_bias (42 ms)
[----------] 15 tests from CONVOLUTION_NHWC2NCHW_OP_F32 (1509 ms total)
[----------] 24 tests from DEPTHWISE_CONVOLUTION_NCHW_F32
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3 (25 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3_varying_channels
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3_varying_channels (12 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3_without_bias
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3_without_bias (18 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3 (37 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3_varying_channels
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3_varying_channels (45 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3_without_bias
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3_without_bias (60 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2 (6 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2_varying_channels
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2_varying_channels (9 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2_without_bias
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2_without_bias (5 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2 (11 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2_varying_channels
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2_varying_channels (9 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2_without_bias
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2_without_bias (25 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5 (77 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5_varying_channels
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5_varying_channels (19 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5_without_bias
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5_without_bias (80 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5 (165 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5_varying_channels
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5_varying_channels (62 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5_without_bias
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5_without_bias (112 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2 (11 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2_varying_channels
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2_varying_channels (4 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2_without_bias
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2_without_bias (18 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2 (131 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2_varying_channels
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2_varying_channels (21 ms)
[ RUN ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2_without_bias
[ OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2_without_bias (77 ms)
[----------] 24 tests from DEPTHWISE_CONVOLUTION_NCHW_F32 (2633 ms total)
[----------] Global test environment tear-down
[==========] 131 tests from 3 test suites ran. (18559 ms total)
[ PASSED ] 131 tests.
Example benchmark result
---------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------
max_pooling_f32/shufflenet/N:1/H:112/W:112/K:3/P:1/S:2/C:24/real_time 189546 ns 189549 ns 3605 bytes=7.94148G/s
max_pooling_f32/squeezenet_v10/N:1/H:111/W:111/K:3/P:0/S:2/C:96/real_time 698436 ns 698441 ns 927 bytes=8.43723G/s
max_pooling_f32/squeezenet_v10/N:1/H:27/W:27/K:3/P:0/S:2/C:256/real_time 104990 ns 104990 ns 6526 bytes=8.75847G/s
max_pooling_f32/squeezenet_v10/N:1/H:13/W:13/K:3/P:0/S:2/C:512/real_time 47794 ns 47795 ns 13391 bytes=8.78436G/s
max_pooling_f32/squeezenet_v11/N:1/H:111/W:111/K:3/P:0/S:2/C:64/real_time 475767 ns 475774 ns 1343 bytes=8.25735G/s
max_pooling_f32/squeezenet_v11/N:1/H:55/W:55/K:3/P:0/S:2/C:128/real_time 222067 ns 222068 ns 2947 bytes=8.65528G/s
max_pooling_f32/squeezenet_v11/N:1/H:13/W:13/K:3/P:0/S:2/C:256/real_time 22356 ns 22356 ns 30717 bytes=9.39G/s
max_pooling_f32/vgg/N:1/H:224/W:224/K:2/P:1/S:2/C:64 2352121 ns 2352138 ns 297 bytes=6.85075G/s
max_pooling_f32/vgg/N:1/H:112/W:112/K:2/P:1/S:2/C:128 1082512 ns 1082521 ns 609 bytes=7.46962G/s
max_pooling_f32/vgg/N:1/H:56/W:56/K:2/P:1/S:2/C:256 508991 ns 508991 ns 1209 bytes=8.00102G/s
max_pooling_f32/vgg/N:1/H:28/W:28/K:2/P:1/S:2/C:512 266183 ns 266183 ns 2667 bytes=7.7632G/s
max_pooling_f32/vgg/N:1/H:14/W:14/K:2/P:1/S:2/C:512 77782 ns 77783 ns 9109 bytes=6.84572G/s
max_pooling_u8/shufflenet/N:1/H:112/W:112/K:3/P:1/S:2/C:24/real_time 2280049 ns 2280081 ns 307 bytes=165.049M/s
max_pooling_u8/squeezenet_v10/N:1/H:111/W:111/K:3/P:0/S:2/C:96/real_time 8715625 ns 8715750 ns 80 bytes=169.032M/s
max_pooling_u8/squeezenet_v10/N:1/H:27/W:27/K:3/P:0/S:2/C:256/real_time 1303189 ns 1303198 ns 541 bytes=176.404M/s
max_pooling_u8/squeezenet_v10/N:1/H:13/W:13/K:3/P:0/S:2/C:512/real_time 552055 ns 552055 ns 1265 bytes=190.126M/s
max_pooling_u8/squeezenet_v11/N:1/H:111/W:111/K:3/P:0/S:2/C:64/real_time 5866708 ns 5866708 ns 120 bytes=167.41M/s
max_pooling_u8/squeezenet_v11/N:1/H:55/W:55/K:3/P:0/S:2/C:128/real_time 2808645 ns 2808665 ns 251 bytes=171.083M/s
max_pooling_u8/squeezenet_v11/N:1/H:13/W:13/K:3/P:0/S:2/C:256/real_time 275707 ns 275708 ns 2569 bytes=190.347M/s
max_pooling_u8/vgg/N:1/H:224/W:224/K:2/P:1/S:2/C:64 13288774 ns 13288679 ns 53 bytes=303.151M/s
max_pooling_u8/vgg/N:1/H:112/W:112/K:2/P:1/S:2/C:128 6531215 ns 6531215 ns 107 bytes=309.514M/s
max_pooling_u8/vgg/N:1/H:56/W:56/K:2/P:1/S:2/C:256 3332948 ns 3332948 ns 212 bytes=305.469M/s
max_pooling_u8/vgg/N:1/H:28/W:28/K:2/P:1/S:2/C:512 1697352 ns 1697352 ns 423 bytes=304.361M/s
max_pooling_u8/vgg/N:1/H:14/W:14/K:2/P:1/S:2/C:512 427991 ns 427997 ns 1645 bytes=311.03M/s
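The bytes= column can be cross-checked by hand. The sketch below assumes that each iteration reads the whole input and writes the whole output once, with each output dimension equal to (D + 2P - K) / S + 1, and that element size is 4 bytes for f32 and 1 byte for u8. This formula is inferred from the numbers above, not taken from the XNNPACK benchmark source. Rows ending in /real_time appear to divide by the Time column; the other rows match when dividing by the CPU column. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking the bytes= column of the
# max_pooling benchmarks. The formulas are inferred from the table above,
# not taken from the XNNPACK benchmark source.

# Output height or width of a pooling window: (D + 2*P - K) / S + 1,
# using shell integer division (floor for non-negative values).
# Arguments: D P K S
pool_out_dim() {
  echo $(( ($1 + 2 * $2 - $3) / $4 + 1 ))
}

# Bytes per iteration: (input elements + output elements) * element size.
# Arguments: N H W K P S C element_size (4 for f32, 1 for u8).
pool_bytes() {
  local n=$1 h=$2 w=$3 k=$4 p=$5 s=$6 c=$7 es=$8
  local oh ow
  oh=$(pool_out_dim "$h" "$p" "$k" "$s")
  ow=$(pool_out_dim "$w" "$p" "$k" "$s")
  echo $(( (n * h * w * c + n * oh * ow * c) * es ))
}

# Throughput in bytes/s from bytes per iteration and ns per iteration,
# printed with 6 significant digits.
bytes_per_second() {
  awk -v b="$1" -v t="$2" 'BEGIN { printf "%.6g\n", b / (t * 1e-9) }'
}
```

For the first f32 vgg row, bytes_per_second "$(pool_bytes 1 224 224 2 1 2 64 4)" 2352138 prints 6.85075e+09, which agrees with bytes=6.85075G/s. Other rows agree to within the rounding of the printed times.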
Thanks for your contribution @arjunsurendran24! The PR includes parts which modify CMAKE_* variables or add -msimd128/-pthread options, which make the whole WAsm binary (not just the XNNPACK parts) require WebAssembly SIMD / WebAssembly Threads. I think XNNPACK is not the right place to modify CMAKE_* variables or add such globally affecting flags. Please move these parts into Emscripten.cmake in the Emscripten SDK instead.
Hi @Maratyszcza, thanks for the comments. I have addressed them and scoped the flags to the required source files. If that's not okay, I can remove the SIMD/threads options altogether until support lands in Emscripten.
Are there any plans to continue this PR?
It would be very helpful for compiling TensorFlow with CMake too.
/Users/dexter/project/XNNPACK/build/wasm/googlebenchmark-source/src/complexity.cc:79:10: error: variable 'sigma_gn' set but not used [-Werror,-Wunused-but-set-variable]
   79 |   double sigma_gn = 0.0;
      |          ^
1 error generated.