Build XNNPACK WebAssembly library with CMake

Open • arjunsurendran24 opened this issue 3 years ago • 5 comments

This PR adds support for building the WebAssembly (WASM) version of XNNPACK with Emscripten's emcmake.

  • New build script, build-wasm.sh, that builds XNNPACK with Emscripten and emcmake
  • Options enable_tests, enable_benchmarks, enable_simd, and enable_threads to customize the build
  • The tests and benchmarks can be run in any supported browser, or with emrun using one of the following commands:
emrun --browser <browser> <test.html>
emrun --browser <browser> <benchmark.html>
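The build-wasm.sh script itself is not reproduced in this thread. As a rough, hypothetical sketch of what such a wrapper might do, the POSIX shell functions below turn the four toggles into an emcmake configure call, build, and run the generated pages with emrun. The function names, the DRY_RUN switch, and the mapping of enable_tests/enable_benchmarks onto XNNPACK's XNNPACK_BUILD_TESTS/XNNPACK_BUILD_BENCHMARKS CMake options are assumptions for illustration. SIMD and threads are passed as -msimd128/-pthread through CMAKE_C_FLAGS/CMAKE_CXX_FLAGS on the command line, so the person building the final binary, not XNNPACK's own CMake files, decides whether it requires those features.

```shell
#!/bin/sh
# Hypothetical sketch of a build-wasm.sh-style wrapper (not the PR's script).
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# configure_wasm BUILD_DIR TESTS BENCHMARKS SIMD THREADS   (each toggle ON/OFF)
configure_wasm() {
  build_dir=$1 tests=$2 benchmarks=$3 simd=$4 threads=$5
  flags=""
  # Both flags change the requirements of the whole Wasm binary, so the caller
  # supplies them here instead of hard-coding them in CMakeLists.txt.
  [ "$simd" = ON ] && flags="$flags -msimd128"
  [ "$threads" = ON ] && flags="$flags -pthread"
  flags=${flags# }
  set -- emcmake cmake -S . -B "$build_dir" \
    -DCMAKE_BUILD_TYPE=Release \
    -DXNNPACK_BUILD_TESTS="$tests" \
    -DXNNPACK_BUILD_BENCHMARKS="$benchmarks"
  if [ -n "$flags" ]; then
    # CMAKE_<LANG>_FLAGS reach both compile and link lines; -pthread needs both.
    set -- "$@" "-DCMAKE_C_FLAGS=$flags" "-DCMAKE_CXX_FLAGS=$flags"
  fi
  run "$@"
}

# build_wasm BUILD_DIR
build_wasm() {
  run cmake --build "$1"
}

# run_pages BROWSER PAGE...   (one emrun invocation per generated .html page)
run_pages() {
  browser=$1
  shift
  for page in "$@"; do
    run emrun --browser "$browser" "$page"
  done
}
```

For example, `configure_wasm build/wasm ON OFF ON OFF && build_wasm build/wasm` configures and builds the tests with SIMD but without threads; `run_pages <browser> <test.html>` then opens a generated page exactly as the command in the list above does.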

Example test result

[==========] Running 131 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 92 tests from CONVOLUTION_NCHW_F32
[ RUN      ] CONVOLUTION_NCHW_F32.1x1
[       OK ] CONVOLUTION_NCHW_F32.1x1 (60 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.1x1_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.1x1_zero_weights (113 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.1x1_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.1x1_varying_input_height (232 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.1x1_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.1x1_varying_input_width (166 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.1x1_varying_input_channels
[       OK ] CONVOLUTION_NCHW_F32.1x1_varying_input_channels (63 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.1x1_varying_output_channels
[       OK ] CONVOLUTION_NCHW_F32.1x1_varying_output_channels (30 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.1x1_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.1x1_with_qmin (77 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.1x1_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.1x1_with_qmax (79 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.1x1_without_bias
[       OK ] CONVOLUTION_NCHW_F32.1x1_without_bias (60 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1 (227 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_zero_weights (193 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_height (335 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_width (302 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_channels
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_varying_input_channels (94 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_varying_output_channels
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_varying_output_channels (44 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_with_input_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_with_input_stride (97 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_with_output_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_with_output_stride (89 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_with_qmin (147 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_with_qmax (200 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_1x1_without_bias
[       OK ] CONVOLUTION_NCHW_F32.batched_1x1_without_bias (126 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3 (106 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_zero_weights (56 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_input_height (227 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_input_width (258 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_channels
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_varying_channels (34 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_with_qmin (77 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_with_qmax (114 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3_without_bias
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3_without_bias (71 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3 (138 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_zero_weights (220 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_input_height (445 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_input_width (375 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_channels
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_varying_channels (34 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_input_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_input_stride (125 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_output_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_output_stride (109 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_qmin (83 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_with_qmax (62 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_without_bias
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3_without_bias (117 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3s2
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2 (9 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_zero_weights (7 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_input_height (21 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_input_width (27 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_channels
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_varying_channels (2 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_with_qmin (6 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_with_qmax (9 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_without_bias
[       OK ] CONVOLUTION_NCHW_F32.depthwise_3x3s2_without_bias (7 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2 (15 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_zero_weights (13 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_input_height (34 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_input_width (35 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_channels
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_varying_channels (21 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_input_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_input_stride (13 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_output_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_output_stride (14 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_qmin (76 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_with_qmax (93 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_without_bias
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_3x3s2_without_bias (21 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5 (74 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_zero_weights (118 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_input_height (192 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_input_width (195 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_channels
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_varying_channels (23 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_with_qmin (56 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_with_qmax (69 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5_without_bias
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5_without_bias (73 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5 (138 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_zero_weights (128 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_input_height (319 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_input_width (709 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_channels
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_varying_channels (122 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_input_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_input_stride (313 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_output_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_output_stride (316 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_qmin (257 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_with_qmax (230 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_without_bias
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5_without_bias (297 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5s2
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2 (41 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_zero_weights (30 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_input_height (153 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_input_width (151 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_channels
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_varying_channels (5 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_with_qmin (44 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_with_qmax (35 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_without_bias
[       OK ] CONVOLUTION_NCHW_F32.depthwise_5x5s2_without_bias (43 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2 (158 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_zero_weights
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_zero_weights (127 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_input_height
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_input_height (238 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_input_width
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_input_width (298 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_channels
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_varying_channels (60 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_input_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_input_stride (136 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_output_stride
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_output_stride (135 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_qmin
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_qmin (101 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_qmax
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_with_qmax (85 ms)
[ RUN      ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_without_bias
[       OK ] CONVOLUTION_NCHW_F32.batched_depthwise_5x5s2_without_bias (90 ms)
[----------] 92 tests from CONVOLUTION_NCHW_F32 (14084 ms total)

[----------] 15 tests from CONVOLUTION_NHWC2NCHW_OP_F32
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2 (23 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_input_height
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_input_height (33 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_input_width
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_input_width (42 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_output_channels
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_varying_output_channels (4 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_with_qmin
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_with_qmin (12 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_with_qmax
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_with_qmax (19 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_without_bias
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.3x3c3s2_without_bias (22 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2 (27 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_input_height
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_input_height (64 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_input_width
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_input_width (164 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_output_channels
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_varying_output_channels (23 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_output_stride
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_output_stride (30 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_qmin
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_qmin (15 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_qmax
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_with_qmax (25 ms)
[ RUN      ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_without_bias
[       OK ] CONVOLUTION_NHWC2NCHW_OP_F32.batched_3x3c3s2_without_bias (42 ms)
[----------] 15 tests from CONVOLUTION_NHWC2NCHW_OP_F32 (1509 ms total)

[----------] 24 tests from DEPTHWISE_CONVOLUTION_NCHW_F32
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3 (25 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3_varying_channels
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3_varying_channels (12 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3_without_bias
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3_without_bias (18 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3 (37 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3_varying_channels
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3_varying_channels (45 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3_without_bias
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3_without_bias (60 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2 (6 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2_varying_channels
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2_varying_channels (9 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2_without_bias
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.3x3s2_without_bias (5 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2 (11 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2_varying_channels
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2_varying_channels (9 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2_without_bias
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_3x3s2_without_bias (25 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5 (77 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5_varying_channels
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5_varying_channels (19 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5_without_bias
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5_without_bias (80 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5 (165 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5_varying_channels
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5_varying_channels (62 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5_without_bias
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5_without_bias (112 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2 (11 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2_varying_channels
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2_varying_channels (4 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2_without_bias
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.5x5s2_without_bias (18 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2 (131 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2_varying_channels
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2_varying_channels (21 ms)
[ RUN      ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2_without_bias
[       OK ] DEPTHWISE_CONVOLUTION_NCHW_F32.batched_5x5s2_without_bias (77 ms)
[----------] 24 tests from DEPTHWISE_CONVOLUTION_NCHW_F32 (2633 ms total)

[----------] Global test environment tear-down
[==========] 131 tests from 3 test suites ran. (18559 ms total)
[  PASSED  ] 131 tests.
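Because emrun can echo the page's stdout/stderr to the terminal, a run like the one above can be saved to a file and checked automatically rather than by eye. A minimal sketch, assuming the captured log contains Google Test's standard summary lines as shown above; the function names are hypothetical. Note that Google Test still prints a PASSED line when some tests fail (it reports the passing subset and then a separate FAILED block), so the check looks for both.

```shell
#!/bin/sh
# Succeeds when a captured Google Test log has a PASSED summary and no FAILED lines.
gtest_log_ok() {
  grep -Fq '[  PASSED  ]' "$1" && ! grep -Fq '[  FAILED  ]' "$1"
}

# Prints N from the "[  PASSED  ] N tests." summary line, if present.
gtest_passed_count() {
  sed -n 's/^\[  PASSED  ] \([0-9][0-9]*\) tests*\.$/\1/p' "$1"
}
```

For the log above, `gtest_log_ok test.log` succeeds and `gtest_passed_count test.log` prints 131.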

Example benchmark result

---------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                          Time           CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------
max_pooling_f32/shufflenet/N:1/H:112/W:112/K:3/P:1/S:2/C:24/real_time         189546 ns     189549 ns       3605 bytes=7.94148G/s
max_pooling_f32/squeezenet_v10/N:1/H:111/W:111/K:3/P:0/S:2/C:96/real_time     698436 ns     698441 ns        927 bytes=8.43723G/s
max_pooling_f32/squeezenet_v10/N:1/H:27/W:27/K:3/P:0/S:2/C:256/real_time      104990 ns     104990 ns       6526 bytes=8.75847G/s
max_pooling_f32/squeezenet_v10/N:1/H:13/W:13/K:3/P:0/S:2/C:512/real_time       47794 ns      47795 ns      13391 bytes=8.78436G/s
max_pooling_f32/squeezenet_v11/N:1/H:111/W:111/K:3/P:0/S:2/C:64/real_time     475767 ns     475774 ns       1343 bytes=8.25735G/s
max_pooling_f32/squeezenet_v11/N:1/H:55/W:55/K:3/P:0/S:2/C:128/real_time      222067 ns     222068 ns       2947 bytes=8.65528G/s
max_pooling_f32/squeezenet_v11/N:1/H:13/W:13/K:3/P:0/S:2/C:256/real_time       22356 ns      22356 ns      30717 bytes=9.39G/s
max_pooling_f32/vgg/N:1/H:224/W:224/K:2/P:1/S:2/C:64                         2352121 ns    2352138 ns        297 bytes=6.85075G/s
max_pooling_f32/vgg/N:1/H:112/W:112/K:2/P:1/S:2/C:128                        1082512 ns    1082521 ns        609 bytes=7.46962G/s
max_pooling_f32/vgg/N:1/H:56/W:56/K:2/P:1/S:2/C:256                           508991 ns     508991 ns       1209 bytes=8.00102G/s
max_pooling_f32/vgg/N:1/H:28/W:28/K:2/P:1/S:2/C:512                           266183 ns     266183 ns       2667 bytes=7.7632G/s
max_pooling_f32/vgg/N:1/H:14/W:14/K:2/P:1/S:2/C:512                            77782 ns      77783 ns       9109 bytes=6.84572G/s
max_pooling_u8/shufflenet/N:1/H:112/W:112/K:3/P:1/S:2/C:24/real_time         2280049 ns    2280081 ns        307 bytes=165.049M/s
max_pooling_u8/squeezenet_v10/N:1/H:111/W:111/K:3/P:0/S:2/C:96/real_time     8715625 ns    8715750 ns         80 bytes=169.032M/s
max_pooling_u8/squeezenet_v10/N:1/H:27/W:27/K:3/P:0/S:2/C:256/real_time      1303189 ns    1303198 ns        541 bytes=176.404M/s
max_pooling_u8/squeezenet_v10/N:1/H:13/W:13/K:3/P:0/S:2/C:512/real_time       552055 ns     552055 ns       1265 bytes=190.126M/s
max_pooling_u8/squeezenet_v11/N:1/H:111/W:111/K:3/P:0/S:2/C:64/real_time     5866708 ns    5866708 ns        120 bytes=167.41M/s
max_pooling_u8/squeezenet_v11/N:1/H:55/W:55/K:3/P:0/S:2/C:128/real_time      2808645 ns    2808665 ns        251 bytes=171.083M/s
max_pooling_u8/squeezenet_v11/N:1/H:13/W:13/K:3/P:0/S:2/C:256/real_time       275707 ns     275708 ns       2569 bytes=190.347M/s
max_pooling_u8/vgg/N:1/H:224/W:224/K:2/P:1/S:2/C:64                         13288774 ns   13288679 ns         53 bytes=303.151M/s
max_pooling_u8/vgg/N:1/H:112/W:112/K:2/P:1/S:2/C:128                         6531215 ns    6531215 ns        107 bytes=309.514M/s
max_pooling_u8/vgg/N:1/H:56/W:56/K:2/P:1/S:2/C:256                           3332948 ns    3332948 ns        212 bytes=305.469M/s
max_pooling_u8/vgg/N:1/H:28/W:28/K:2/P:1/S:2/C:512                           1697352 ns    1697352 ns        423 bytes=304.361M/s
max_pooling_u8/vgg/N:1/H:14/W:14/K:2/P:1/S:2/C:512                            427991 ns     427997 ns       1645 bytes=311.03M/s
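Google Benchmark colors its console output with ANSI escape sequences, which turn into stray codes when the output is copied or captured to a file. Where arguments can be passed to the benchmark binary, Google Benchmark's --benchmark_color=false flag disables the colors at the source. Otherwise, captured output can be cleaned afterward; a minimal POSIX shell sketch, where strip_ansi is a hypothetical helper that removes SGR sequences of the form ESC, '[', digits and semicolons, 'm':

```shell
#!/bin/sh
# Remove ANSI SGR color sequences (ESC '[' digits/semicolons 'm') from stdin.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g"
}
```

For example, `strip_ansi < bench.log > bench.txt` leaves only the plain benchmark table.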

arjunsurendran24 • Mar 17 '21 04:03

Thanks for your contribution @arjunsurendran24! The PR includes parts that modify CMAKE_* variables or add -msimd128/-pthread options, which make the whole Wasm binary (not just the XNNPACK parts) require WebAssembly SIMD / WebAssembly Threads. I think XNNPACK is not the right place to modify CMAKE_* variables or add such globally affecting flags. Please move these parts into Emscripten.cmake in the Emscripten SDK instead.

Maratyszcza • Mar 17 '21 06:03

Hi @Maratyszcza, thanks for the comments. I have addressed them and scoped the flags to the source files that require them. If that's not acceptable, I can remove the SIMD / Threads options altogether until the corresponding support lands in Emscripten.

arjunsurendran24 • Mar 22 '21 18:03

Are there any plans to continue this PR?

apilguk • Mar 01 '22 21:03

It would also be very helpful for compiling TensorFlow with CMake.

davlhd • Aug 17 '22 15:08

/Users/dexter/project/XNNPACK/build/wasm/googlebenchmark-source/src/complexity.cc:79:10: error: variable 'sigma_gn' set but not used [-Werror,-Wunused-but-set-variable]
   79 |   double sigma_gn = 0.0;
      |          ^
1 error generated.

langhuihui • Oct 13 '23 08:10
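The complexity.cc failure comes from Google Benchmark, which XNNPACK downloads into the build tree when benchmarks are enabled and which, per the log, compiles its own sources with -Werror; newer Clang versions emit -Wunused-but-set-variable, so a file that used to build cleanly now fails. Two configure-time workarounds are possible: skip the benchmarks entirely with -DXNNPACK_BUILD_BENCHMARKS=OFF, or keep them and turn off Google Benchmark's -Werror through its BENCHMARK_ENABLE_WERROR option, assuming the downloaded Google Benchmark revision provides that option. A minimal sketch of the second approach; the function name and DRY_RUN switch are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch: configure with benchmarks enabled but without
# Google Benchmark's -Werror. Set DRY_RUN=1 to print the command only.
configure_benchmarks_no_werror() {
  set -- emcmake cmake -S . -B "$1" \
    -DXNNPACK_BUILD_BENCHMARKS=ON \
    -DBENCHMARK_ENABLE_WERROR=OFF
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

If the downloaded revision predates that option, CMake lists BENCHMARK_ENABLE_WERROR under "Manually-specified variables were not used by the project", and skipping the benchmarks (or updating the pinned Google Benchmark revision) is the remaining route.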