
[WebGPU] Enable tfjs-backend-webgpu in webml-polyfill

NALLEIN opened this issue 5 years ago · 16 comments

Model inference in polyfill using tfjs-backend-webgpu.
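
For reference, selecting the WebGPU backend in TF.js looks roughly like this (a minimal sketch; the backend package was experimental at the time, so import paths and APIs may differ by version):

import * as tf from '@tensorflow/tfjs-core';
// Importing the backend package registers the 'webgpu' backend as a side effect.
import '@tensorflow/tfjs-backend-webgpu';

async function init(): Promise<void> {
  await tf.setBackend('webgpu'); // switch from the default backend
  await tf.ready();              // wait for backend initialization
  console.log('Active backend:', tf.getBackend());
}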

NALLEIN · Dec 26 '19 04:12

I tried to integrate the WebGPU backend of TF.js: you have to build TF.js and then replace the paths in package.json with your own. This branch runs in Chrome Dev on macOS.

Build TF.js

git clone https://github.com/NALLEIN/tfjs.git
cd tfjs
git checkout -b opsForPolyfill origin/opsForPolyfill

yarn
cd tfjs-core
yarn && yarn build
cd ../tfjs-backend-webgpu
yarn && yarn build

Replace the WebGL backend with tfjs-backend-webgpu

Replace the paths of 'local_tfjs' and 'local_webgpu' in package.json with your own TF.js paths.
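
For example, the relevant entries might look like this (a sketch only; whether 'local_tfjs' and 'local_webgpu' are dependency aliases, and what the exact file: paths look like, depends on the branch):

{
  "dependencies": {
    "local_tfjs": "file:/your/path/to/tfjs/tfjs-core",
    "local_webgpu": "file:/your/path/to/tfjs/tfjs-backend-webgpu"
  }
}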

git clone https://github.com/NALLEIN/webml-polyfill.git
cd webml-polyfill
git checkout -b WebGPU-backend-test origin/WebGPU-backend-test

yarn && yarn start

Then you can test the polyfill examples.

TODO

Model inference executes without error, but the results are incorrect. Use analysis tools to locate the problem.

NALLEIN · Dec 26 '19 05:12

@NALLEIN , could you please send a PR to preview your change?

huningxin · Mar 31 '20 07:03

Conv2d with relu6 produces wrong results when running the image classification models. I will reproduce the error and report it to Xu Xin and tfjs.
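
A minimal repro sketch for the relu6 case (hypothetical: it compares fused conv2d against plain conv2d clamped to [0, 6], which is what relu6 computes; whether this exact snippet triggers the bug depends on shapes and strides):

import * as tf from '@tensorflow/tfjs-core';

const x = tf.randomNormal([1, 8, 8, 3]) as tf.Tensor4D;
const w = tf.randomNormal([3, 3, 3, 4]) as tf.Tensor4D;

// Fused path: conv2d with a relu6 activation.
const fused = tf.fused.conv2d({ x, filter: w, strides: 1, pad: 'same', activation: 'relu6' });
// Reference path: plain conv2d followed by relu6, i.e. clip(y, 0, 6).
const ref = tf.clipByValue(tf.conv2d(x, w, 1, 'same'), 0, 6);
// On a correct backend the two results should agree within float tolerance.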

Failed test cases in CTS
check result for Add v1_2 example-1
check result for Argmax example/1-4
check result for Argmax example/2-4
check result for Argmax example/3-4
check result for Avg pool float relu6 example/1
check result for Batch to space example
check result for Batch to space float example/1
check result for Conv2d v1_2 example-21
check result for Conv2d v1_2 example-22
check result for Conv2d v1_2 example-25
check result for Conv2d v1_2 example-26
check result for Conv2d v1_2 example-27
check result for Conv2d v1_2 example-30
check result for Conv2d v1_2 example-31
check result for Conv2d v1_2 example-32
check result for Conv2d v1_2 example-36
check result for Conv2d v1_2 example-37
check result for Conv2d v1_2 example-38
check result for Conv2d v1_2 example-42
check result for Conv 1 h3 w2 same relu6 example-1
check result for Conv 1 h3 w2 same relu6 example-2
check result for Conv 1 h3 w2 valid relu6 example-1
check result for Conv 1 h3 w2 valid relu6 example-2
check result for Conv 3 h3 w2 same relu6 example-1
check result for Conv 3 h3 w2 same relu6 example-2
check result for Conv 3 h3 w2 valid relu6 example-1
check result for Conv 3 h3 w2 valid relu6 example-2
check result for Conv float channels example
check result for Conv float channels relaxed example
check result for Conv float channels relu example
check result for Conv float channels relu6 example
check result for Conv float channels weights as inputs example
check result for Conv float channels weights as inputs relaxed example
check result for Conv float channels weights as inputs relu example
check result for Conv float channels weights as inputs relu6 example
check result for Conv float large example
check result for Conv float large relaxed example
check result for Conv float large relu example
check result for Conv float large relu6 example
check result for Conv float large weights as inputs example
check result for Conv float large weights as inputs relaxed example
check result for Conv float large weights as inputs relu example
check result for Conv float large weights as inputs relu6 example
check result for Depthwise conv2d float large example/2
check result for Depthwise conv2d float large 2 relaxed example
check result for Depthwise conv2d float large relu example/2
check result for Depthwise conv2d float large relu1 example/2
check result for Depthwise conv2d float large relu6 example/2
check result for Depthwise conv2d float large 2 weights as inputs example
check result for Depthwise conv2d float large 2 weights as inputs relaxed example
check result for Depthwise conv2d float large 2 weights as inputs relu example
check result for Depthwise conv2d float large 2 weights as inputs relu1 example
check result for Depthwise conv2d float large 2 weights as inputs relu6 example
check result for Depthwise conv2d float large relu6 example
check result for Depthwise conv2d float large weights as inputs relu6 example
check result for Depthwise conv2d float relu6 example
check result for Depthwise conv2d float weights as inputs relu6 example
check result for Depthwise conv2d v1_2 example-33
check result for Depthwise conv2d v1_2 example-34
check result for Depthwise conv2d v1_2 example-35
check result for Depthwise conv2d v1_2 example-38
check result for Depthwise conv2d v1_2 example-39
check result for Depthwise conv2d v1_2 example-40
check result for Depthwise conv relu6 example-1
check result for Depthwise conv relu6 example-2
check result for Fully connected float relu6 example
check result for Max pool float relu6 example/1
check result for Mul relu6 example
check result for Transpose example
check result for Transpose float16 example
check result for Transpose float example/1
check result for Transpose relaxed example
check result for Transpose v1_2 example-1
check result for Transpose v1_2 example-2
Failed test cases in CTS Supplement Test
check result for ATROUS_CONV_2D 1 h3 w2 implicit padding same example-3
check result for ATROUS_CONV_2D 1 h3 w2 implicit padding same example-4
check result for ATROUS_CONV_2D 3 h3 w2 implicit padding same example-3
check result for ATROUS_CONV_2D 3 h3 w2 implicit padding same example-4
check result for ATROUS_DEPTHWISE_CONV_2D valid example-2

NALLEIN · Apr 14 '20 06:04

When debugging the handpose model, I found that (fused) conv2d may not work properly with certain shape/stride/padding combinations.

With the two PRs below applied, certain cases may work. For conv2d (non-fused), try https://github.com/tensorflow/tfjs/pull/2993. For fused conv2d, try https://github.com/tensorflow/tfjs/pull/2846 and https://github.com/tensorflow/tfjs/pull/2993.

However, in order to make handpose work, we need the two above plus the one below (use naive conv2d instead of conv2dmm; this means something is still wrong with fused conv2dmm, and we are working on fixing it): https://github.com/axinging/tfjs/commit/78b5eaa0e592d90e21cf155b894916229a9ea409#diff-dcb528c192f70859b8f4333e400b445fL777

Please try the above first; if it still reports errors, please let me know.
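
For anyone trying this, one way to apply an open PR locally before it is merged (assuming GitHub's pull/<id>/head refs; the local branch name is arbitrary, and #2993 is the non-fused conv2d fix referenced above):

cd tfjs
git fetch https://github.com/tensorflow/tfjs.git pull/2993/head:conv2d-fix
git merge conv2d-fix

Then rebuild tfjs-core and tfjs-backend-webgpu as in the build steps above.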

axinging · Apr 14 '20 08:04

After modifying the getAndSavePipeline function and fixing dilation according to tensorflow/tfjs#2846 and tensorflow/tfjs#2993, some of the image classification models run correctly in webml-polyfill. The previous getAndSavePipeline function prevented many operations from producing correct results. The inference times listed below are from a single run. I used Conv2DMMProgram. It is very slow because tensor data is read with the asynchronous data() function; tfjs-backend-webgpu currently does not support the synchronous dataSync() (see the readback sketch after the tables). After all models in image-classification run correctly, I will measure detailed performance data.

| TFLite Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
| --- | --- | --- | --- |
| MobileNet V1 | 59.34 | 52.15 | Right |
| MobileNet V2 | 62.60 | 57.11 | Right |
| SqueezeNet | 48.93 | 66.90 | Right |
| Inception V3 | 236.21 | 252.04 | Wrong |
| Inception V4 | 425.38 | 426.93 | Wrong |
| Inception ResNet V2 | 359.25 | 424.85 | Right |

| ONNX Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
| --- | --- | --- | --- |
| SqueezeNet | 41.08 | 35.52 | Right |
| MobileNet V2 | 87.77 | 58.98 | Right |
| ResNet50 V1 | 155.88 | 162.48 | Wrong |
| ResNet50 V2 | 214.68 | Crash | Wrong |
| Inception V2 | 115.53 | 156.63 | Wrong |
| DenseNet 121 | 309.74 | 318.90 | Wrong |

| OpenVINO Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
| --- | --- | --- | --- |
| SqueezeNet | 42.10 | 36.03 | Right |
| MobileNet V1 | 52.10 | 38.21 | Right |
| MobileNet V2 | 58.35 | 55.91 | Right |
| ResNet50 V1 | 123.10 | 139.87 | Wrong |
| DenseNet 121 | 210.56 | 185.58 | Wrong |
| Inception V2 | 108.36 | 119.97 | Wrong |
| Inception V4 | 441.36 | 499.03 | Right |
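
The async readback mentioned above looks roughly like this (a sketch; the output tensor is a placeholder for whatever the polyfill's WebGPU path produces):

import * as tf from '@tensorflow/tfjs-core';

async function readOutput(output: tf.Tensor): Promise<Float32Array> {
  // tfjs-backend-webgpu only offers the asynchronous GPU -> CPU readback:
  const values = await output.data() as Float32Array;
  // output.dataSync() would read synchronously, but the WebGPU backend
  // does not support it at this point.
  output.dispose();
  return values;
}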

NALLEIN · Apr 15 '20 07:04

It's great to see that. Thanks @NALLEIN. Two comments: 1) Did you use Conv2DMMProgram, or Conv2DNaiveProgram? Conv2DNaiveProgram should be the slow path.

2) Does "modifying the getAndSavePipeline function" mean disabling the shader key? Some cases may pass with the shader key disabled, but disabling it may slow down the model. https://github.com/tensorflow/tfjs/pull/2670/files#diff-dcb528c192f70859b8f4333e400b445fL342

axinging · Apr 15 '20 08:04

It's great to see that. Thanks @NALLEIN. Two comments: 1) Did you use Conv2DMMProgram, or Conv2DNaiveProgram? Conv2DNaiveProgram should be the slow path.

I used Conv2DMMProgram, and no errors appear in the fused conv2d operation when running the models that predict correctly.

2) Does "modifying the getAndSavePipeline function" mean disabling the shader key? Some cases may pass with the shader key disabled, but disabling it may slow down the model. https://github.com/tensorflow/tfjs/pull/2670/files#diff-dcb528c192f70859b8f4333e400b445fL342

I disabled the shader key because it caused some operations to produce wrong results. When running a model with the shader key enabled, an operation runs correctly at first, the same operation later runs incorrectly, and many NaN values even appear in the result. Perhaps the method of generating the shader key needs further modification.

Taking mobilenet_v1 as an example, you can see that the attributes of Conv2d_3_depthwise and Conv2d_4_depthwise are exactly the same. I guess the shader key is then identical, so the previous pipeline is reused, and because the input tensors have different shapes it may produce the wrong result.
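
If that guess is right, the collision would look like the following (hypothetical sketch, not the actual tfjs source): a key built only from op attributes maps Conv2d_3_depthwise and Conv2d_4_depthwise to the same cached pipeline, while a key that also encodes the input shape keeps them apart.

// Hypothetical shader-key scheme illustrating the collision.
interface ConvAttrs { stride: number; pad: string; dilation: number; }

function attrOnlyKey(a: ConvAttrs): string {
  // Conv2d_3_depthwise and Conv2d_4_depthwise share these attributes,
  // so both ops map to the same key and reuse one compiled pipeline.
  return `dwconv_${a.stride}_${a.pad}_${a.dilation}`;
}

function shapeAwareKey(a: ConvAttrs, inputShape: number[]): string {
  // Encoding the input shape disambiguates ops with identical attributes.
  return `${attrOnlyKey(a)}_${inputShape.join('x')}`;
}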

NALLEIN · Apr 15 '20 08:04

After using Conv2DNaiveProgram to perform the fused conv operation, most models run correctly in webml-polyfill. Fused conv2d with relu bias and prelu still works incorrectly. There are some remaining problems:

  • The models run very slowly compared to the WebGL backend.

  • Some models crash the browser during the model compilation stage.

  • Some operation results in the super-resolution example are still wrong.

| Example | Model | Problem |
| --- | --- | --- |
| image_classification | resnet50v2.onnx | crash |
| object_detection | ssd_mobilenet_v2.tflite, tiny_yolov2_coco.tflite, tiny_yolov2_voc.tflite | crash |
| face_recognition | tiny_yolov2_face.tflite | crash |
| facial_landmark_detection | face_landmark.tflite | crash |
| super_resolution | srgan_96_4.tflite, srgan_128_4.tflite | wrong result |
| emotion_analysis | tiny_yolov2_face.tflite | crash |
| emotion_analysis | emotion_classification_7.tflite | wrong result |
| speech_commands | kws_cnn.tflite | wrong result |

I will find out what causes the browser to crash during the model compilation stage, and modify WebGPUModel.ts to avoid using .data() to read tensor data.
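
A sketch of that WebGPUModel.ts change (runSubgraph is a hypothetical stand-in for the per-op execution; the point is a single readback at the end rather than one per operation):

import * as tf from '@tensorflow/tfjs-core';

// Hypothetical stand-in for executing the model's ops on the backend.
declare function runSubgraph(input: tf.Tensor): tf.Tensor;

async function infer(input: tf.Tensor): Promise<Float32Array> {
  // Keep every intermediate on the GPU; tidy() disposes them afterwards.
  const out = tf.tidy(() => runSubgraph(input));
  // One asynchronous GPU -> CPU readback at the very end, instead of
  // awaiting .data() after each operation.
  const result = await out.data() as Float32Array;
  out.dispose();
  return result;
}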

NALLEIN · Apr 18 '20 16:04

For the speed: after merging https://github.com/tensorflow/tfjs/issues/3095 and https://github.com/tensorflow/tfjs/pull/3049, I found that the handpose model runs faster.

The fusedConv2D op may work correctly with https://github.com/tensorflow/tfjs/issues/3095, so if you manually merge this PR there is no need to use Conv2DNaiveProgram. This PR fixed the relu bias and prelu issues.

The two PRs below are already merged in tfjs: for conv2d (non-fused), tensorflow/tfjs#2993; for fused conv2d, tensorflow/tfjs#2846 and tensorflow/tfjs#2993.

axinging · Apr 19 '20 01:04

I think fusedConv2D still has problems with tensorflow/tfjs#3095; I tried it before and some models still got wrong results. But with tensorflow/tfjs#3049 the inference time is reduced.

NALLEIN · Apr 19 '20 08:04

Sorry, a typo above: not 3095. 3095 is an issue; the fix for that issue is tensorflow/tfjs#3096.

axinging · Apr 19 '20 12:04

@NALLEIN said there is a new update of https://www.npmjs.com/package/webgpu; please investigate it and update the latest status.

ibelem · May 28 '20 08:05

I tested the inference time of the image-classification models with the workload for 200 iterations; the results are as follows (a sketch of the measurement loop appears after the tables):

| TFLite Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
| --- | --- | --- |
| MobileNet V1 | 39.69 ± 4.38 | 42.27 ± 7.39 |
| MobileNet V2 | 36.89 ± 5.84 | 55.87 ± 7.63 |
| SqueezeNet | 41.91 ± 5.65 | 70.96 ± 7.70 |
| Inception V3 | 197.95 ± 14.26 | 267.02 ± 16.86 |
| Inception V4 | 365.70 ± 18.43 | 510.10 ± 32.97 |
| Inception ResNet V2 | 317.68 ± 14.17 | 455.02 ± 20.82 |

| ONNX Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
| --- | --- | --- |
| SqueezeNet | 29.22 ± 5.59 | 37.91 ± 6.05 |
| MobileNet V2 | 42.89 ± 4.63 | 58.84 ± 15.15 |
| ResNet50 V1 | 127.51 ± 12.76 | 159.31 ± 8.29 |
| ResNet50 V2 | 190.72 ± 9.23 | Browser Crash |
| Inception V2 | 77.06 ± 6.17 | 144.94 ± 15.87 |
| DenseNet 121 | 233.46 ± 18.46 | 318.80 ± 39.41 |

| OpenVINO Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
| --- | --- | --- |
| SqueezeNet | 30.36 ± 4.51 | 38.83 ± 6.10 |
| MobileNet V1 | 35.55 ± 6.71 | 40.90 ± 6.66 |
| MobileNet V2 | 36.65 ± 8.06 | 56.79 ± 12.05 |
| ResNet50 V1 | 118.39 ± 9.97 | 157.48 ± 26.32 |
| DenseNet 121 | 132.17 ± 6.48 | 187.41 ± 18.95 |
| Inception V2 | 74.27 ± 5.69 | 116.66 ± 20.94 |
| Inception V4 | 371.29 ± 18.67 | 537.13 ± 61.69 |
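
A sketch of the measurement loop behind these numbers (hypothetical; the actual workload tool in the examples may differ):

async function benchmark(runOnce: () => Promise<void>, iterations = 200) {
  const times: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await runOnce(); // one inference, including the GPU readback
    times.push(performance.now() - start);
  }
  const mean = times.reduce((a, b) => a + b, 0) / times.length;
  const variance =
      times.reduce((a, b) => a + (b - mean) ** 2, 0) / times.length;
  // Reported above as mean ± standard deviation.
  return { mean, std: Math.sqrt(variance) };
}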

NALLEIN · Jun 02 '20 02:06

@NALLEIN said there is a new update of https://www.npmjs.com/package/webgpu; please investigate it and update the latest status.

The current version of tfjs-backend-webgpu is 0.0.1-alpha.0.

NALLEIN · Jun 02 '20 02:06

@NALLEIN, I just checked with @axinging: if you have any new op implementations for the TF.js WebGPU backend, please feel free to submit your PR to the TF.js repo. If there is no open issue for that op, you can file an issue as well. New ops are welcome there.

huningxin · Jun 03 '20 01:06

Thanks, I'll do that.

NALLEIN · Jun 03 '20 05:06