
[WebGPU] Enable tfjs-backend-webgpu in webml-polyfill

NALLEIN opened this issue 5 years ago · 16 comments

Model inference in polyfill using tfjs-backend-webgpu.
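
For reference, selecting the WebGPU backend in TF.js looks roughly like this (a minimal sketch; the backend package was experimental at the time, so import paths and APIs may differ by version):

import * as tf from '@tensorflow/tfjs-core';
// Importing the backend package registers the 'webgpu' backend as a side effect.
import '@tensorflow/tfjs-backend-webgpu';

async function init(): Promise<void> {
  await tf.setBackend('webgpu'); // switch from the default backend
  await tf.ready();              // wait for backend initialization
  console.log('Active backend:', tf.getBackend());
}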

NALLEIN · Dec 26 '19 04:12

I tried to integrate the WebGPU backend of TF.js: you have to build TF.js and then replace the paths in package.json with your own. This branch runs in Chrome Dev on macOS.

Build TF.js

git clone https://github.com/NALLEIN/tfjs.git
cd tfjs
git checkout -b opsForPolyfill origin/opsForPolyfill

yarn
cd tfjs-core
yarn && yarn build
cd ../tfjs-backend-webgpu
yarn && yarn build

Replace the WebGL backend with tfjs-backend-webgpu

Replace the paths of 'local_tfjs' and 'local_webgpu' in package.json with your own TF.js paths.
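
For example, the relevant entries might look like this (a sketch only; whether 'local_tfjs' and 'local_webgpu' are dependency aliases, and what the exact file: paths look like, depends on the branch):

{
  "dependencies": {
    "local_tfjs": "file:/your/path/to/tfjs/tfjs-core",
    "local_webgpu": "file:/your/path/to/tfjs/tfjs-backend-webgpu"
  }
}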

git clone https://github.com/NALLEIN/webml-polyfill.git
cd webml-polyfill
git checkout -b WebGPU-backend-test origin/WebGPU-backend-test

yarn && yarn start

Then you can test the polyfill examples.

TODO

Model inference executes without error, but the results are incorrect. Use analysis tools to locate the problem.

NALLEIN · Dec 26 '19 05:12

@NALLEIN , could you please send a PR to preview your change?

huningxin · Mar 31 '20 07:03

Conv2d with relu6 produces wrong results when running the image classification models. I will reproduce the error and report it to Xu Xin and tfjs.
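
A minimal repro sketch for the relu6 case (hypothetical: it compares fused conv2d against plain conv2d clamped to [0, 6], which is what relu6 computes; whether this exact snippet triggers the bug depends on shapes and strides):

import * as tf from '@tensorflow/tfjs-core';

const x = tf.randomNormal([1, 8, 8, 3]) as tf.Tensor4D;
const w = tf.randomNormal([3, 3, 3, 4]) as tf.Tensor4D;

// Fused path: conv2d with a relu6 activation.
const fused = tf.fused.conv2d({ x, filter: w, strides: 1, pad: 'same', activation: 'relu6' });
// Reference path: plain conv2d followed by relu6, i.e. clip(y, 0, 6).
const ref = tf.clipByValue(tf.conv2d(x, w, 1, 'same'), 0, 6);
// On a correct backend the two results should agree within float tolerance.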

Failed test cases in CTS
check result for Add v1_2 example-1
check result for Argmax example/1-4
check result for Argmax example/2-4
check result for Argmax example/3-4
check result for Avg pool float relu6 example/1
check result for Batch to space example
check result for Batch to space float example/1
check result for Conv2d v1_2 example-21
check result for Conv2d v1_2 example-22
check result for Conv2d v1_2 example-25
check result for Conv2d v1_2 example-26
check result for Conv2d v1_2 example-27
check result for Conv2d v1_2 example-30
check result for Conv2d v1_2 example-31
check result for Conv2d v1_2 example-32
check result for Conv2d v1_2 example-36
check result for Conv2d v1_2 example-37
check result for Conv2d v1_2 example-38
check result for Conv2d v1_2 example-42
check result for Conv 1 h3 w2 same relu6 example-1
check result for Conv 1 h3 w2 same relu6 example-2
check result for Conv 1 h3 w2 valid relu6 example-1
check result for Conv 1 h3 w2 valid relu6 example-2
check result for Conv 3 h3 w2 same relu6 example-1
check result for Conv 3 h3 w2 same relu6 example-2
check result for Conv 3 h3 w2 valid relu6 example-1
check result for Conv 3 h3 w2 valid relu6 example-2
check result for Conv float channels example
check result for Conv float channels relaxed example
check result for Conv float channels relu example
check result for Conv float channels relu6 example
check result for Conv float channels weights as inputs example
check result for Conv float channels weights as inputs relaxed example
check result for Conv float channels weights as inputs relu example
check result for Conv float channels weights as inputs relu6 example
check result for Conv float large example
check result for Conv float large relaxed example
check result for Conv float large relu example
check result for Conv float large relu6 example
check result for Conv float large weights as inputs example
check result for Conv float large weights as inputs relaxed example
check result for Conv float large weights as inputs relu example
check result for Conv float large weights as inputs relu6 example
check result for Depthwise conv2d float large example/2
check result for Depthwise conv2d float large 2 relaxed example
check result for Depthwise conv2d float large relu example/2
check result for Depthwise conv2d float large relu1 example/2
check result for Depthwise conv2d float large relu6 example/2
check result for Depthwise conv2d float large 2 weights as inputs example
check result for Depthwise conv2d float large 2 weights as inputs relaxed example
check result for Depthwise conv2d float large 2 weights as inputs relu example
check result for Depthwise conv2d float large 2 weights as inputs relu1 example
check result for Depthwise conv2d float large 2 weights as inputs relu6 example
check result for Depthwise conv2d float large relu6 example
check result for Depthwise conv2d float large weights as inputs relu6 example
check result for Depthwise conv2d float relu6 example
check result for Depthwise conv2d float weights as inputs relu6 example
check result for Depthwise conv2d v1_2 example-33
check result for Depthwise conv2d v1_2 example-34
check result for Depthwise conv2d v1_2 example-35
check result for Depthwise conv2d v1_2 example-38
check result for Depthwise conv2d v1_2 example-39
check result for Depthwise conv2d v1_2 example-40
check result for Depthwise conv relu6 example-1
check result for Depthwise conv relu6 example-2
check result for Fully connected float relu6 example
check result for Max pool float relu6 example/1
check result for Mul relu6 example
check result for Transpose example
check result for Transpose float16 example
check result for Transpose float example/1
check result for Transpose relaxed example
check result for Transpose v1_2 example-1
check result for Transpose v1_2 example-2
Failed test cases in CTS Supplement Test
check result for ATROUS_CONV_2D 1 h3 w2 implicit padding same example-3
check result for ATROUS_CONV_2D 1 h3 w2 implicit padding same example-4
check result for ATROUS_CONV_2D 3 h3 w2 implicit padding same example-3
check result for ATROUS_CONV_2D 3 h3 w2 implicit padding same example-4
check result for ATROUS_DEPTHWISE_CONV_2D valid example-2

NALLEIN · Apr 14 '20 06:04

When debugging the handpose model, I found that (fused) conv2d may not work properly with certain shape/stride/padding combinations.

With the two PRs below applied, certain cases may work. For conv2d (non-fused), try https://github.com/tensorflow/tfjs/pull/2993. For fused conv2d, try https://github.com/tensorflow/tfjs/pull/2846 and https://github.com/tensorflow/tfjs/pull/2993.

However, in order to make handpose work, we need the two above plus the one below (use naive conv2d instead of conv2dmm; this means something is still wrong with fused conv2dmm, and we are working on fixing it): https://github.com/axinging/tfjs/commit/78b5eaa0e592d90e21cf155b894916229a9ea409#diff-dcb528c192f70859b8f4333e400b445fL777

Please try the above first; if it still reports errors, please let me know.
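
For anyone trying this, one way to apply an open PR locally before it is merged (assuming GitHub's pull/<id>/head refs; the local branch name is arbitrary, and #2993 is the non-fused conv2d fix referenced above):

cd tfjs
git fetch https://github.com/tensorflow/tfjs.git pull/2993/head:conv2d-fix
git merge conv2d-fix

Then rebuild tfjs-core and tfjs-backend-webgpu as in the build steps above.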

axinging · Apr 14 '20 08:04

After modifying the getAndSavePipeline function and fixing dilation according to tensorflow/tfjs#2846 and tensorflow/tfjs#2993, some of the image classification models run correctly in webml-polyfill. The previous getAndSavePipeline function prevented many operations from producing correct results. The inference times listed below are from a single run. I used Conv2DMMProgram. It is very slow because tensor data is read with the asynchronous data() function; tfjs-backend-webgpu currently does not support the synchronous dataSync() (see the readback sketch after the tables). After all models in image-classification run correctly, I will measure detailed performance data.

| TFLite Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
| --- | --- | --- | --- |
| MobileNet V1 | 59.34 | 52.15 | Right |
| MobileNet V2 | 62.60 | 57.11 | Right |
| SqueezeNet | 48.93 | 66.90 | Right |
| Inception V3 | 236.21 | 252.04 | Wrong |
| Inception V4 | 425.38 | 426.93 | Wrong |
| Inception ResNet V2 | 359.25 | 424.85 | Right |

| ONNX Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
| --- | --- | --- | --- |
| SqueezeNet | 41.08 | 35.52 | Right |
| MobileNet V2 | 87.77 | 58.98 | Right |
| ResNet50 V1 | 155.88 | 162.48 | Wrong |
| ResNet50 V2 | 214.68 | Crash | Wrong |
| Inception V2 | 115.53 | 156.63 | Wrong |
| DenseNet 121 | 309.74 | 318.90 | Wrong |

| OpenVINO Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
| --- | --- | --- | --- |
| SqueezeNet | 42.10 | 36.03 | Right |
| MobileNet V1 | 52.10 | 38.21 | Right |
| MobileNet V2 | 58.35 | 55.91 | Right |
| ResNet50 V1 | 123.10 | 139.87 | Wrong |
| DenseNet 121 | 210.56 | 185.58 | Wrong |
| Inception V2 | 108.36 | 119.97 | Wrong |
| Inception V4 | 441.36 | 499.03 | Right |
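
The async readback mentioned above looks roughly like this (a sketch; the output tensor is a placeholder for whatever the polyfill's WebGPU path produces):

import * as tf from '@tensorflow/tfjs-core';

async function readOutput(output: tf.Tensor): Promise<Float32Array> {
  // tfjs-backend-webgpu only offers the asynchronous GPU -> CPU readback:
  const values = await output.data() as Float32Array;
  // output.dataSync() would read synchronously, but the WebGPU backend
  // does not support it at this point.
  output.dispose();
  return values;
}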

NALLEIN · Apr 15 '20 07:04

It's great to see that. Thanks @NALLEIN. Two comments: 1) Did you use Conv2DMMProgram, or Conv2DNaiveProgram? Conv2DNaiveProgram should be the slow path.

2) Does "modifying the getAndSavePipeline function" mean disabling the shader key? Some cases may pass with the shader key disabled, but disabling it may slow down the model. https://github.com/tensorflow/tfjs/pull/2670/files#diff-dcb528c192f70859b8f4333e400b445fL342

axinging · Apr 15 '20 08:04

It's great to see that. Thanks @NALLEIN. Two comments: 1) Did you use Conv2DMMProgram, or Conv2DNaiveProgram? Conv2DNaiveProgram should be the slow path.

I used Conv2DMMProgram, and no errors appear in the fused conv2d operation when running the models that predict correctly.

2) Does "modifying the getAndSavePipeline function" mean disabling the shader key? Some cases may pass with the shader key disabled, but disabling it may slow down the model. https://github.com/tensorflow/tfjs/pull/2670/files#diff-dcb528c192f70859b8f4333e400b445fL342

I disabled the shader key because it caused some operations to produce wrong results. When running a model with the shader key enabled, an operation runs correctly at first, the same operation later runs incorrectly, and many NaN values even appear in the result. Perhaps the method of generating the shader key needs further modification.

Taking mobilenet_v1 as an example, you can see that the attributes of Conv2d_3_depthwise and Conv2d_4_depthwise are exactly the same. I guess the shader key is then identical, so the previous pipeline is reused, and because the input tensors have different shapes it may produce the wrong result.
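
If that guess is right, the collision would look like the following (hypothetical sketch, not the actual tfjs source): a key built only from op attributes maps Conv2d_3_depthwise and Conv2d_4_depthwise to the same cached pipeline, while a key that also encodes the input shape keeps them apart.

// Hypothetical shader-key scheme illustrating the collision.
interface ConvAttrs { stride: number; pad: string; dilation: number; }

function attrOnlyKey(a: ConvAttrs): string {
  // Conv2d_3_depthwise and Conv2d_4_depthwise share these attributes,
  // so both ops map to the same key and reuse one compiled pipeline.
  return `dwconv_${a.stride}_${a.pad}_${a.dilation}`;
}

function shapeAwareKey(a: ConvAttrs, inputShape: number[]): string {
  // Encoding the input shape disambiguates ops with identical attributes.
  return `${attrOnlyKey(a)}_${inputShape.join('x')}`;
}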

NALLEIN · Apr 15 '20 08:04

After using Conv2DNaiveProgram to perform the fused conv operation, most models run correctly in webml-polyfill. Fused conv2d with relu bias and prelu still works incorrectly. There are some remaining problems:

  • The models run very slowly compared to the WebGL backend.

  • Some models crash the browser during the model compilation stage.

  • Some operation results in the super-resolution example are still wrong.

| Example | Model | Problem |
| --- | --- | --- |
| image_classification | resnet50v2.onnx | crash |
| object_detection | ssd_mobilenet_v2.tflite, tiny_yolov2_coco.tflite, tiny_yolov2_voc.tflite | crash |
| face_recognition | tiny_yolov2_face.tflite | crash |
| facial_landmark_detection | face_landmark.tflite | crash |
| super_resolution | srgan_96_4.tflite, srgan_128_4.tflite | wrong result |
| emotion_analysis | tiny_yolov2_face.tflite | crash |
| emotion_analysis | emotion_classification_7.tflite | wrong result |
| speech_commands | kws_cnn.tflite | wrong result |

I will find out what causes the browser to crash during the model compilation stage, and modify WebGPUModel.ts to avoid using .data() to read tensor data.
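
A sketch of that WebGPUModel.ts change (runSubgraph is a hypothetical stand-in for the per-op execution; the point is a single readback at the end rather than one per operation):

import * as tf from '@tensorflow/tfjs-core';

// Hypothetical stand-in for executing the model's ops on the backend.
declare function runSubgraph(input: tf.Tensor): tf.Tensor;

async function infer(input: tf.Tensor): Promise<Float32Array> {
  // Keep every intermediate on the GPU; tidy() disposes them afterwards.
  const out = tf.tidy(() => runSubgraph(input));
  // One asynchronous GPU -> CPU readback at the very end, instead of
  // awaiting .data() after each operation.
  const result = await out.data() as Float32Array;
  out.dispose();
  return result;
}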

NALLEIN · Apr 18 '20 16:04

For the speed: after merging https://github.com/tensorflow/tfjs/issues/3095 and https://github.com/tensorflow/tfjs/pull/3049, I found that the handpose model runs faster.

The fusedConv2D op may work correctly with https://github.com/tensorflow/tfjs/issues/3095, so if you manually merge this PR there is no need to use Conv2DNaiveProgram. This PR fixed the relu bias and prelu issues.

The two PRs below are already merged in tfjs: for conv2d (non-fused), tensorflow/tfjs#2993; for fused conv2d, tensorflow/tfjs#2846 and tensorflow/tfjs#2993.

axinging · Apr 19 '20 01:04

I think fusedConv2D still has problems with tensorflow/tfjs#3095; I tried it before and some models still got wrong results. But with tensorflow/tfjs#3049 the inference time is reduced.

NALLEIN · Apr 19 '20 08:04

Sorry, a typo above: not 3095. 3095 is an issue; the fix for that issue is tensorflow/tfjs#3096.

axinging · Apr 19 '20 12:04

@NALLEIN said there is a new update of https://www.npmjs.com/package/webgpu; please investigate it and update the latest status.

ibelem · May 28 '20 08:05

I tested the inference time of the image-classification models with the workload for 200 iterations; the results are as follows (a sketch of the measurement loop appears after the tables):

| TFLite Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
| --- | --- | --- |
| MobileNet V1 | 39.69 ± 4.38 | 42.27 ± 7.39 |
| MobileNet V2 | 36.89 ± 5.84 | 55.87 ± 7.63 |
| SqueezeNet | 41.91 ± 5.65 | 70.96 ± 7.70 |
| Inception V3 | 197.95 ± 14.26 | 267.02 ± 16.86 |
| Inception V4 | 365.70 ± 18.43 | 510.10 ± 32.97 |
| Inception ResNet V2 | 317.68 ± 14.17 | 455.02 ± 20.82 |

| ONNX Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
| --- | --- | --- |
| SqueezeNet | 29.22 ± 5.59 | 37.91 ± 6.05 |
| MobileNet V2 | 42.89 ± 4.63 | 58.84 ± 15.15 |
| ResNet50 V1 | 127.51 ± 12.76 | 159.31 ± 8.29 |
| ResNet50 V2 | 190.72 ± 9.23 | Browser Crash |
| Inception V2 | 77.06 ± 6.17 | 144.94 ± 15.87 |
| DenseNet 121 | 233.46 ± 18.46 | 318.80 ± 39.41 |

| OpenVINO Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
| --- | --- | --- |
| SqueezeNet | 30.36 ± 4.51 | 38.83 ± 6.10 |
| MobileNet V1 | 35.55 ± 6.71 | 40.90 ± 6.66 |
| MobileNet V2 | 36.65 ± 8.06 | 56.79 ± 12.05 |
| ResNet50 V1 | 118.39 ± 9.97 | 157.48 ± 26.32 |
| DenseNet 121 | 132.17 ± 6.48 | 187.41 ± 18.95 |
| Inception V2 | 74.27 ± 5.69 | 116.66 ± 20.94 |
| Inception V4 | 371.29 ± 18.67 | 537.13 ± 61.69 |
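
A sketch of the measurement loop behind these numbers (hypothetical; the actual workload tool in the examples may differ):

async function benchmark(runOnce: () => Promise<void>, iterations = 200) {
  const times: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await runOnce(); // one inference, including the GPU readback
    times.push(performance.now() - start);
  }
  const mean = times.reduce((a, b) => a + b, 0) / times.length;
  const variance =
      times.reduce((a, b) => a + (b - mean) ** 2, 0) / times.length;
  // Reported above as mean ± standard deviation.
  return { mean, std: Math.sqrt(variance) };
}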

NALLEIN · Jun 02 '20 02:06

@NALLEIN said there is a new update of https://www.npmjs.com/package/webgpu; please investigate it and update the latest status.

The current version of tfjs-backend-webgpu is 0.0.1-alpha.0.

NALLEIN · Jun 02 '20 02:06

@NALLEIN, I just checked with @axinging: if you have any new op implementations for the TF.js WebGPU backend, please feel free to submit your PR to the TF.js repo. If there is no open issue for that op, you can file an issue as well. New ops are welcome there.

huningxin · Jun 03 '20 01:06

Thanks, I'll do that.

NALLEIN · Jun 03 '20 05:06