[WebGPU] Enable tfjs-backend-webgpu in webml-polyfill
Model inference in polyfill using tfjs-backend-webgpu.
I tried to integrate with the WebGPU backend of TF.js. You have to build TF.js and then replace the paths in package.json with your own paths. This branch can run in Chrome Dev on macOS.
Build TF.js
git clone https://github.com/NALLEIN/tfjs.git
cd tfjs
git checkout -b opsForPolyfill origin/opsForPolyfill
yarn
cd tfjs-core
yarn && yarn build
cd ../tfjs-backend-webgpu
yarn && yarn build
Replace the WebGL backend with tfjs-backend-webgpu
Replace the 'local_tfjs' and 'local_webgpu' paths in package.json with your own TF.js paths.
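For illustration, the relevant dependency entries might look something like this sketch; the alias names come from the sentence above, while the file paths are placeholders for your local TF.js build, not confirmed values:

```json
{
  "dependencies": {
    "local_tfjs": "file:/path/to/your/tfjs/tfjs-core",
    "local_webgpu": "file:/path/to/your/tfjs/tfjs-backend-webgpu"
  }
}
```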
git clone https://github.com/NALLEIN/webml-polyfill.git
cd webml-polyfill
git checkout -b WebGPU-backend-test origin/WebGPU-backend-test
yarn && yarn start
Then you can test the polyfill examples.
TODO
Model inference executes without error, but the result is incorrect. Use the analysis tools to locate the problem.
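One way to locate the problem might be to diff each suspect op's output between the WebGPU and CPU backends. A minimal sketch, assuming a plain conv2d with illustrative shapes (not a confirmed failing case):

```ts
import * as tf from '@tensorflow/tfjs-core';
import '@tensorflow/tfjs-backend-cpu';
import '@tensorflow/tfjs-backend-webgpu';

// Build identical inputs from the same raw values, run the op on each
// backend, and report the largest element-wise deviation from CPU.
async function maxAbsDiff(): Promise<number> {
  const xVals = new Float32Array(1 * 8 * 8 * 3).map(() => Math.random());
  const wVals = new Float32Array(3 * 3 * 3 * 4).map(() => Math.random());
  const run = () => tf.conv2d(
      tf.tensor4d(xVals, [1, 8, 8, 3]),
      tf.tensor4d(wVals, [3, 3, 3, 4]), 1, 'same');

  await tf.setBackend('webgpu');
  const gpu = await run().data();
  await tf.setBackend('cpu');
  const cpu = await run().data();

  let diff = 0;
  for (let i = 0; i < cpu.length; i++) {
    diff = Math.max(diff, Math.abs(cpu[i] - gpu[i]));
  }
  return diff;  // a large value points at this op
}
```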
@NALLEIN, could you please send a PR to preview your change?
Conv2d with relu6 gives wrong results when running the image classification model. I will reproduce the error and report it to Xu Xin and the tfjs team.
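A minimal repro sketch for the fused case, assuming tf.fused.conv2d with a relu6 activation; the shapes are illustrative, not the confirmed failing configuration:

```ts
import * as tf from '@tensorflow/tfjs-core';
import '@tensorflow/tfjs-backend-webgpu';

// Run fused conv2d + bias + relu6 on the WebGPU backend; the printed
// values can be compared against the same graph on the CPU backend.
async function reproConv2dRelu6(): Promise<void> {
  await tf.setBackend('webgpu');
  const x = tf.randomNormal([1, 16, 16, 3]) as tf.Tensor4D;
  const filter = tf.randomNormal([3, 3, 3, 8]) as tf.Tensor4D;
  const bias = tf.zeros([8]) as tf.Tensor1D;
  const out = tf.fused.conv2d(
      {x, filter, strides: [1, 1], pad: 'same', bias, activation: 'relu6'});
  console.log(await out.data());
}
```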
Failed test cases in CTS |
---|
check result for Add v1_2 example-1 |
check result for Argmax example/1-4 |
check result for Argmax example/2-4 |
check result for Argmax example/3-4 |
check result for Avg pool float relu6 example/1 |
check result for Batch to space example |
check result for Batch to space float example/1 |
check result for Conv2d v1_2 example-21 |
check result for Conv2d v1_2 example-22 |
check result for Conv2d v1_2 example-25 |
check result for Conv2d v1_2 example-26 |
check result for Conv2d v1_2 example-27 |
check result for Conv2d v1_2 example-30 |
check result for Conv2d v1_2 example-31 |
check result for Conv2d v1_2 example-32 |
check result for Conv2d v1_2 example-36 |
check result for Conv2d v1_2 example-37 |
check result for Conv2d v1_2 example-38 |
check result for Conv2d v1_2 example-42 |
check result for Conv 1 h3 w2 same relu6 example-1 |
check result for Conv 1 h3 w2 same relu6 example-2 |
check result for Conv 1 h3 w2 valid relu6 example-1 |
check result for Conv 1 h3 w2 valid relu6 example-2 |
check result for Conv 3 h3 w2 same relu6 example-1 |
check result for Conv 3 h3 w2 same relu6 example-2 |
check result for Conv 3 h3 w2 valid relu6 example-1 |
check result for Conv 3 h3 w2 valid relu6 example-2 |
check result for Conv float channels example |
check result for Conv float channels relaxed example |
check result for Conv float channels relu example |
check result for Conv float channels relu6 example |
check result for Conv float channels weights as inputs example |
check result for Conv float channels weights as inputs relaxed example |
check result for Conv float channels weights as inputs relu example |
check result for Conv float channels weights as inputs relu6 example |
check result for Conv float large example |
check result for Conv float large relaxed example |
check result for Conv float large relu example |
check result for Conv float large relu6 example |
check result for Conv float large weights as inputs example |
check result for Conv float large weights as inputs relaxed example |
check result for Conv float large weights as inputs relu example |
check result for Conv float large weights as inputs relu6 example |
check result for Depthwise conv2d float large example/2 |
check result for Depthwise conv2d float large 2 relaxed example |
check result for Depthwise conv2d float large relu example/2 |
check result for Depthwise conv2d float large relu1 example/2 |
check result for Depthwise conv2d float large relu6 example/2 |
check result for Depthwise conv2d float large 2 weights as inputs example |
check result for Depthwise conv2d float large 2 weights as inputs relaxed example |
check result for Depthwise conv2d float large 2 weights as inputs relu example |
check result for Depthwise conv2d float large 2 weights as inputs relu1 example |
check result for Depthwise conv2d float large 2 weights as inputs relu6 example |
check result for Depthwise conv2d float large relu6 example |
check result for Depthwise conv2d float large weights as inputs relu6 example |
check result for Depthwise conv2d float relu6 example |
check result for Depthwise conv2d float weights as inputs relu6 example |
check result for Depthwise conv2d v1_2 example-33 |
check result for Depthwise conv2d v1_2 example-34 |
check result for Depthwise conv2d v1_2 example-35 |
check result for Depthwise conv2d v1_2 example-38 |
check result for Depthwise conv2d v1_2 example-39 |
check result for Depthwise conv2d v1_2 example-40 |
check result for Depthwise conv relu6 example-1 |
check result for Depthwise conv relu6 example-2 |
check result for Fully connected float relu6 example |
check result for Max pool float relu6 example/1 |
check result for Mul relu6 example |
check result for Transpose example |
check result for Transpose float16 example |
check result for Transpose float example/1 |
check result for Transpose relaxed example |
check result for Transpose v1_2 example-1 |
check result for Transpose v1_2 example-2 |
Failed test cases in CTS Supplement Test |
---|
check result for ATROUS_CONV_2D 1 h3 w2 implicit padding same example-3 |
check result for ATROUS_CONV_2D 1 h3 w2 implicit padding same example-4 |
check result for ATROUS_CONV_2D 3 h3 w2 implicit padding same example-3 |
check result for ATROUS_CONV_2D 3 h3 w2 implicit padding same example-4 |
check result for ATROUS_DEPTHWISE_CONV_2D valid example-2 |
When debugging the handpose model, I found that (fused) conv2d may not work properly with certain shapes/strides/paddings.
With the two PRs below applied, certain cases may work: for non-fused conv2d, try https://github.com/tensorflow/tfjs/pull/2993; for fused conv2d, try https://github.com/tensorflow/tfjs/pull/2846 and https://github.com/tensorflow/tfjs/pull/2993.
However, in order to make handpose work, we need the two PRs above plus the change below (use naive conv2d instead of conv2dmm; this means something is still wrong with fused conv2dmm, and we are working on a fix): https://github.com/axinging/tfjs/commit/78b5eaa0e592d90e21cf155b894916229a9ea409#diff-dcb528c192f70859b8f4333e400b445fL777
Please try the above first; if it still reports errors, please let me know.
After modifying the getAndSavePipeline function and fixing dilation according to tensorflow/tfjs#2846 and tensorflow/tfjs#2993, some image classification models can now run correctly in webml-polyfill. The previous getAndSavePipeline function prevented many operations from producing correct results. The inference times listed below are from a single run. I used Conv2DMMProgram. It is very slow because it reads tensor data with the asynchronous data() function; tfjs-backend-webgpu currently does not support the synchronous dataSync() (see the read sketch after the tables below). After all models in image-classification run correctly, I will continue to measure detailed performance data.
TFlite Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
---|---|---|---|
MobileNet V1 | 59.34 | 52.15 | Right |
MobileNet V2 | 62.60 | 57.11 | Right |
SqueezeNet | 48.93 | 66.90 | Right |
Inception V3 | 236.21 | 252.04 | Wrong |
Inception V4 | 425.38 | 426.93 | Wrong |
Inception ResNet V2 | 359.25 | 424.85 | Right |
ONNX Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
---|---|---|---|
SqueezeNet | 41.08 | 35.52 | Right |
MobileNet V2 | 87.77 | 58.98 | Right |
ResNet50 V1 | 155.88 | 162.48 | Wrong |
ResNet50 V2 | 214.68 | Crash | Wrong |
Inception V2 | 115.53 | 156.63 | Wrong |
DenseNet 121 | 309.74 | 318.90 | Wrong |
OpenVINO Model | Inference Time Before (ms) | Inference Time Now (ms) | Predict Result |
---|---|---|---|
SqueezeNet | 42.10 | 36.03 | Right |
MobileNet V1 | 52.10 | 38.21 | Right |
MobileNet V2 | 58.35 | 55.91 | Right |
ResNet50 V1 | 123.10 | 139.87 | Wrong |
DenseNet 121 | 210.56 | 185.58 | Wrong |
Inception V2 | 108.36 | 119.97 | Wrong |
Inception V4 | 441.36 | 499.03 | Right |
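For context on the readback cost mentioned above, a minimal sketch of the read path assumed here; per the note above, dataSync() is not supported on the WebGPU backend, so every result read is an asynchronous GPU-to-CPU copy:

```ts
import * as tf from '@tensorflow/tfjs-core';
import '@tensorflow/tfjs-backend-webgpu';

// Read a result tensor on tfjs-backend-webgpu: only the async data() is
// available, and the awaited copy adds latency to every single read.
async function readResult(logits: tf.Tensor): Promise<Float32Array> {
  return (await logits.data()) as Float32Array;  // assumes a float32 tensor
}
```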
It's great to see that. Thanks @NALLEIN. Two comments: 1) "used Conv2DMMProgram"? Or did you use Conv2DNaiveProgram? Conv2DNaiveProgram should be the slow path.
2) Does "modifying the getAndSavePipeline function" mean disabling the shader key? Some cases may pass when the shader key is disabled, but disabling it may slow down the model. https://github.com/tensorflow/tfjs/pull/2670/files#diff-dcb528c192f70859b8f4333e400b445fL342
> It's great to see that. Thanks @NALLEIN. Two comments: 1) "used Conv2DMMProgram"? Or did you use Conv2DNaiveProgram? Conv2DNaiveProgram should be the slow path.

I used Conv2DMMProgram, and it seems that no error appears in the fused conv2d operation when running inference on the models that produce correct results.

> 2) Does "modifying the getAndSavePipeline function" mean disabling the shader key? Some cases may pass when the shader key is disabled, but disabling it may slow down the model. https://github.com/tensorflow/tfjs/pull/2670/files#diff-dcb528c192f70859b8f4333e400b445fL342

I disabled the shader key because it prevented some operations from producing the right results. During inference, an earlier operation runs correctly while the same operation later runs incorrectly, and many NaN values even appear in the result when the shader key is enabled. Perhaps the method of generating the shader key needs further modification.
Taking mobilenet_v1 as an example, you can see that the attributes of Conv2d_3_depthwise and Conv2d_4_depthwise are exactly the same. I guess the shader keys are then identical, so the previous pipeline is reused, and the different input tensor shape may produce the wrong result.
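A hypothetical sketch of the fix idea (not the actual tfjs code): fold the input shapes into the cache key, so two ops with identical attributes but different input shapes compile separate pipelines:

```ts
// Hypothetical: a pipeline cache key that also encodes input shapes, so
// Conv2d_3_depthwise and Conv2d_4_depthwise (same attributes, different
// input shapes) no longer collide and reuse the wrong pipeline.
function makeShaderKey(
    opName: string, attrs: Record<string, unknown>,
    inputShapes: number[][]): string {
  const shapePart = inputShapes.map(s => s.join('x')).join(';');
  return `${opName}|${JSON.stringify(attrs)}|${shapePart}`;
}
```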
After using Conv2DNaiveProgram to perform the fused conv operation, most of the models run correctly in webml-polyfill. Fused conv2d with relu bias and prelu still works incorrectly. There are some remaining problems:
- The running speed of the models is very slow compared to the WebGL backend.
- Some models crash the browser during the model compilation stage.
- Some operation results in the super-resolution example are still wrong.
Example | Model | Problem |
---|---|---|
image_classification | resnet50v2.onnx | crash |
object_detection | ssd_mobilenet_v2.tflite tiny_yolov2_coco.tflite tiny_yolov2_voc.tflite | crash |
face_recognition | tiny_yolov2_face.tflite | crash |
facial_landmark_detection | face_landmark.tflite | crash |
super_resolution | srgan_96_4.tflite srgan_128_4.tflite | wrong result |
emotion_analysis | tiny_yolov2_face.tflite | crash |
emotion_analysis | emotion_classification_7.tflite | wrong result |
speech_commands | kws_cnn.tflite | wrong result |
I will find out what causes the browser to crash during the model compilation stage, and modify WebGPUModel.ts to avoid using .data() to read tensor data.
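A hypothetical sketch of that direction (not the actual WebGPUModel.ts code): chain ops tensor-to-tensor so intermediates stay on the GPU, and pay the async readback only once for the final output:

```ts
import * as tf from '@tensorflow/tfjs-core';
import '@tensorflow/tfjs-backend-webgpu';

// Intermediate tensors h1/h2 are never read back; only the final
// output incurs the asynchronous GPU-to-CPU copy.
async function runGraph(
    input: tf.Tensor4D, w1: tf.Tensor4D, w2: tf.Tensor4D) {
  const h1 = tf.conv2d(input, w1, 1, 'same');  // stays on the GPU
  const h2 = tf.relu(h1);                      // stays on the GPU
  const out = tf.conv2d(h2, w2, 1, 'same');
  return await out.data();                     // single readback
}
```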
Regarding speed: after merging tensorflow/tfjs#3095 and tensorflow/tfjs#3049, I found that the handpose model runs faster.
fusedConv2D may work correctly with tensorflow/tfjs#3095, so if you manually merge this PR there is no need to use Conv2DNaiveProgram. This PR fixed the relu bias and prelu issues.
The two PRs below are already merged into tfjs: for non-fused conv2d, try tensorflow/tfjs#2993; for fused conv2d, try tensorflow/tfjs#2846 and tensorflow/tfjs#2993.
I think fusedConv2D still has problems with tensorflow/tfjs#3095; I tried it before and some models still got wrong results. But with tensorflow/tfjs#3049 the inference time was reduced.
Sorry, a typo above: not 3095. 3095 is an issue; the fix for that issue is tensorflow/tfjs#3096.
@NALLEIN said there is a new update of https://www.npmjs.com/package/webgpu; please investigate it and update the latest status.
I tested the inference time of the image-classification models with a workload of 200 iterations; the results are as follows (a timing sketch follows the tables):
TFlite Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
---|---|---|
MobileNet V1 | 39.69 ± 4.38 | 42.27 ± 7.39 |
MobileNet V2 | 36.89 ± 5.84 | 55.87 ± 7.63 |
SqueezeNet | 41.91 ± 5.65 | 70.96 ± 7.70 |
Inception V3 | 197.95 ± 14.26 | 267.02 ± 16.86 |
Inception V4 | 365.70 ± 18.43 | 510.10 ± 32.97 |
Inception ResNet V2 | 317.68 ± 14.17 | 455.02 ± 20.82 |
ONNX Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
---|---|---|
SqueezeNet | 29.22 ± 5.59 | 37.91 ± 6.05 |
MobileNet V2 | 42.89 ± 4.63 | 58.84 ± 15.15 |
ResNet50 V1 | 127.51 ± 12.76 | 159.31 ± 8.29 |
ResNet50 V2 | 190.72 ± 9.23 | Browser Crash |
Inception V2 | 77.06 ± 6.17 | 144.94 ± 15.87 |
DenseNet 121 | 233.46 ± 18.46 | 318.80 ± 39.41 |
OpenVINO Model | Inference Time of WebGL (ms) | Inference Time of WebGPU (ms) |
---|---|---|
SqueezeNet | 30.36 ± 4.51 | 38.83 ± 6.10 |
MobileNet V1 | 35.55 ± 6.71 | 40.90 ± 6.66 |
MobileNet V2 | 36.65 ± 8.06 | 56.79 ± 12.05 |
ResNet50 V1 | 118.39 ± 9.97 | 157.48 ± 26.32 |
DenseNet 121 | 132.17 ± 6.48 | 187.41 ± 18.95 |
Inception V2 | 74.27 ± 5.69 | 116.66 ± 20.94 |
Inception V4 | 371.29 ± 18.67 | 537.13 ± 61.69 |
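As referenced above, a minimal sketch of how the mean ± standard deviation figures could be collected; the run callback is a placeholder for one model inference:

```ts
// Time `iterations` runs of `run` and report mean ± std deviation in ms.
async function benchmark(
    run: () => Promise<void>, iterations = 200): Promise<void> {
  const times: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await run();
    times.push(performance.now() - start);
  }
  const mean = times.reduce((a, b) => a + b, 0) / times.length;
  const variance =
      times.reduce((a, b) => a + (b - mean) ** 2, 0) / times.length;
  console.log(`${mean.toFixed(2)} ± ${Math.sqrt(variance).toFixed(2)} ms`);
}
```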
> @NALLEIN said there is a new update of https://www.npmjs.com/package/webgpu; please investigate it and update the latest status.

The current version of tfjs-backend-webgpu is 0.0.1-alpha.0.
@NALLEIN, I just checked with @axinging: if you have any new op implementations for the TF.js WebGPU backend, please feel free to submit your PR to the TF.js repo. If there is no open issue for that op, you can file an issue as well. New ops are welcome.
> @NALLEIN, I just checked with @axinging: if you have any new op implementations for the TF.js WebGPU backend, please feel free to submit your PR to the TF.js repo. If there is no open issue for that op, you can file an issue as well. New ops are welcome.
Thanks, I'll do that.