IPU backend review from fork

Open gqingraphcore opened this issue 1 year ago • 10 comments

Thanks for your contribution; we appreciate it a lot. Following the instructions below will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

gqingraphcore avatar Dec 22 '22 12:12 gqingraphcore

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

:white_check_mark: irexyc
:x: gqingraphcore
:x: gongqiang
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Dec 22 '22 12:12 CLAassistant

Please fix the lint error and resolve the conflicts.

AllentDan avatar Jan 03 '23 05:01 AllentDan

Note that we have refactored the backend manager. Implement the interface in the manager to support various features in MMDeploy. We do not need to update deploy.py when adding a new backend.

grimoire avatar Jan 04 '23 03:01 grimoire

revised

gqingraphcore avatar Jan 04 '23 05:01 gqingraphcore

[Screenshot, 2023-01-04 5:54:19 PM] Profiler test result with resnet50

gqingraphcore avatar Jan 04 '23 09:01 gqingraphcore

Please resolve the conflict.

grimoire avatar Jan 12 '23 06:01 grimoire

Please fix the lint failure with pre-commit run --all-files.

grimoire avatar Jan 16 '23 03:01 grimoire

To support the SDK, we should add --dump-info when converting a model, like python tools/deploy.py ... --dump-info ... This requires us to modify the SDK export logic: https://github.com/gqingraphcore/mmdeploy-ipu/blob/ipu/mmdeploy/backend/sdk/export_info.py#L118

irexyc avatar Jan 31 '23 03:01 irexyc

It seems we can't feed the network fp32 tensors when the model is converted to fp16.
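
One possible workaround, sketched here only as an assumption about how the mismatch could be handled, is to cast the input to half precision on the host before feeding it. The example uses only the Python standard library (`struct` format code `e` is IEEE 754 half precision); the helper names are hypothetical, and a real pipeline would more likely cast a NumPy array with `astype`.

```python
import struct


def to_fp16_bytes(values):
    """Pack a sequence of floats as little-endian IEEE 754 half-precision."""
    return struct.pack(f"<{len(values)}e", *values)


def from_fp16_bytes(buf):
    """Unpack a little-endian half-precision buffer back into Python floats."""
    count = len(buf) // 2  # each fp16 value occupies 2 bytes
    return list(struct.unpack(f"<{count}e", buf))


# Illustrative values only; all are exactly representable in fp16.
pixels = [0.0, 0.5, 1.0, -2.0]
buf = to_fp16_bytes(pixels)
restored = from_fp16_bytes(buf)
```

Values that are not exactly representable in fp16 will be rounded by the cast, so small score differences against an fp32 reference are expected.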

irexyc avatar Feb 07 '23 12:02 irexyc

I made some changes to get the SDK running: https://github.com/gqingraphcore/mmdeploy-ipu/pull/1

Since the mmdeploy SDK doesn't support fp16 as input, I modified configs/_base_/backends/ipu.py: I changed precision to fp16 and deleted partialsTypeMatMuls (I don't know whether this is necessary).

However, in my SDK test, the scores for mmcls are unstable and wrong. Below are some results from image_classification:

// run 1
label: 798, score: 0.1349
label: 916, score: 0.1143
label: 111, score: 0.0661
label: 549, score: 0.0577
label: 688, score: 0.0185

// run 2
label: 644, score: 1.0000
label: 1, score: 0.0000
label: 3, score: 0.0000
label: 4, score: 0.0000
label: 0, score: 0.0000

// run 3
label: 892, score: 0.0288
label: 111, score: 0.0234
label: 623, score: 0.0166
label: 846, score: 0.0158
label: 677, score: 0.0132

The results from conversion and from inference with model_runtime also seem unstable (though better than the SDK; most of the time they are correct). I converted the resnet18 model several times, and sometimes the visualized result showed a wrong label with a score of 1. When the visualized result was right, I ran inference on the popef model with model_runtime several times, and sometimes the results seemed wrong.
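
A quick way to quantify this kind of run-to-run instability is to parse the printed results and compare the top-1 label across runs. The sketch below is an illustration only: it reuses the three SDK outputs quoted above, and the helper names (`parse_run`, `top1_labels`, `is_stable`) are hypothetical.

```python
import re

# Matches lines such as "label: 798, score: 0.1349".
LINE = re.compile(r"label:\s*(\d+),\s*score:\s*([0-9.]+)")

RUN1 = """label: 798, score: 0.1349
label: 916, score: 0.1143
label: 111, score: 0.0661
label: 549, score: 0.0577
label: 688, score: 0.0185"""

RUN2 = """label: 644, score: 1.0000
label: 1, score: 0.0000
label: 3, score: 0.0000
label: 4, score: 0.0000
label: 0, score: 0.0000"""

RUN3 = """label: 892, score: 0.0288
label: 111, score: 0.0234
label: 623, score: 0.0166
label: 846, score: 0.0158
label: 677, score: 0.0132"""


def parse_run(text):
    """Return a list of (label, score) pairs found in one run's output."""
    return [(int(label), float(score)) for label, score in LINE.findall(text)]


def top1_labels(runs):
    """Return the highest-scoring label of each run."""
    return [max(run, key=lambda pair: pair[1])[0] for run in runs]


def is_stable(runs):
    """True when every run agrees on the top-1 label."""
    return len(set(top1_labels(runs))) == 1


runs = [parse_run(text) for text in (RUN1, RUN2, RUN3)]
print(top1_labels(runs))
print(is_stable(runs))
```

For a deterministic model and a fixed input image, all runs should agree on the top-1 label; here each run yields a different one.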

irexyc avatar Mar 10 '23 07:03 irexyc