
Add C++ Predictor class for inference

Open kice opened this issue 6 years ago • 19 comments

Description

C++ Predictor class for easy inference

  • Support quantized model

  • Support non-float32 data input and output
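
A minimal sketch of the kind of dtype-aware Predictor facade proposed here. All names (`Predictor`, `SetInput`, `GetOutput`) are hypothetical, the forward pass is a placeholder identity copy, and nothing below is the actual c-predict-api or cpp-package interface; the point is only to show inputs and outputs carrying their element type so non-float32 data (e.g. uint8 for a quantized model) is never silently reinterpreted as float32:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical dtype-aware Predictor facade (sketch only; a real
// implementation would wrap c-predict-api / cpp-package calls).
class Predictor {
public:
    // Store input bytes together with the element type, so non-float32
    // data is never reinterpreted downstream.
    template <typename T>
    void SetInput(const std::string& name, const std::vector<T>& data) {
        Blob b;
        b.type = std::type_index(typeid(T));
        const auto* p = reinterpret_cast<const uint8_t*>(data.data());
        b.bytes.assign(p, p + data.size() * sizeof(T));
        inputs_[name] = std::move(b);
    }

    // Placeholder forward pass: copies inputs to outputs unchanged.
    void Forward() { outputs_ = inputs_; }

    // Typed retrieval; throws instead of silently assuming float32.
    template <typename T>
    std::vector<T> GetOutput(const std::string& name) const {
        auto it = outputs_.find(name);
        if (it == outputs_.end())
            throw std::runtime_error("no such output: " + name);
        if (it->second.type != std::type_index(typeid(T)))
            throw std::runtime_error("dtype mismatch on output: " + name);
        const auto& bytes = it->second.bytes;
        std::vector<T> out(bytes.size() / sizeof(T));
        std::memcpy(out.data(), bytes.data(), bytes.size());
        return out;
    }

private:
    struct Blob {
        std::type_index type{typeid(void)};
        std::vector<uint8_t> bytes;
    };
    std::map<std::string, Blob> inputs_, outputs_;
};
```

Requesting an output with the wrong element type fails loudly rather than copying garbage, which is exactly the failure mode reported below.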

Comments

Both c-predict-api and cpp-package drop the data type during copying. Please fix XD.

BTW, I can get around a 2x performance boost by uint8-quantizing my model.

UPDATE: I tested uint8 quantization again, and I got about 2x the GPU memory usage for the uint8-quantized model, and predict time is 4x longer than with the fp32 model.

kice avatar Dec 21 '18 17:12 kice
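
A back-of-envelope check of why the numbers in the update above are surprising. Quantizing weights from fp32 to uint8 should shrink per-tensor storage roughly 4x (the per-tensor scale/zero-point metadata is negligible), so 2x *more* GPU memory suggests extra buffers, e.g. fp32 copies kept around for dequantization or fallback ops. The formulas below are generic arithmetic, not MXNet internals:

```cpp
#include <cassert>
#include <cstddef>

// Bytes to store n elements as fp32.
constexpr std::size_t bytes_fp32(std::size_t n) { return n * 4; }

// Bytes for an affine uint8 quantization of n elements: one byte per
// element plus a per-tensor fp32 scale and fp32 zero-point.
constexpr std::size_t bytes_uint8(std::size_t n) { return n * 1 + 2 * 4; }
```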

@kice Thanks for contributing this. Can you check the failing job and add the fix? @nswamy @leleamol Could you review this PR?

@mxnet-label-bot Add [pr-awaiting-review]

Roshrini avatar Dec 21 '18 20:12 Roshrini

@leleamol - Can you please help review this PR?

sandeep-krishnamurthy avatar Dec 26 '18 20:12 sandeep-krishnamurthy

@leleamol @nswamy Can you help review this PR?

Roshrini avatar Jan 08 '19 00:01 Roshrini

Hey @kice, would you mind sharing a bit more about your use case(s) for the C++ API? We (one of the Apache MXNet development teams) are trying to decide where to take it next, as its current implementation is a bit messy. We hope your reply can point us to what you need that the C++ API doesn't currently do, or does inefficiently. It seems you are trying to create a higher-level abstraction for inference. I wonder if, from your use-case perspective, it would make sense to implement even higher abstractions, such as an ObjectDetector API or an ImageClassifier API. Would you mind sharing a bit more about what you are trying to use it for, and how?

ddavydenko avatar Jan 17 '19 01:01 ddavydenko

> Hey @kice, would you mind sharing a bit more about your use case(s) for the C++ API? […]

@ddavydenko To summarize up front: I think MXNet should focus more on inference and easy deployment. If possible, please fix INT8 quantization and make an MXNet Lite.

I think my use cases are a bit niche; I use MXNet to process video, specifically single-image super-resolution. Before MXNet I used Caffe1 a lot (see https://github.com/lltcggie/waifu2x-caffe), but since Caffe1 is very old and inefficient, I switched to MXNet (https://github.com/kice/vs_mxnet). This use case is very sensitive to runtime speed, and in fact some of my friends also use MXNet just for the performance boost. That's also the reason I tried INT8 quantization (and MXNet doesn't have proper support for it). I also tried TVM, but there were even more hazards than with MXNet, so I gave up.

That's for end users. As a developer, it is good to see some higher-level wrappers, but I prefer the ability to use my own models. That will be impossible without easy deployment on Windows (and maybe other platforms). If end users cannot install and run a program powered by MXNet, then the use cases for MXNet are very limited. As for higher abstractions: if I need to feed the program a lot of data and parameters anyway, like which JSON symbol and which params files, then making the C API easier to use is much better.

Speaking of custom models, I prefer to train my models in TF or PyTorch and port the trained models to MXNet, since more people use TF and PyTorch and there are more resources for them. But compared to them, MXNet is by far the best and fastest option for deployment on Windows.

Right now I am working on a project that upscales old games without HD assets to HD in real time. Based on my calculations, an RTX 2070 can upscale anything to 1080p using Tensor Cores, as can a GTX 1080 with INT8 inference. With TensorRT it would be even better. However, these things don't look like they will happen in MXNet anytime soon.

kice avatar Jan 19 '19 00:01 kice
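
The uint8 speedup discussed above rests on affine quantization of float tensors. The sketch below uses the common min/max-to-[0,255] scheme with a scale and zero-point; MXNet's own quantization pass differs in its details, so treat this as a generic illustration, not MXNet's implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// A uint8-quantized tensor: values plus the affine parameters needed
// to map them back to floats.
struct Quantized {
    std::vector<uint8_t> values;
    float scale;
    uint8_t zero_point;
};

// Affine quantization: map [min, max] (widened to include 0 so that
// zero stays exactly representable) onto [0, 255].
Quantized quantize_uint8(const std::vector<float>& x) {
    float lo = *std::min_element(x.begin(), x.end());
    float hi = *std::max_element(x.begin(), x.end());
    if (lo > 0.f) lo = 0.f;
    if (hi < 0.f) hi = 0.f;
    float scale = (hi - lo) / 255.f;
    if (scale == 0.f) scale = 1.f;  // constant tensor
    uint8_t zp = static_cast<uint8_t>(std::lround(-lo / scale));
    Quantized q{{}, scale, zp};
    q.values.reserve(x.size());
    for (float v : x) {
        long r = std::lround(v / scale) + zp;
        q.values.push_back(static_cast<uint8_t>(std::clamp(r, 0l, 255l)));
    }
    return q;
}

// Recover an approximate float from the quantized representation.
float dequantize(const Quantized& q, std::size_t i) {
    return (static_cast<int>(q.values[i]) - q.zero_point) * q.scale;
}
```

Quantization trades a small rounding error per element for 4x smaller weights and integer arithmetic, which is where the reported 2x speedup (hardware permitting) comes from.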

@stu1130 @ChaiBapchya for further review/approval

vandanavk avatar Feb 05 '19 19:02 vandanavk

@stu1130 @ChaiBapchya It seems that your comments have been addressed. Could you review it again?

ankkhedia avatar Feb 14 '19 23:02 ankkhedia

@KellenSunderland could you please review this PR?

anirudhacharya avatar Mar 04 '19 00:03 anirudhacharya

@stu1130 @ChaiBapchya @KellenSunderland Could you please review this PR again? Thank you!

karan6181 avatar Mar 19 '19 00:03 karan6181

@stu1130 @ChaiBapchya @KellenSunderland Could you please have a look and see if we can move it forward?

abhinavs95 avatar Mar 28 '19 22:03 abhinavs95

@leleamol Can you also review this PR here ?

piyushghai avatar Apr 02 '19 21:04 piyushghai

@stu1130 @ChaiBapchya Can you take a look again?

Roshrini avatar Apr 17 '19 16:04 Roshrini

@ddavydenko ping to help move forward, let's get this merged. Thanks!

roywei avatar Apr 29 '19 16:04 roywei

@leleamol I will add what you suggested in a few days.

kice avatar Apr 29 '19 18:04 kice

@mxnet-label-bot update [C++, pr-work-in-progress]

vandanavk avatar May 09 '19 03:05 vandanavk

@kice Did you get the time to work on what @leleamol had suggested? Thanks!

karan6181 avatar May 21 '19 20:05 karan6181

@karan6181 I think I might start working on this in a few days.

kice avatar May 24 '19 10:05 kice

@kice It's been a few months now. Are you blocked on something? Can I help unblock you in any way? Thanks.

ChaiBapchya avatar Oct 28 '19 15:10 ChaiBapchya

@ChaiBapchya Feel free to make changes based on my code. Since I am graduating this year, I don't think I will continue working on this PR any time soon.

kice avatar Oct 28 '19 18:10 kice