
AIMET Pro vs Github

Open escorciav opened this issue 1 year ago • 8 comments

Hi folks!

I noticed that I have access to AIMET Pro with my corporate credentials. What are the benefits over the open-source public version?

Best, Victor

escorciav avatar Jan 09 '24 12:01 escorciav

I would like to know this too, as I am in the same situation. 😄

Zagreus98 avatar Mar 07 '24 13:03 Zagreus98

I haven't used it yet, but if you have access, it seems to just ship extra "pro" features :rofl:

By "pro", I mean:

  • Features related to QNN, and (to be confirmed) easier model preparation; possibly more. Refer to the image below.
  • I believe it also brings an example & API-like tooling for automatic mixed precision (AMP).

[screenshots: AIMET Pro vs. open-source feature comparison]

escorciav avatar Mar 07 '24 16:03 escorciav

Oh, thanks for the comparison. Model Preparer Pro is new to me. I knew about AMP; it can be useful, and I've tried it.

Zagreus98 avatar Mar 07 '24 21:03 Zagreus98

How long did the AMP take? Please provide more details:

  • DataParallel or single GPU?
  • Single PTQ time, in minutes/hours
  • Latency per batch size

Or anything else you find pertinent.


escorciav avatar Mar 08 '24 03:03 escorciav

I was using a single GPU with a large batch size. It took a really long time (around 3 days). It depends a lot on the architecture, because the runtime is directly proportional to the number of layers in your model and the number of candidate weight/activation precision combinations. It tries different candidate precisions for every layer in the model, and that takes time.

Unfortunately in my case, the model is quite big and I was not tempted to invest more time to optimize this technique. I don't know exactly how much time it takes per layer because that also varies.
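For intuition, the per-layer candidate search described above can be sketched roughly like this. This is just an illustrative toy, not AIMET's actual AMP API or algorithm; all names and the mock scoring function are made up:

```python
# Illustrative sketch only: mimics a per-layer mixed-precision search,
# NOT AIMET's actual AMP implementation.

def evaluate(precisions):
    """Stand-in for one PTQ calibration + evaluation run.
    In practice each call is expensive, which is why the full
    search can take days on a large model."""
    # Mock score: pretend higher bit-widths yield higher accuracy.
    total = sum(w + a for w, a in precisions.values())
    return total / (32.0 * len(precisions))

def greedy_amp(layers, candidates, accuracy_floor):
    """Start every layer at the highest precision, then for each layer
    try cheaper (weight_bits, act_bits) candidates, keeping the first
    one that stays above the accuracy floor.
    Cost: up to len(layers) * len(candidates) evaluations."""
    best = {layer: max(candidates) for layer in layers}
    for layer in layers:
        for cand in sorted(candidates):  # cheapest candidates first
            trial = dict(best, **{layer: cand})
            if evaluate(trial) >= accuracy_floor:
                best[layer] = cand
                break
    return best

layers = ["conv1", "conv2", "fc"]
candidates = [(8, 8), (8, 16), (16, 16)]  # (weight_bits, activation_bits)
config = greedy_amp(layers, candidates, accuracy_floor=0.8)
print(config)
```

Even in this toy, the cost scales with layers × candidates, and every trial re-runs a full evaluation, which matches the "3 days on a big model" experience.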

Zagreus98 avatar Mar 08 '24 09:03 Zagreus98

Interesting, thanks for sharing.

:question: Did you dig into the code abstractions (or API) behind the AMP? My first impression was: cool! But then I wondered how they run it efficiently without (a) heuristics, or (b) some form of parallelization (joblib/Ray/distributed tasks). I feel your 3-day experience gives us a hunch.
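On point (b): since each candidate evaluation is independent given a fixed baseline config, the sweep could in principle be fanned out even with just stdlib `concurrent.futures`. This is purely a hypothetical sketch with mock names and a mock scoring function; it says nothing about how AIMET actually structures AMP internally:

```python
# Hypothetical sketch: parallelizing independent per-layer candidate
# evaluations. Illustrative only; not AIMET's internals.
from concurrent.futures import ThreadPoolExecutor

def evaluate(layer, candidate):
    """Stand-in for one PTQ calibration + eval run on a GPU."""
    w_bits, a_bits = candidate
    return (w_bits + a_bits) / 32.0  # mock accuracy score

layers = ["conv1", "conv2", "fc"]
candidates = [(8, 8), (8, 16), (16, 16)]  # (weight_bits, activation_bits)
jobs = [(layer, cand) for layer in layers for cand in candidates]

# Fan the independent evaluations out across workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = dict(zip(jobs, pool.map(lambda j: evaluate(*j), jobs)))

# Pick the cheapest candidate per layer that clears an accuracy floor.
floor = 0.74
chosen = {
    layer: min(c for c in candidates if scores[(layer, c)] >= floor)
    for layer in layers
}
print(chosen)
```

In real PTQ each worker would need its own GPU/device context, which is presumably where joblib/Ray/distributed tasks would come in.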

Nevertheless, nice that a Q-team released & mentioned such a feature.

escorciav avatar Mar 08 '24 10:03 escorciav

For those interested in Qualcomm hardware, the Pro version brings "HwAwareQuantSim". Refer to the screenshot of the "AIMET Pro" docs, which cites a file that is not available in the public version.

Let us know if you find it critical :wink:

[screenshots: AIMET Pro docs referencing HwAwareQuantSim]

escorciav avatar Mar 13 '24 10:03 escorciav

Relevant while setting up AIMET Pro

file:///opt/qcom/aistack/aimet/1.30.0.4794/Docs/api_docs/torch_model_preparer_pro.html?highlight=snpe_sdk_root

[screenshot: Model Preparer Pro docs page]

escorciav avatar Mar 14 '24 13:03 escorciav