AIMET Pro vs Github
Hi folks!
I noticed that I have access to AIMET Pro with my corporate credentials. What are the benefits over the open-source public version?
Best, Victor
I would like to know this too, as I am in the same situation. 😄
I haven't used it yet, but if you have access it seems to just ship extra "pro" features :rofl:
By pro, I mean:
- Tooling related to QNN, (to be confirmed) easier model preparation, and more; refer to the image
- I believe it also brings an example and API-like tooling for automatic mixed precision (AMP)
Oh, thanks for the comparison. The Model Preparer Pro is a new one for me. I knew about AMP; it can be useful, and I have tried it.
How long did AMP take? Please provide more details:
- DataParallel or a single GPU?
- Time for a single PTQ run (minutes/hours)
- Latency per batch size
Or anything else you find pertinent.
I was using a single GPU with a large batch size. It took a really long time (around 3 days). It depends a lot on the architecture, because the time is directly proportional to the number of layers in your model and the number of candidate weight/activation precision combinations. It tries the different candidate precisions for every layer in the model, and that takes time.
Unfortunately, in my case the model is quite big and I was not tempted to invest more time in optimizing this technique. I don't know exactly how long it takes per layer, because that also varies.
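To give a rough feel for why this is so slow, here is a back-of-the-envelope sketch of the search cost. This is a hypothetical illustration, not AIMET's actual implementation; the layer count, candidate bitwidths, and per-evaluation time are made-up numbers chosen only to show how the cost scales as layers × candidates.

```python
from itertools import product

# Hypothetical candidate precisions per layer (weight bits, activation bits).
weight_bitwidths = [4, 8]
activation_bitwidths = [8, 16]
candidates = list(product(weight_bitwidths, activation_bitwidths))  # 4 combos

num_layers = 100        # illustrative size of a large backbone
minutes_per_eval = 10   # illustrative cost of one quantize + accuracy pass

# A naive per-layer search evaluates every candidate for every layer.
total_evals = num_layers * len(candidates)
total_days = total_evals * minutes_per_eval / 60 / 24

print(f"{total_evals} evaluations, roughly {total_days:.1f} days")
```

With these made-up numbers you already land in multi-day territory, which matches the ~3 days reported above.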
Interesting, thanks for sharing.
:question: Did you dig into the code abstractions (or API) associated with AMP? My first impression was: cool! But then I wondered how they run it efficiently without (a) heuristics or (b) some form of parallelization (joblib/Ray/distributed tasks). I feel your 3-day experience gives us a hunch.
Nevertheless, nice that a Q-team released & mentioned such a feature.
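Since each (layer, candidate-precision) evaluation is independent, the kind of parallelization mentioned above could look roughly like this. This is a minimal hypothetical sketch, not AIMET's code: `evaluate_candidate` is a made-up stand-in for a real PTQ + accuracy measurement, and the fake metric just favors 8-bit settings.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_candidate(layer: int, wbits: int, abits: int) -> float:
    # Stand-in "accuracy drop" metric. A real run would quantize this
    # layer to (wbits, abits) and measure accuracy on calibration data.
    return abs(8 - wbits) * 0.01 + abs(8 - abits) * 0.005

candidates = [(8, 8), (8, 16), (16, 8), (16, 16)]
layers = range(4)
jobs = [(l, w, a) for l in layers for (w, a) in candidates]

# Fan the independent evaluations out across workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda j: (j, evaluate_candidate(*j)), jobs))

# Greedy selection: keep the candidate with the smallest drop per layer.
best = {}
for (layer, w, a), drop in results:
    if layer not in best or drop < best[layer][1]:
        best[layer] = ((w, a), drop)
```

For GPU-bound evaluations the win would come from spreading the jobs over multiple devices (Ray, joblib, or a job queue) rather than threads on one GPU.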
For those interested in Qualcomm hardware, the Pro version brings "HwAwareQuantSim". Refer to the screenshot of the "AIMET Pro" docs, which cite a file that is not available in the public version.
Let us know if you find it critical :wink:
Relevant when setting up AIMET Pro:
file:///opt/qcom/aistack/aimet/1.30.0.4794/Docs/api_docs/torch_model_preparer_pro.html?highlight=snpe_sdk_root