coremltools
Can the decoder of a Transformer be accelerated by the NPU?
🌱 Describe Feature Request
I trained a Transformer model. When I converted the whole model into an mlmodel, I found that it could only run on the CPU. After splitting it into an encoder and a decoder, I discovered that the encoder is accelerated by the NPU as expected, but the decoder still runs only on the CPU. Is this because the decoder is autoregressive, so its input shapes are not static? If the decoder can be accelerated by the NPU, could you provide a method for converting the .pt model to an mlmodel or mlpackage? Thanks.