coremltools
Can the decoder of a Transformer be accelerated by the NPU?
🌱 Describe Feature Request
I trained a Transformer model. When I converted the whole model into an mlmodel, I found that it could only run on the CPU. After splitting it into an encoder and a decoder, I discovered that the encoder is accelerated by the NPU as expected, but the decoder still runs only on the CPU. Is this because the decoder is autoregressive, so its input shapes are not static? If the decoder can be accelerated by the NPU, could you provide a method for converting the .pt model to an mlmodel or mlpackage? Thanks.