feat: improve pymodel bert perf
- support copy kernel for the new prefill cuda graph framework
- improve cuda graph framework cpu perf
- test and check:
  - In long-text scenarios, pymodel bert performs better than the cpp engine.
  - In short-text scenarios, cuda graph pymodel bert improves performance by up to 20% but still fails to beat the cpp engine, because the pymodel preparation work costs 0.2~0.3 ms more than the cpp engine's. We will improve this in the next PR.
internal source has been updated, please review the changes!