[GPU] remove _execute_empty_input and fix some code style
Motivation
Remove self._execute_empty_input() and unify the model invocation entry point.
Modifications
Usage or Command
Accuracy Tests
Checklist
- [ ] Add at least a tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Thanks for your contribution!
May I ask why _empty_input_forward is being removed? EP needs this logic; will things still run once it is deleted?
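- It reduces the amount of code and guarantees a single model forward entry point, which makes it easier for TBO support to handle corner cases.
- It ensures that an empty run can enter the CUDA graph, reducing latency for a single query.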
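Hmm, could you explain a bit more? Empty runs can already enter cudagraph today. Under EP, if the empty run is removed and, say, with EP8 only one card receives data, how can the all2all dispatch and combine communication take place if the other cards do not enter an empty run at dispatch time?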
Does "empty runs can already enter cudagraph" mean that model.forward already supports empty runs, or that empty_input_forward can enter the CUDA graph?
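Sorry, what I said was not quite right. I meant that empty runs and cudagraph do not conflict: under the current empty-run logic, EP can also enter cudagraph.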
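The EP concern raised in this thread can be illustrated with a toy, single-process sketch. It is not FastDeploy code: `ToyRank`, `step`, and `collectives_match` are hypothetical names, and the all2all dispatch/combine is modeled only as a counter. It assumes that every rank has to join the same number of collective rounds, and compares skipping idle ranks against routing empty batches through the one forward entry point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyRank:
    """One expert-parallel rank in a toy simulation (hypothetical, not FastDeploy)."""
    rank_id: int
    collective_calls: int = 0  # all2all rounds this rank has joined so far

    def forward(self, tokens: List[int]) -> List[int]:
        # Single entry point: used for real batches and for empty batches alike.
        # An empty batch still joins dispatch and combine, contributing zero tokens.
        self.collective_calls += 2  # one dispatch round + one combine round
        return [t * 2 for t in tokens]  # stand-in for the expert computation


def step(ranks: List[ToyRank], batches: List[List[int]], skip_empty: bool) -> List[List[int]]:
    """Run one decoding step on every rank."""
    outputs = []
    for rank, tokens in zip(ranks, batches):
        if skip_empty and not tokens:
            # The idle rank never enters forward, so it never joins the collective.
            outputs.append([])
            continue
        outputs.append(rank.forward(tokens))
    return outputs


def collectives_match(ranks: List[ToyRank]) -> bool:
    """True when every rank joined the same number of collective rounds."""
    return len({r.collective_calls for r in ranks}) == 1


# EP8 where only rank 0 receives data, as in the question above.
batches = [[1, 2, 3]] + [[] for _ in range(7)]

skipping = [ToyRank(i) for i in range(8)]
step(skipping, batches, skip_empty=True)

unified = [ToyRank(i) for i in range(8)]
step(unified, batches, skip_empty=False)

print(collectives_match(skipping))  # False: rank 0 would wait on peers that never arrive
print(collectives_match(unified))   # True: idle ranks join with an empty batch
```

In a real multi-process setup the mismatched case would typically show up as a hang rather than a `False`, which is why the idle ranks still need some path into the forward pass.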
Codecov Report
:x: Patch coverage is 66.66667% with 5 lines in your changes missing coverage. Please review.
:warning: Please upload report for BASE (develop@5bcf79d).
Additional details and impacted files
@@ Coverage Diff @@
## develop #5067 +/- ##
==========================================
Coverage ? 57.69%
==========================================
Files ? 317
Lines ? 38441
Branches ? 5763
==========================================
Hits ? 22177
Misses ? 14489
Partials ? 1775
| Flag | Coverage Δ | |
|---|---|---|
| diff | 57.69% <66.66%> (?) |