FastDeploy

[GPU] Remove _execute_empty_input and fix some code style issues

Open · zhoutianzi666 opened this pull request 1 month ago · 7 comments

Motivation

Remove self._execute_empty_input() so that the model has a single, unified invocation entry point.

Modifications

Usage or Command

Accuracy Tests

Checklist

  • [ ] Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • [ ] Format your code and run pre-commit before committing.
  • [ ] Add unit tests. If this PR has no unit tests, please explain why.
  • [ ] Provide accuracy results.
  • [ ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

zhoutianzi666 · Nov 15 '25 16:11

Thanks for your contribution!

paddle-bot[bot] · Nov 15 '25 16:11

Could you explain why _empty_input_forward is being removed? EP depends on this logic. Will it still run without it?

Wanglongzhi2001 · Nov 17 '25 03:11

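  1. It reduces the amount of code and guarantees that the model forward pass has exactly one entry point, which makes it easier to handle corner cases when supporting TBO.
  2. It ensures that empty runs can enter CUDA Graph, which reduces single-query latency.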

zhoutianzi666 · Nov 17 '25 04:11
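The first point above (one forward entry point) can be pictured with a minimal sketch. All names here (`Batch`, `ToyModelRunner`, `execute_model`, `_forward`) are hypothetical and are not FastDeploy's API; the sketch only assumes that the model's forward can accept a zero-token batch, so no separate empty-input method is needed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Batch:
    # Token ids scheduled on this rank for the current step; may be empty.
    token_ids: List[int] = field(default_factory=list)


class ToyModelRunner:
    """Hypothetical runner with one forward entry point (not FastDeploy's API)."""

    def __init__(self) -> None:
        # Number of tokens seen by each forward call, kept for inspection.
        self.forward_calls: List[int] = []

    def _forward(self, batch: Batch) -> List[int]:
        # The only place the model is invoked. A zero-token batch goes through
        # the same code, so any collectives inside the model are still reached.
        self.forward_calls.append(len(batch.token_ids))
        return [t * 2 for t in batch.token_ids]  # stand-in for real model math

    def execute_model(self, batch: Batch) -> List[int]:
        # No separate "empty input" branch: empty and non-empty batches share
        # one path, so corner cases (e.g. for TBO) are handled in one place.
        return self._forward(batch)


runner = ToyModelRunner()
print(runner.execute_model(Batch([1, 2, 3])))  # [2, 4, 6]
print(runner.execute_model(Batch()))           # []
print(runner.forward_calls)                    # [3, 0]
```

The design choice being illustrated is that emptiness becomes a property of the data rather than a separate control-flow path.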

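Hmm, could you explain a bit more? Empty runs can already enter CUDA Graph. Under EP, if the empty run is removed and, say, only one card in an EP8 group receives data, how can the all-to-all dispatch and combine communication proceed when the other cards never enter the empty run during dispatch?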

Wanglongzhi2001 · Nov 17 '25 04:11
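The EP concern can be illustrated with a toy simulation. An all-to-all (as used for MoE dispatch and combine) is a collective: it completes only when every rank in the group calls it, including ranks with no tokens to send. The sketch below models the collective with `threading.Barrier`. The class and function names, the 8-rank world size, and the token-to-expert routing rule (`token % world_size`) are illustrative assumptions, not FastDeploy code; only dispatch is simulated, and combine would follow the same pattern in reverse.

```python
import threading
from typing import Dict, List, Optional

WORLD_SIZE = 8  # EP8, one expert shard per rank (assumed for illustration)


class ToyAllToAll:
    """Simulated all-to-all: completes only after every rank has contributed."""

    def __init__(self, world_size: int, timeout: float) -> None:
        self.world_size = world_size
        self.barrier = threading.Barrier(world_size, timeout=timeout)
        self.sent: Dict[int, List[List[int]]] = {}
        self.lock = threading.Lock()

    def exchange(self, rank: int, per_dest: List[List[int]]) -> List[int]:
        with self.lock:
            self.sent[rank] = per_dest
        # Blocks until all ranks arrive; raises BrokenBarrierError on timeout.
        self.barrier.wait()
        # Collect what every source rank addressed to this rank (possibly nothing).
        return [tok for src in range(self.world_size) for tok in self.sent[src][rank]]


def dispatch(rank: int, tokens: List[int], comm: ToyAllToAll) -> List[int]:
    # Route each token to the rank owning its expert (toy rule: token % world size).
    per_dest: List[List[int]] = [[] for _ in range(comm.world_size)]
    for tok in tokens:
        per_dest[tok % comm.world_size].append(tok)
    return comm.exchange(rank, per_dest)


def run(idle_ranks_participate: bool, timeout: float = 0.5) -> Optional[Dict[int, List[int]]]:
    comm = ToyAllToAll(WORLD_SIZE, timeout)
    local: Dict[int, List[int]] = {r: [] for r in range(WORLD_SIZE)}
    local[0] = list(range(16))  # only rank 0 received requests this step
    results: Dict[int, List[int]] = {}
    failed: List[int] = []

    def worker(rank: int) -> None:
        try:
            results[rank] = dispatch(rank, local[rank], comm)
        except threading.BrokenBarrierError:
            failed.append(rank)

    # Idle ranks either do an empty run (join the collective) or skip the step.
    threads = [
        threading.Thread(target=worker, args=(r,))
        for r in range(WORLD_SIZE)
        if local[r] or idle_ranks_participate
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return None if failed else results
```

With `run(True)`, rank 3 receives `[3, 11]` from rank 0. With `run(False)`, rank 0 waits for peers that never arrive and the barrier times out, so the function returns `None`; a real collective without a timeout would typically just block.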

By "empty runs can already enter CUDA Graph", do you mean that model.forward already supports empty runs, or that empty_input_forward can now enter CUDA Graph?

gongshaotian · Nov 17 '25 04:11

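Sorry, I didn't put that quite right. What I meant is that empty runs and CUDA Graph don't conflict: with the current empty-run logic, EP can still enter CUDA Graph.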

Wanglongzhi2001 · Nov 17 '25 05:11
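Why empty runs and CUDA Graph need not conflict can be seen from a common way of dispatching graph replay: graphs are captured for a fixed set of batch sizes, and a runtime batch is padded up to the nearest captured size. Under that assumption, a zero-token empty run simply pads to the smallest captured size and replays a graph like any other step. The capture sizes and helper names below are hypothetical, not FastDeploy's configuration or API.

```python
import bisect
from typing import List, Optional

CAPTURE_SIZES: List[int] = [1, 2, 4, 8, 16, 32]  # hypothetical captured sizes, sorted


def select_graph_size(num_tokens: int, capture_sizes: List[int] = CAPTURE_SIZES) -> Optional[int]:
    """Return the captured size to replay for num_tokens, or None to run eagerly."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    # Smallest captured size >= num_tokens; an empty batch maps to the smallest size.
    idx = bisect.bisect_left(capture_sizes, num_tokens)
    return capture_sizes[idx] if idx < len(capture_sizes) else None


def pad_batch(token_ids: List[int], pad_id: int = 0) -> Optional[List[int]]:
    """Pad a batch to its graph size; None means it is too large for any graph."""
    size = select_graph_size(len(token_ids))
    if size is None:
        return None
    return token_ids + [pad_id] * (size - len(token_ids))
```

For example, `pad_batch([])` yields a single padding token and `pad_batch([5, 6, 7])` is padded to length 4, so both replay a captured graph.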

Codecov Report

Patch coverage is 66.66667% with 5 lines in your changes missing coverage. Please review.
Warning: please upload a report for BASE (develop@5bcf79d).

Files with missing lines                            Patch %   Lines
fastdeploy/worker/gpu_model_runner.py               66.66%    2 missing and 2 partials
fastdeploy/model_executor/models/ernie4_5_moe.py    66.66%    0 missing and 1 partial
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5067   +/-   ##
==========================================
  Coverage           ?   57.69%           
==========================================
  Files              ?      317           
  Lines              ?    38441           
  Branches           ?     5763           
==========================================
  Hits               ?    22177           
  Misses             ?    14489           
  Partials           ?     1775           
Flag    Coverage Δ
diff    57.69% <66.66%> (?)
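The figures in this report are internally consistent, which a short calculation confirms. It assumes Codecov's usual ratio of hits / (hits + misses + partials), so partially covered lines count against coverage. The project totals give 22177 / 38441, about 57.69%. For the patch, 2 missing plus 3 partial lines make the 5 uncovered lines; at 66.67% that implies 15 changed lines with 10 hits. Those 15 and 10 are inferred here, not stated in the report.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage percentage (assumed Codecov definition: partials count as uncovered)."""
    total = hits + misses + partials
    return 100.0 * hits / total


# Project totals from the diff table: hits, misses, partials (they sum to 38441 lines).
print(round(coverage_pct(22177, 14489, 1775), 2))  # 57.69

# Patch: 2 missing + 3 partials = 5 uncovered lines; 10 hits is inferred, not reported.
print(round(coverage_pct(10, 2, 3), 5))  # 66.66667
```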

Flags with carried-forward coverage won't be shown.

codecov-commenter · Nov 21 '25 11:11