[BugFix] skip model executing after clearing/updating is done

Open • liyonghua0910 opened this pull request 2 weeks ago • 2 comments

Motivation

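For EP models, the first weight update after the weights have been cleared immediately runs execute_model in the second half of event_loop_normal, even though the weights and cache have not yet been rebuilt, so execution fails.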

:bulb: If this PR is a cherry-pick, the title must begin with the [Cherry-Pick] tag and end with the original PR ID. For example: [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

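After the weights are cleared or updated, the worker now skips the second half of the loop and returns to the top of event_loop_normal to re-read the status signal. This change is required for the weight-clearing case and harmless for the weight-update case.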
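
The control flow described above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the code in fastdeploy/worker/worker_process.py: the `Status` values, the `SketchWorker` class, the `skip_after_weight_op` flag, and the scripted signal list are all hypothetical stand-ins for the real status signal and worker. With the flag on, a clear or update signal ends the iteration with `continue`, so the loop returns to the top and re-reads the signal instead of falling through to `execute_model`.

```python
from enum import Enum, auto


class Status(Enum):
    """Hypothetical status signals; the real worker's signal values may differ."""

    RUN = auto()
    CLEAR = auto()
    UPDATE = auto()


class SketchWorker:
    """Toy stand-in for an EP worker loop (hypothetical, not FastDeploy's class)."""

    def __init__(self, signals, skip_after_weight_op=True):
        self.signals = list(signals)  # scripted signals, one consumed per iteration
        self.skip_after_weight_op = skip_after_weight_op
        self.weights_ready = True  # weights and cache are loaded
        self.executed_steps = []

    def execute_model(self, step):
        # Running without weights/cache is the failure described in Motivation.
        if not self.weights_ready:
            raise RuntimeError("execute_model ran before weights/cache were rebuilt")
        self.executed_steps.append(step)

    def event_loop_normal(self):
        step = 0
        while self.signals:
            # First half: read the status signal at the top of the loop.
            status = self.signals.pop(0)
            if status is Status.CLEAR:
                self.weights_ready = False  # weights and cache released
                if self.skip_after_weight_op:
                    continue  # the fix: skip the second half, re-read the signal
            elif status is Status.UPDATE:
                self.weights_ready = True  # weights and cache rebuilt
                if self.skip_after_weight_op:
                    continue  # harmless: the model runs on a later iteration
            # Second half: run one model step.
            self.execute_model(step)
            step += 1


worker = SketchWorker([Status.RUN, Status.CLEAR, Status.UPDATE, Status.RUN])
worker.event_loop_normal()
print(worker.executed_steps)  # [0, 1]
```

With `skip_after_weight_op=False`, the same signal sequence reproduces the failure: the clear signal falls through to `execute_model` while the weights are still released. The update branch shows why the extra `continue` is harmless there: the model simply runs on the next iteration instead of the current one.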

Usage or Command

Accuracy Tests

Checklist

  • [x] Add at least one tag to the PR title.
    • Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • [x] Format your code, run pre-commit before commit.
  • [x] Add unit tests. If no unit tests are included, explain why in this PR.
  • [x] Provide accuracy results.
  • [x] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

liyonghua0910 · Dec 12 '25 06:12

Thanks for your contribution!

paddle-bot[bot] · Dec 12 '25 06:12

Codecov Report

:x: Patch coverage is 0%, with 1 line in your changes missing coverage. Please review. :warning: Please upload a report for BASE (develop@d67388a).

Files with missing lines               Patch %    Lines
fastdeploy/worker/worker_process.py    0.00%      1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5527   +/-   ##
==========================================
  Coverage           ?   60.75%           
==========================================
  Files              ?      329           
  Lines              ?    41138           
  Branches           ?     6270           
==========================================
  Hits               ?    24993           
  Misses             ?    14255           
  Partials           ?     1890           
Flag    Coverage Δ
GPU     60.75% <0.00%> (?)
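
As a quick consistency check on the coverage diff, the three outcome rows add up to the tracked line count, and the reported 60.75% follows from dividing hits by lines. This assumes Codecov's definition of coverage as hits / (hits + misses + partials), with partially covered lines not counted as covered; the numbers are copied from the diff above.

```python
# Totals copied from the Codecov diff for #5527 against develop.
hits, misses, partials = 24993, 14255, 1890
lines = 41138

# Each tracked line is counted exactly once as a hit, a miss, or a partial.
assert hits + misses + partials == lines

# Assumption: coverage = hits / (hits + misses + partials), so partials count against it.
coverage = hits / lines
print(f"{coverage:.2%}")  # 60.75%
```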

Flags with carried forward coverage won't be shown.

codecov-commenter · Dec 12 '25 07:12