feat: add streaming output for dp train
During the training step there is no output in the log files; the output only becomes visible after training is completed. Since training normally lasts for days, streaming output is necessary for users to examine its status.
This PR suggests one way to achieve this; however, a better approach would be to implement it directly in pydflow.
Summary by CodeRabbit
- New Features
  - Live streaming of training output, providing real-time visibility in the console and continuous writing to train.log.
- Bug Fixes
  - Eliminated duplicate training log entries by consolidating output handling, resulting in cleaner logs without redundancy.
  - Maintains consistent post-training ("freeze") logging behavior.
📝 Walkthrough
Introduces a new run_command_streaming utility for real-time stdout/stderr streaming and log file writing, and updates the training step in dpgen2/op/run_dp_train.py to use it with train.log. Previous in-memory logging for the train step is removed; post-train “freeze” continues using existing logging.
Changes
| Cohort / File(s) | Summary |
|---|---|
| **Training op: stream logging for train step**<br/>`dpgen2/op/run_dp_train.py` | Replaces `run_command` with `run_command_streaming(..., log_file="train.log")` for the training phase; suppresses explicit stdout/stderr writes to `fplog` for train; retains existing freeze-step logging to `fplog`. |
| **Utilities: new streaming runner**<br/>`dpgen2/utils/run_command.py` | Adds `run_command_streaming(cmd, shell=False, log_file=None)`, which executes a subprocess with concurrent stdout/stderr streaming to the terminal and an optional log file, using threads; returns `(exit_code, stdout, stderr)`. Existing `run_command` is unchanged. |
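A minimal sketch of what such a streaming runner could look like, based only on the signature and behavior described in the walkthrough (`run_command_streaming(cmd, shell=False, log_file=None)` returning `(exit_code, stdout, stderr)`); the actual implementation in `dpgen2/utils/run_command.py` may differ in its details:

```python
import subprocess
import threading

def run_command_streaming(cmd, shell=False, log_file=None):
    """Run cmd, echoing stdout/stderr line by line to the terminal
    (and optionally appending to log_file), while also capturing them.
    Returns (exit_code, stdout, stderr)."""
    proc = subprocess.Popen(
        cmd,
        shell=shell,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered in text mode
    )
    captured = {"stdout": [], "stderr": []}
    log_fp = open(log_file, "a") if log_file else None
    lock = threading.Lock()  # serialize writes to terminal/log

    def reader(stream, key):
        # Drain one pipe line by line; echo and capture concurrently.
        for line in stream:
            captured[key].append(line)
            with lock:
                print(line, end="")
                if log_fp is not None:
                    log_fp.write(line)
                    log_fp.flush()  # make the log readable while running

    threads = [
        threading.Thread(target=reader, args=(proc.stdout, "stdout")),
        threading.Thread(target=reader, args=(proc.stderr, "stderr")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    code = proc.wait()
    if log_fp is not None:
        log_fp.close()
    return code, "".join(captured["stdout"]), "".join(captured["stderr"])
```

Two reader threads are needed because reading stdout and stderr sequentially from one process can deadlock once either pipe's buffer fills.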
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant Op as run_dp_train.py
    participant RC as run_command_streaming
    participant Sh as Subprocess
    participant Log as train.log (optional)
    Op->>RC: invoke(cmd, shell=False, log_file="train.log")
    activate RC
    RC->>Sh: Popen(cmd, pipes, line-buffered)
    par Read stdout
        RC->>Sh: spawn stdout reader thread
        loop lines
            Sh-->>RC: stdout line
            RC-->>Op: stream to terminal
            RC-->>Log: append line
        end
    and Read stderr
        RC->>Sh: spawn stderr reader thread
        loop lines
            Sh-->>RC: stderr line
            RC-->>Op: stream to terminal
            RC-->>Log: append line
        end
    end
    Sh-->>RC: exit code
    RC-->>Op: (code, stdout_str, stderr_str)
    deactivate RC
    note over Op: No in-memory fplog write for train step
```
```mermaid
sequenceDiagram
    autonumber
    participant Op as run_dp_train.py
    participant RCs as run_command_streaming (train)
    participant RC as run_command (freeze)
    participant Log as fplog
    Op->>RCs: Train (streamed to train.log)
    note right of RCs: Output handled by streaming<br/>No fplog writes for train
    Op->>RC: Freeze (non-streaming)
    RC-->>Op: stdout/stderr captured
    Op-->>Log: write freeze stdout/stderr to fplog
```
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
The option `print_oe` of `dflow.utils.run_command` does exactly the same thing by virtue of selectors. You only need to add an argument to dflow's `run_command`. Refer to https://github.com/deepmodeling/dflow/blob/48c24cc4f494acb5a12c8d99293f1156c31342ad/src/dflow/utils.py#L657. Besides, I think we should provide an option rather than change the default behavior (which is silent except for errors).
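To illustrate the selectors-based approach the comment refers to, here is a simplified, POSIX-only sketch (it is not dflow's actual code; only the existence of `print_oe` on `dflow.utils.run_command` is confirmed by the discussion). A single thread multiplexes both pipes instead of spawning reader threads:

```python
import selectors
import subprocess

def stream_with_selectors(cmd):
    """Run cmd, echoing stdout/stderr as they arrive via selectors.
    Returns (exit_code, stdout, stderr). Simplified illustration:
    assumes line-oriented output on a POSIX system."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ, "stdout")
    sel.register(proc.stderr, selectors.EVENT_READ, "stderr")
    out, err = [], []
    open_streams = 2
    while open_streams:
        for key, _ in sel.select():
            line = key.fileobj.readline()
            if not line:  # EOF on this stream
                sel.unregister(key.fileobj)
                open_streams -= 1
                continue
            print(line, end="")  # echo while the command runs
            (out if key.data == "stdout" else err).append(line)
    return proc.wait(), "".join(out), "".join(err)
```

This avoids threads entirely, which is why a single `print_oe`-style flag on the existing function can enable streaming without restructuring the runner.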
Thanks. Is there any way to enable this via the dpgen2 configuration JSON?
It is not currently supported. You could add an argument in the config JSON to control the output.
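A hypothetical shape for such a config switch (the key name `stream_output` and its placement are invented here for illustration; no such option exists yet):

```json
{
    "train": {
        "config": {
            "command": "dp",
            "stream_output": true
        }
    }
}
```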