feat: add streaming output for dp train

Open OutisLi opened this issue 4 months ago • 3 comments

During the training step, nothing is written to the log files; the output only becomes visible after training has completed. Since training normally lasts for days, streaming output is necessary for users to check its status.

This PR suggests one way to achieve this; however, a better way would be to implement it directly in pydflow.

Summary by CodeRabbit

  • New Features
    • Live streaming of training output, providing real-time visibility in the console and continuous writing to train.log.
  • Bug Fixes
    • Eliminated duplicate training log entries by consolidating output handling, resulting in cleaner logs without redundancy.
    • Maintains consistent post-training (“freeze”) logging behavior.

OutisLi avatar Aug 25 '25 07:08 OutisLi

📝 Walkthrough

Walkthrough

Introduces a new run_command_streaming utility for real-time stdout/stderr streaming and log file writing, and updates the training step in dpgen2/op/run_dp_train.py to use it with train.log. Previous in-memory logging for the train step is removed; post-train “freeze” continues using existing logging.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Training op: stream logging for train step<br/>dpgen2/op/run_dp_train.py | Replaces run_command with run_command_streaming(..., log_file="train.log") for the training phase; suppresses explicit stdout/stderr writes to fplog for train; retains existing freeze step logging to fplog. |
| Utilities: new streaming runner<br/>dpgen2/utils/run_command.py | Adds run_command_streaming(cmd, shell=False, log_file=None) that executes a subprocess with concurrent stdout/stderr streaming to terminal and optional log file, using threads; returns (exit_code, stdout, stderr). Existing run_command unchanged. |
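For readers who want a concrete picture of the thread-based approach summarized above, here is a minimal sketch using only the Python standard library. It is not the code from this PR: only the name and signature `run_command_streaming(cmd, shell=False, log_file=None)` and the `(exit_code, stdout, stderr)` return value come from the summary, while the `pump` helper, the lock, and the append mode for the log file are assumptions made for illustration.

```python
import subprocess
import sys
import threading
from typing import IO, List, Optional, Sequence, Tuple, Union


def run_command_streaming(
    cmd: Union[str, Sequence[str]],
    shell: bool = False,
    log_file: Optional[str] = None,
) -> Tuple[int, str, str]:
    """Run cmd, echo stdout/stderr as lines arrive, optionally append to log_file."""
    proc = subprocess.Popen(
        cmd,
        shell=shell,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered on our side of the pipes (text mode)
    )
    # Assumption: the log is opened in append mode.
    log = open(log_file, "a") if log_file else None
    lock = threading.Lock()  # stops the two reader threads from mixing partial lines
    out_lines: List[str] = []
    err_lines: List[str] = []

    def pump(pipe: IO[str], sink: IO[str], store: List[str]) -> None:
        # Hypothetical helper: forward each line to the terminal and the log.
        for line in pipe:
            store.append(line)
            with lock:
                sink.write(line)
                sink.flush()
                if log is not None:
                    log.write(line)
                    log.flush()
        pipe.close()

    threads = [
        threading.Thread(target=pump, args=(proc.stdout, sys.stdout, out_lines)),
        threading.Thread(target=pump, args=(proc.stderr, sys.stderr, err_lines)),
    ]
    try:
        for t in threads:
            t.start()
        code = proc.wait()  # safe: the reader threads keep draining both pipes
        for t in threads:
            t.join()
    finally:
        if log is not None:
            log.close()
    return code, "".join(out_lines), "".join(err_lines)
```

A call such as `run_command_streaming(["dp", "train", "input.json"], log_file="train.log")` would then show progress while the job runs. Note that the child process may still buffer its own output when it is not attached to a terminal, which this sketch cannot control.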

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Op as run_dp_train.py
  participant RC as run_command_streaming
  participant Sh as Subprocess
  participant Log as train.log (optional)

  Op->>RC: invoke(cmd, shell=False, log_file="train.log")
  activate RC
  RC->>Sh: Popen(cmd, pipes, line-buffered)
  par Read stdout
    RC->>Sh: spawn stdout reader thread
    loop lines
      Sh-->>RC: stdout line
      RC-->>Op: stream to terminal
      RC-->>Log: append line
    end
  and Read stderr
    RC->>Sh: spawn stderr reader thread
    loop lines
      Sh-->>RC: stderr line
      RC-->>Op: stream to terminal
      RC-->>Log: append line
    end
  end
  Sh-->>RC: exit code
  RC-->>Op: (code, stdout_str, stderr_str)
  deactivate RC
  note over Op: No in-memory fplog write for train step

sequenceDiagram
  autonumber
  participant Op as run_dp_train.py
  participant RCs as run_command_streaming (train)
  participant RC as run_command (freeze)
  participant Log as fplog

  Op->>RCs: Train (streamed to train.log)
  note right of RCs: Output handled by streaming<br/>No fplog writes for train
  Op->>RC: Freeze (non-streaming)
  RC-->>Op: stdout/stderr captured
  Op-->>Log: write freeze stdout/stderr to fplog

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

coderabbitai[bot] avatar Aug 25 '25 08:08 coderabbitai[bot]

The print_oe option of dflow.utils.run_command already does exactly this, using selectors. You only need to add an argument to dflow_run_command. Refer to https://github.com/deepmodeling/dflow/blob/48c24cc4f494acb5a12c8d99293f1156c31342ad/src/dflow/utils.py#L657. Besides, I think we should provide an option rather than change the default behavior (which is silent except for errors).
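To illustrate what streaming "by virtue of selectors" can look like, here is a small standard-library sketch. It is not dflow's implementation, and `run_and_print` is a hypothetical name; it only shows the general idea of multiplexing stdout and stderr in a single thread. It assumes a POSIX system, since `selectors` does not work with pipes on Windows.

```python
import os
import selectors
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_and_print(cmd: Sequence[str]) -> Tuple[int, str, str]:
    """Run cmd and echo its output as it arrives, using one thread and a selector."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_chunks: List[bytes] = []
    err_chunks: List[bytes] = []

    sel = selectors.DefaultSelector()
    # Attach (terminal stream, capture buffer) to each pipe as selector data.
    sel.register(proc.stdout, selectors.EVENT_READ, (sys.stdout, out_chunks))
    sel.register(proc.stderr, selectors.EVENT_READ, (sys.stderr, err_chunks))

    while sel.get_map():  # loop until both pipes have reached EOF
        for key, _events in sel.select():
            sink, chunks = key.data
            # os.read bypasses Python-level buffering, so no data is held back.
            data = os.read(key.fileobj.fileno(), 4096)
            if not data:  # EOF on this pipe
                sel.unregister(key.fileobj)
                continue
            chunks.append(data)
            # Chunk-wise decoding may garble a multibyte character split across
            # reads in the echoed text; the captured text below is decoded whole.
            sink.write(data.decode(errors="replace"))
            sink.flush()
    sel.close()

    code = proc.wait()
    proc.stdout.close()
    proc.stderr.close()
    out = b"".join(out_chunks).decode(errors="replace")
    err = b"".join(err_chunks).decode(errors="replace")
    return code, out, err
```

Compared with one reader thread per pipe, the selector approach needs no locking because all writes happen in a single thread.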

Thanks. Is there any way to enable this from the dpgen2 configuration JSON?

OutisLi avatar Aug 26 '25 10:08 OutisLi

It is not supported currently. You can add an argument in config json to control the output.
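As a rough sketch of what such an option could look like, the snippet below reads a boolean from a training config dictionary and forwards it to a command runner. The key name `print_oe`, the function `train_with_config`, and the stand-in `recording_runner` are all hypothetical; dpgen2 does not define them, and the real wiring would go through the project's own argument checking.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

RunResult = Tuple[int, str, str]
Runner = Callable[..., RunResult]


def train_with_config(
    cmd: List[str], train_config: Dict[str, Any], runner: Runner
) -> RunResult:
    # Hypothetical key; defaulting to False keeps the current silent behavior.
    print_oe = bool(train_config.get("print_oe", False))
    return runner(cmd, print_oe=print_oe)


def recording_runner(cmd: List[str], print_oe: bool = False) -> RunResult:
    """Stand-in for a real command runner; it only reports the flag it received."""
    return 0, f"print_oe={print_oe}", ""


# Example: the flag as it might appear in a JSON config fragment.
config = json.loads('{"print_oe": true}')
result = train_with_config(["dp", "train", "input.json"], config, recording_runner)
```

Keeping the default at `False` follows the suggestion above to offer streaming as an option rather than change the existing behavior.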

zjgemi avatar Sep 02 '25 09:09 zjgemi