
[Orbit] Handle iterator exhaustion in controller.py

Open LINYV0719 opened this issue 2 months ago • 0 comments

Description

This PR addresses the TODO in orbit/controller.py to support steps=-1 in Controller.train(), allowing training to run until the underlying dataset is exhausted.

Motivation: Previously, Controller.train required a fixed number of steps. This change allows users to train for a full epoch (or until the dataset runs out) without needing to know the exact dataset size beforehand, which is common when using tf.data.Dataset.

Changes:

  • Modified the Controller.train loop condition to accept steps=-1.
  • Added a try-except block to catch tf.errors.OutOfRangeError and StopIteration during _train_n_steps. This ensures the loop exits gracefully when the iterator is exhausted instead of crashing.
  • Added logic to break the loop if the global_step increment is less than expected (another indicator of exhaustion).
  • Added a new test case, test_train_until_exhaustion, in orbit/controller_test.py to verify this behavior using a finite dataset.
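To illustrate the control flow described above, here is a minimal pure-Python sketch. It is not the actual Orbit code: TinyController, step_fn, and steps_per_loop are hypothetical names used only for illustration, and the sketch assumes steps is the global step to train up to. It catches only StopIteration so that it runs without TensorFlow; with a tf.data iterator inside a tf.function, tf.errors.OutOfRangeError would be caught in the same place.

```python
class TinyController:
    """Hypothetical stand-in for a training controller (illustration only)."""

    def __init__(self, step_fn, steps_per_loop=2):
        self.step_fn = step_fn              # called once per batch
        self.steps_per_loop = steps_per_loop
        self.global_step = 0

    def _train_n_steps(self, iterator, n):
        # next() raises StopIteration once the iterator is exhausted.
        for _ in range(n):
            self.step_fn(next(iterator))
            self.global_step += 1

    def train(self, iterator, steps):
        """Trains up to global step `steps`, or until exhaustion if steps == -1."""
        if steps < -1:
            raise ValueError("steps must be -1 or a non-negative integer")
        target = None if steps == -1 else steps
        while target is None or self.global_step < target:
            n = self.steps_per_loop
            if target is not None:
                n = min(n, target - self.global_step)
            before = self.global_step
            try:
                self._train_n_steps(iterator, n)
            except StopIteration:
                break  # iterator exhausted: exit gracefully
            if self.global_step - before < n:
                break  # fewer steps ran than requested: treat as exhaustion
        return self.global_step


# Usage: a finite "dataset" of 5 batches, trained until it runs out.
seen = []
controller = TinyController(seen.append, steps_per_loop=2)
final_step = controller.train(iter(range(5)), steps=-1)

# A fixed step count still stops at the requested global step.
fixed = TinyController(lambda batch: None, steps_per_loop=2)
fixed_step = fixed.train(iter(range(10)), steps=3)
```

In this sketch the last partial loop still counts the batches it consumed before the iterator ran out, so global_step reflects the number of batches actually processed.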

Type of change

For a new feature or function, please create an issue first to discuss it with us before submitting a pull request.

Note: Please delete options that are not relevant.

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)

Tests

I verified the changes by running the new test case and existing tests.

Test Configuration:

  • OS: Windows 11
  • Python Version: 3.10
  • Command: python -m orbit.controller_test
  • Result: Passed. Specifically, test_train_until_exhaustion passed with the expected behavior.

Checklist

  • [x] I have signed the Contributor License Agreement.
  • [x] I have read guidelines for pull request.
  • [x] My code follows the coding guidelines.
  • [x] I have performed a self code review of my own code.
  • [x] I have commented my code, particularly in hard-to-understand areas.
  • [x] I have made corresponding changes to the documentation.
  • [x] My changes generate no new warnings.
  • [x] I have added tests that prove my fix is effective or that my feature works.

LINYV0719 • Dec 29 '25 13:12