feat: stop using workload sequencer for PyTorchTrial [MLG-184]
Description
Summary of the calls made by the existing workload sequencer, for reference:
    startup callbacks
    set data loaders
    load from checkpoint
    hvd.broadcast parameters/optimizer state
    try:
        for op in searcher_ops: (workloads)
            for batch in op.batches: (workload)
                train until min(checkpoint, validation, op complete, scheduling unit)
                train:
                    train loop
                        train step
                    broadcast metrics
                    on_training_workload_end callback
                    report training metrics
                    report searcher progress
                    check for preemption
                checkpoint/validate/finish/keep going
                checkpoint:
                    update state last checkpoint
                    save checkpoint
                    only on chief, save and broadcast uuid
                    call checkpoint_upload_end callbacks
                    check for preemption
                validate:
                    validation loop
                    report searcher progress/complete
                    report validation metrics
                    maybe checkpoint (checkpoint policy)
                    check for preemption
                    upload_tb_files
                finish:
                    checkpoint if latest checkpoint isn't latest
                    validate
    except ShouldExit:
        checkpoint if not latest
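
For reviewers who prefer code to an outline, the control flow above can be approximated by the toy loop below. It is a minimal sketch and not the actual sequencer or the new PyTorchTrial code. `LoopState`, `run`, `preempt_at`, and the convention that each searcher op is a cumulative batch target are all assumptions made for illustration. Only the `ShouldExit` name and the "train until min(...)" rule come from the outline.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ShouldExit(Exception):
    """Signals preemption. The name is from the outline; the body is illustrative."""


@dataclass
class LoopState:
    # Hypothetical bookkeeping; the real trial state holds more than this.
    batches_trained: int = 0
    last_checkpoint: int = 0
    last_validation: int = 0
    events: List[str] = field(default_factory=list)


def checkpoint(state: LoopState) -> None:
    # Stand-in for "update state last checkpoint" + "save checkpoint".
    state.last_checkpoint = state.batches_trained
    state.events.append(f"checkpoint@{state.batches_trained}")


def validate(state: LoopState) -> None:
    # Stand-in for "validation loop" + metric reporting.
    state.last_validation = state.batches_trained
    state.events.append(f"validate@{state.batches_trained}")


def run(
    searcher_ops: List[int],  # assumed: cumulative batch targets, one per op
    checkpoint_period: int,
    validation_period: int,
    scheduling_unit: int,
    preempt_at: Optional[int] = None,  # hypothetical knob to simulate preemption
) -> LoopState:
    state = LoopState()
    try:
        for op_length in searcher_ops:
            while state.batches_trained < op_length:
                # train until min(checkpoint, validation, op complete, scheduling unit)
                stop = min(
                    state.last_checkpoint + checkpoint_period,
                    state.last_validation + validation_period,
                    op_length,
                    state.batches_trained + scheduling_unit,
                )
                state.events.append(f"train {state.batches_trained}->{stop}")
                state.batches_trained = stop

                # check for preemption after each training chunk
                if preempt_at is not None and state.batches_trained >= preempt_at:
                    raise ShouldExit()

                # checkpoint/validate/keep going
                if state.batches_trained - state.last_checkpoint >= checkpoint_period:
                    checkpoint(state)
                if state.batches_trained - state.last_validation >= validation_period:
                    validate(state)

            # finish: checkpoint if stale, then validate
            # (simplification: validation is skipped here if already current)
            if state.last_checkpoint != state.batches_trained:
                checkpoint(state)
            if state.last_validation != state.batches_trained:
                validate(state)
    except ShouldExit:
        # checkpoint if not latest
        if state.last_checkpoint != state.batches_trained:
            checkpoint(state)
    return state
```

Calling `run([10], checkpoint_period=4, validation_period=5, scheduling_unit=3)` records the interleaving of training chunks, checkpoints, and validations in `state.events`. Passing `preempt_at=6` shows the `except ShouldExit` path saving a final checkpoint before exit.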
Test Plan
Verify that existing PyTorchTrial training functionality still works:
- Training/validation steps
- Checkpoint save
- Resume training
Commentary (optional)
Checklist
- [ ] Changes have been manually QA'd
- [ ] User-facing API changes need the "User-facing API Change" label.
- [ ] Release notes should be added as a separate file under `docs/release-notes/`. See Release Note for details.
- [ ] Licenses should be included for new code which was copied and/or modified from any external code.
- [ ] If modifying `/webui/react/src/shared/`, verify `make -C webui/react test-shared` passes.