
Implement pipeline merge/node borrow

Open insujang opened this issue 10 months ago • 0 comments

While handling failures, if a pipeline does not have enough nodes, Oobleck is supposed to borrow nodes from other pipelines or merge pipelines. An earlier version included a prototype of this, but it was lost during the refactoring onto the colossalai backend. As a result, when no pipeline template exists for the number of nodes remaining in a pipeline, training terminates with an error in OobleckPlugin._instantiate_pipelines().
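To make the intended behavior concrete, here is a minimal sketch of a borrow-then-merge policy. It is not Oobleck's actual code or the lost prototype: the function name `borrow_or_merge`, the representation of pipelines as plain node counts, and the idea that a template is identified only by its node count are all assumptions made for illustration.

```python
from typing import Optional


def borrow_or_merge(sizes: list[int], templates: set[int]) -> Optional[list[int]]:
    """Hypothetical helper: make every pipeline size match some template.

    sizes     -- node count of each pipeline after a failure
    templates -- node counts for which a pipeline template exists (assumed)
    Returns the adjusted node counts, or None if no fix was found.
    """
    sizes = list(sizes)  # work on a copy
    i = 0
    while i < len(sizes):
        if sizes[i] in templates:
            i += 1
            continue
        fixed = False
        # Option 1: borrow nodes from a donor that stays valid afterwards.
        for target in sorted(t for t in templates if t > sizes[i]):
            need = target - sizes[i]
            for j, donor in enumerate(sizes):
                if j != i and donor in templates and (donor - need) in templates:
                    sizes[j] -= need
                    sizes[i] = target
                    fixed = True
                    break
            if fixed:
                break
        # Option 2: merge with another pipeline if the sum has a template.
        if not fixed:
            for j, other in enumerate(sizes):
                if j != i and (sizes[i] + other) in templates:
                    sizes[i] += other
                    del sizes[j]
                    fixed = True
                    break
        if not fixed:
            return None  # corresponds to the case that currently errors out
        i = 0  # indices may have shifted, so rescan from the start
    return sizes
```

For example, with templates for 2, 3, and 4 nodes, `borrow_or_merge([1, 4], {2, 3, 4})` borrows one node from the 4-node pipeline, while `borrow_or_merge([1, 2], {2, 3})` has no valid donor and falls back to merging. A real implementation would also need to move model states between nodes, which this sketch ignores.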

insujang, Apr 11 '24 15:04