Oobleck
Implement pipeline merge/node borrow
When handling failures, if a pipeline does not have enough nodes, Oobleck is supposed to borrow nodes from other pipelines or merge pipelines.
An earlier version had a prototype implementation, but it was lost during the refactoring to the colossalai backend. As a result, when there is no pipeline template for the number of nodes remaining in a pipeline, training terminates with an error in OobleckPlugin._instantiate_pipelines().
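
To make the expected behavior concrete, here is a minimal sketch of one possible borrow-then-merge policy. It is not the lost prototype and does not use any Oobleck or colossalai API: `rebalance`, `node_counts`, and `template_sizes` are hypothetical names, and each pipeline is reduced to just its node count. The sketch assumes a pipeline is valid exactly when a template exists for its node count. It does not split a merged pipeline that grows larger than every template.

```python
from typing import List, Set


def rebalance(node_counts: List[int], template_sizes: Set[int]) -> List[int]:
    """Return per-pipeline node counts such that every count has a template.

    Hypothetical helper for illustration only; not part of Oobleck.
    """
    counts = sorted(node_counts, reverse=True)
    while True:
        # Find a pipeline whose node count has no matching template.
        short = next((c for c in counts if c not in template_sizes), None)
        if short is None:
            return counts
        counts.remove(short)
        if not counts:
            raise RuntimeError(
                f"no template for {short} nodes and no pipeline left to merge with"
            )

        # Step 1: try to borrow k nodes from a donor so that both the
        # receiving pipeline and the donor end up with a templated size.
        borrowed = False
        for donor in counts:
            for k in range(1, donor):
                if short + k in template_sizes and donor - k in template_sizes:
                    counts.remove(donor)
                    counts += [short + k, donor - k]
                    borrowed = True
                    break
            if borrowed:
                break

        # Step 2: if borrowing is impossible, merge with the smallest pipeline.
        # The merged size is re-checked on the next loop iteration.
        if not borrowed:
            partner = min(counts)
            counts.remove(partner)
            counts.append(short + partner)

        counts.sort(reverse=True)
```

With templates for 2, 3, and 4 nodes, `rebalance([4, 1], {2, 3, 4})` borrows one node and yields `[3, 2]`, while `rebalance([2, 1], {2, 3, 4})` cannot borrow and merges into `[3]`. If no valid arrangement is reachable, the sketch raises instead of silently continuing, which mirrors the current failure mode but makes the cause explicit.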