
[fx] Use colossalai checkpoint and add offload recognition in codegen

Open · Cypher30 opened this issue 3 years ago

Because our future automatic parallelization may need to offload the checkpoint inputs to save memory, I:

  1. Replaced the original torch checkpoint function with colossalai.utils.checkpoint, which provides an interface for the offload option.
  2. Made the codegen module recognize the offload option ("activation_offload") on the node that marks the start of an "activation_checkpoint" region in the graph.
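The two changes above can be illustrated with a minimal, pure-Python sketch of the codegen step. It assumes that each node carries an integer "activation_checkpoint" attribute naming its region, that the offload flag lives only on the first node of a region, and that the generated call has the shape checkpoint(function, offload_flag, *inputs). The Node class, the helper functions, and the checkpoint_<idx> naming are hypothetical stand-ins, not the actual ColossalAI codegen.

```python
# Hypothetical stand-in for an fx graph node; only the fields this sketch needs.
class Node:
    def __init__(self, name, args=(), **attrs):
        self.name = name
        self.args = args  # names of the nodes this node consumes
        for key, value in attrs.items():
            setattr(self, key, value)


def find_regions(nodes):
    """Group nodes by their 'activation_checkpoint' index, keeping graph order."""
    regions = {}
    for node in nodes:
        idx = getattr(node, "activation_checkpoint", None)
        if idx is not None:
            regions.setdefault(idx, []).append(node)
    return regions


def region_inputs(region):
    """Names consumed by the region but produced outside of it, in first-use order."""
    produced = {node.name for node in region}
    inputs = []
    for node in region:
        for arg in node.args:
            if arg not in produced and arg not in inputs:
                inputs.append(arg)
    return inputs


def emit_checkpoint_calls(nodes):
    """Emit one checkpoint call string per region.

    The offload flag is read from the region's start node; a start node without
    an 'activation_offload' attribute defaults to no offload.
    """
    lines = []
    for idx, region in find_regions(nodes).items():
        start = region[0]
        offload = getattr(start, "activation_offload", False)
        args = ", ".join(region_inputs(region))
        lines.append(f"colossalai.utils.checkpoint(checkpoint_{idx}, {offload}, {args})")
    return lines


nodes = [
    Node("linear", args=("x",), activation_checkpoint=0, activation_offload=True),
    Node("relu", args=("linear",), activation_checkpoint=0),
    Node("fc2", args=("relu",), activation_checkpoint=1),
]
for line in emit_checkpoint_calls(nodes):
    print(line)
# colossalai.utils.checkpoint(checkpoint_0, True, x)
# colossalai.utils.checkpoint(checkpoint_1, False, relu)
```

Reading the flag with getattr and a False default means graphs that were never annotated keep generating non-offloading calls, so the new behavior is strictly opt-in.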

In the test, I use setattr(node, 'activation_offload', True) to manually annotate the node with the offload option. The tracer currently cannot trace offload behavior; we could add that feature in the future. For now, I think it is fine to let only our automatic parallelization strategy (or the user, manually, if needed) annotate nodes with this attribute, since the model being traced should not contain offload operations itself.
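The manual annotation described above amounts to setting one attribute on the start node of a region. The sketch below wraps that setattr call in a small helper, using types.SimpleNamespace as a stand-in for fx graph nodes. The helper name annotate_region_offload and the assumption that regions are identified by an integer "activation_checkpoint" attribute are illustrative, not part of the actual test.

```python
from types import SimpleNamespace


def annotate_region_offload(nodes, region_idx, offload=True):
    """Set 'activation_offload' on the first node of checkpoint region region_idx.

    Hypothetical helper mirroring the manual setattr annotation; returns the
    annotated node, or None if no node belongs to that region.
    """
    for node in nodes:
        if getattr(node, "activation_checkpoint", None) == region_idx:
            setattr(node, "activation_offload", offload)
            return node  # only the region's start node is annotated
    return None


# Stand-in graph: one plain input node and a two-node checkpoint region.
nodes = [
    SimpleNamespace(name="x"),
    SimpleNamespace(name="linear", activation_checkpoint=0),
    SimpleNamespace(name="relu", activation_checkpoint=0),
]
start = annotate_region_offload(nodes, 0)
print(start.name)  # linear
print([getattr(n, "activation_offload", False) for n in nodes])  # [False, True, False]
```

Keeping the annotation outside the traced model matches the point above: the tracer never sees offload operations, and only the annotating party (the parallelization strategy or the user) decides which regions offload.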

Cypher30, Aug 11 '22 06:08