
[autoparallel] support original activation checkpoint in the autoparallel system

Open · YuliangLiu0306 opened this issue 2 years ago · 1 comment

What does this PR do

Previously, the communication nodes inserted by the auto-parallel system broke the original activation checkpoint structure, so the GraphModule generated by fx codegen contained many unexpected checkpoint functions.

This PR takes the activation checkpoint annotations into account when inserting nodes, and uses the ColoGraphModule codegen feature to generate a new distributed model that preserves the original activation checkpointing.
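To illustrate why annotation-aware insertion matters, here is a toy sketch in plain Python. It is not ColossalAI's implementation: `Node`, `ckpt_region`, `insert_comm_node` and `checkpoint_regions` are hypothetical names. In a real torch.fx graph the per-node annotation would live in `node.meta`, but the exact key and format ColossalAI uses are an assumption not shown in this PR. The sketch assumes codegen emits one checkpoint function per maximal run of consecutive nodes sharing a region id, so an unannotated communication node in the middle of a region splits it in two.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Hypothetical stand-in for an activation-checkpoint annotation in
    # node.meta; None means the node is outside any checkpoint region.
    ckpt_region: Optional[int] = None


def checkpoint_regions(nodes: List[Node]) -> List[List[str]]:
    """Group maximal runs of consecutive nodes sharing one region id.
    Under this sketch's assumption, each run becomes one checkpoint function."""
    regions = []
    for region, run in groupby(nodes, key=lambda n: n.ckpt_region):
        if region is not None:
            regions.append([n.name for n in run])
    return regions


def insert_comm_node(nodes: List[Node], after_idx: int, name: str,
                     annotation_aware: bool) -> List[Node]:
    """Return a new node list with a communication node inserted after
    nodes[after_idx]. If annotation_aware, the new node inherits the region
    of its neighbours when both sit in the same region."""
    prev_node = nodes[after_idx]
    next_node = nodes[after_idx + 1] if after_idx + 1 < len(nodes) else None
    region = None
    if (annotation_aware and next_node is not None
            and prev_node.ckpt_region == next_node.ckpt_region):
        region = prev_node.ckpt_region
    new_nodes = list(nodes)
    new_nodes.insert(after_idx + 1, Node(name, region))
    return new_nodes


# One checkpoint region (id 0) covering linear1 -> relu -> linear2.
graph = [Node("x"), Node("linear1", 0), Node("relu", 0),
         Node("linear2", 0), Node("output")]

# Naive insertion: the unannotated all_reduce splits region 0 into two runs.
naive = insert_comm_node(graph, 1, "all_reduce", annotation_aware=False)
# Annotation-aware insertion: all_reduce joins region 0, which stays intact.
aware = insert_comm_node(graph, 1, "all_reduce", annotation_aware=True)

print(checkpoint_regions(naive))
print(checkpoint_regions(aware))
```

With naive insertion the single region becomes two runs, which would surface as an extra checkpoint function in the generated module; with annotation-aware insertion the region stays whole. A node inserted exactly at a region boundary is left unannotated here, a design choice of this sketch rather than something the PR specifies.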

YuliangLiu0306 · Jan 12 '23 11:01

Code Coverage

Package Line Rate Complexity Health
. 0% 0
colossalai 74% 0
colossalai._C 100% 0
colossalai.amp 96% 0
colossalai.amp.apex_amp 92% 0
colossalai.amp.naive_amp 81% 0
colossalai.amp.naive_amp.grad_scaler 86% 0
colossalai.amp.torch_amp 66% 0
colossalai.auto_parallel 100% 0
colossalai.auto_parallel.checkpoint 0% 0
colossalai.auto_parallel.meta_profiler 55% 0
colossalai.auto_parallel.meta_profiler.meta_registry 33% 0
colossalai.auto_parallel.passes 7% 0
colossalai.auto_parallel.pipeline_shard 100% 0
colossalai.auto_parallel.tensor_shard 57% 0
colossalai.auto_parallel.tensor_shard.deprecated 49% 0
colossalai.auto_parallel.tensor_shard.deprecated.op_handler 61% 0
colossalai.auto_parallel.tensor_shard.node_handler 56% 0
colossalai.auto_parallel.tensor_shard.node_handler.experimental 27% 0
colossalai.auto_parallel.tensor_shard.node_handler.strategy 50% 0
colossalai.auto_parallel.tensor_shard.solver 25% 0
colossalai.auto_parallel.tensor_shard.utils 49% 0
colossalai.builder 78% 0
colossalai.cli 0% 0
colossalai.cli.benchmark 0% 0
colossalai.cli.check 0% 0
colossalai.cli.launcher 0% 0
colossalai.communication 79% 0
colossalai.context 95% 0
colossalai.context.process_group_initializer 99% 0
colossalai.context.random 91% 0
colossalai.device 31% 0
colossalai.engine 85% 0
colossalai.engine.gradient_accumulation 73% 0
colossalai.engine.gradient_handler 84% 0
colossalai.engine.schedule 48% 0
colossalai.fx 21% 0
colossalai.fx.codegen 5% 0
colossalai.fx.passes 53% 0
colossalai.fx.passes.algorithms 17% 0
colossalai.fx.passes.experimental 17% 0
colossalai.fx.profiler 14% 0
colossalai.fx.profiler.experimental 87% 0
colossalai.fx.profiler.experimental.profiler_function 63% 0
colossalai.fx.profiler.experimental.profiler_module 42% 0
colossalai.fx.tracer 39% 0
colossalai.fx.tracer.bias_addition_patch 100% 0
colossalai.fx.tracer.bias_addition_patch.patched_bias_addition_function 62% 0
colossalai.fx.tracer.bias_addition_patch.patched_bias_addition_module 90% 0
colossalai.fx.tracer.meta_patch 100% 0
colossalai.fx.tracer.meta_patch.patched_function 89% 0
colossalai.fx.tracer.meta_patch.patched_module 95% 0
colossalai.gemini 85% 0
colossalai.gemini.chunk 88% 0
colossalai.gemini.memory_tracer 69% 0
colossalai.gemini.ophooks 66% 0
colossalai.gemini.paramhooks 95% 0
colossalai.kernel 100% 0
colossalai.kernel.cuda_native 24% 0
colossalai.kernel.jit 0% 0
colossalai.logging 72% 0
colossalai.nn 52% 0
colossalai.nn._ops 84% 0
colossalai.nn.layer 67% 0
colossalai.nn.layer.colossalai_layer 85% 0
colossalai.nn.layer.moe 69% 0
colossalai.nn.layer.parallel_1d 69% 0
colossalai.nn.layer.parallel_2d 65% 0
colossalai.nn.layer.parallel_2p5d 70% 0
colossalai.nn.layer.parallel_3d 22% 0
colossalai.nn.layer.parallel_sequence 38% 0
colossalai.nn.layer.utils 90% 0
colossalai.nn.layer.vanilla 60% 0
colossalai.nn.layer.wrapper 30% 0
colossalai.nn.loss 74% 0
colossalai.nn.lr_scheduler 45% 0
colossalai.nn.metric 54% 0
colossalai.nn.optimizer 63% 0
colossalai.nn.parallel 83% 0
colossalai.nn.parallel.layers 34% 0
colossalai.nn.parallel.layers.cache_embedding 52% 0
colossalai.pipeline 48% 0
colossalai.pipeline.middleware 49% 0
colossalai.pipeline.middleware.adaptor 93% 0
colossalai.pipeline.rpc 15% 0
colossalai.registry 79% 0
colossalai.tensor 79% 0
colossalai.testing 90% 0
colossalai.trainer 68% 0
colossalai.trainer.hooks 43% 0
colossalai.utils 58% 0
colossalai.utils.checkpoint 100% 0
colossalai.utils.checkpoint_io 95% 0
colossalai.utils.data_sampler 83% 0
colossalai.utils.model 82% 0
colossalai.utils.multi_tensor_apply 78% 0
colossalai.utils.profiler 0% 0
colossalai.utils.profiler.legacy 0% 0
colossalai.utils.rank_recorder 0% 0
colossalai.utils.tensor_detector 14% 0
colossalai.zero 95% 0
colossalai.zero.init_ctx 97% 0
colossalai.zero.shard_utils 95% 0
colossalai.zero.sharded_model 60% 0
colossalai.zero.sharded_optim 83% 0
colossalai.zero.sharded_optim.bookkeeping 92% 0
colossalai.zero.sharded_param 97% 0
colossalai.zero.utils 90% 0
op_builder 53% 0
tests 100% 0
tests.components_to_test 97% 0
tests.components_to_test.utils 87% 0
tests.test_amp 97% 0
tests.test_auto_parallel 100% 0
tests.test_auto_parallel.test_tensor_shard 39% 0
tests.test_auto_parallel.test_tensor_shard.test_deprecated 57% 0
tests.test_auto_parallel.test_tensor_shard.test_deprecated.test_deprecated_op_handler 71% 0
tests.test_auto_parallel.test_tensor_shard.test_gpt 31% 0
tests.test_auto_parallel.test_tensor_shard.test_metainfo 44% 0
tests.test_auto_parallel.test_tensor_shard.test_node_handler 27% 0
tests.test_autochunk 42% 0
tests.test_autochunk.evoformer 27% 0
tests.test_comm 85% 0
tests.test_config 100% 0
tests.test_context 77% 0
tests.test_context.configs 100% 0
tests.test_data 97% 0
tests.test_data_pipeline_tensor_parallel 72% 0
tests.test_ddp 98% 0
tests.test_device 72% 0
tests.test_engine 97% 0
tests.test_fx 96% 0
tests.test_fx.test_ckpt_solvers 31% 0
tests.test_fx.test_codegen 29% 0
tests.test_fx.test_meta 51% 0
tests.test_fx.test_pipeline.test_hf_model 38% 0
tests.test_fx.test_pipeline.test_timm_model 38% 0
tests.test_fx.test_pipeline.test_topo 97% 0
tests.test_fx.test_pipeline.test_torchvision 41% 0
tests.test_fx.test_profiler 29% 0
tests.test_fx.test_tracer 99% 0
tests.test_fx.test_tracer.test_hf_model 81% 0
tests.test_fx.test_tracer.test_timm_model 87% 0
tests.test_fx.test_tracer.test_torchaudio_model 78% 0
tests.test_fx.test_tracer.test_torchrec_model 93% 0
tests.test_fx.test_tracer.test_torchvision_model 96% 0
tests.test_gemini 95% 0
tests.test_gemini.update 98% 0
tests.test_layers 53% 0
tests.test_layers.test_1d 97% 0
tests.test_layers.test_1d.checks_1d 99% 0
tests.test_layers.test_2d 91% 0
tests.test_layers.test_2d.checks_2d 78% 0
tests.test_layers.test_2p5d 98% 0
tests.test_layers.test_2p5d.checks_2p5d 100% 0
tests.test_layers.test_3d 45% 0
tests.test_layers.test_3d.checks_3d 4% 0
tests.test_layers.test_sequence 64% 0
tests.test_moe 97% 0
tests.test_ops 96% 0
tests.test_optimizer 96% 0
tests.test_pipeline 35% 0
tests.test_tensor 78% 0
tests.test_tensor.common_utils 75% 0
tests.test_tensor.core 88% 0
tests.test_tensor.model 56% 0
tests.test_trainer 84% 0
tests.test_trainer.test_pipeline 98% 0
tests.test_utils 80% 0
tests.test_utils.test_checkpoint 39% 0
tests.test_utils.test_checkpoint_io 95% 0
tests.test_zero 93% 0
tests.test_zero.low_level_zero 84% 0
Summary 58% (32941 / 57140) 0

github-actions[bot] · Jan 13 '23 03:01