ColossalAI
[autoparallel] support original activation ckpt on autoparallel system
What does this PR do
Previously, the communication nodes inserted by the auto-parallel system broke the original checkpoint structure, so the GraphModule generated by fx codegen contained many unexpected checkpoint functions.
In this PR, the activation checkpoint annotation is taken into account during node insertion, and the ColoGraphModule codegen feature is used to generate a new distributed model with activation checkpointing.
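The problem and the fix can be illustrated with a toy sketch. This is not ColossalAI's code: `Node`, `insert_comm`, and `ckpt_regions` are hypothetical names, and the graph is a plain Python list rather than a torch.fx graph. It assumes each node carries a checkpoint region id and that codegen wraps every contiguous run of equally labelled nodes in one checkpoint function. An inserted communication node without an annotation splits a region in two, while one that inherits its neighbours' annotation leaves the region intact.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass
class Node:
    name: str
    ckpt: Optional[int] = None  # hypothetical checkpoint region id; None = not checkpointed


def insert_comm(nodes: List[Node], index: int, name: str, inherit: bool) -> List[Node]:
    """Return a new list with a communication node inserted after nodes[index]."""
    prev = nodes[index]
    nxt = nodes[index + 1] if index + 1 < len(nodes) else None
    label = None
    # Inherit the annotation only when both neighbours sit in the same region,
    # i.e. the new node lands strictly inside an existing checkpoint region.
    if inherit and nxt is not None and prev.ckpt is not None and prev.ckpt == nxt.ckpt:
        label = prev.ckpt
    return nodes[: index + 1] + [Node(name, label)] + nodes[index + 1:]


def ckpt_regions(nodes: List[Node]) -> List[List[str]]:
    """Group contiguous nodes with the same region id; each group would become one checkpoint function."""
    regions = []
    for label, run in groupby(nodes, key=lambda n: n.ckpt):
        if label is not None:
            regions.append([n.name for n in run])
    return regions


graph = [Node("linear1", 0), Node("relu", 0), Node("linear2", 0), Node("out")]

broken = ckpt_regions(insert_comm(graph, 0, "all_gather", inherit=False))
kept = ckpt_regions(insert_comm(graph, 0, "all_gather", inherit=True))

print(broken)  # [['linear1'], ['relu', 'linear2']]
print(kept)    # [['linear1', 'all_gather', 'relu', 'linear2']]
```

In the first case the single original region turns into two checkpoint functions; in the second it remains one, which is the behaviour this PR aims for.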
Complete coverage report
| Package | Line Rate | Complexity | Health |
|---|---|---|---|
| . | 0% | 0 | ❌ |
| colossalai | 74% | 0 | ❌ |
| colossalai._C | 100% | 0 | ✔ |
| colossalai.amp | 96% | 0 | ✔ |
| colossalai.amp.apex_amp | 92% | 0 | ✔ |
| colossalai.amp.naive_amp | 81% | 0 | ➖ |
| colossalai.amp.naive_amp.grad_scaler | 86% | 0 | ➖ |
| colossalai.amp.torch_amp | 66% | 0 | ❌ |
| colossalai.auto_parallel | 100% | 0 | ✔ |
| colossalai.auto_parallel.checkpoint | 0% | 0 | ❌ |
| colossalai.auto_parallel.meta_profiler | 55% | 0 | ❌ |
| colossalai.auto_parallel.meta_profiler.meta_registry | 33% | 0 | ❌ |
| colossalai.auto_parallel.passes | 7% | 0 | ❌ |
| colossalai.auto_parallel.pipeline_shard | 100% | 0 | ✔ |
| colossalai.auto_parallel.tensor_shard | 57% | 0 | ❌ |
| colossalai.auto_parallel.tensor_shard.deprecated | 49% | 0 | ❌ |
| colossalai.auto_parallel.tensor_shard.deprecated.op_handler | 61% | 0 | ❌ |
| colossalai.auto_parallel.tensor_shard.node_handler | 56% | 0 | ❌ |
| colossalai.auto_parallel.tensor_shard.node_handler.experimental | 27% | 0 | ❌ |
| colossalai.auto_parallel.tensor_shard.node_handler.strategy | 50% | 0 | ❌ |
| colossalai.auto_parallel.tensor_shard.solver | 25% | 0 | ❌ |
| colossalai.auto_parallel.tensor_shard.utils | 49% | 0 | ❌ |
| colossalai.builder | 78% | 0 | ❌ |
| colossalai.cli | 0% | 0 | ❌ |
| colossalai.cli.benchmark | 0% | 0 | ❌ |
| colossalai.cli.check | 0% | 0 | ❌ |
| colossalai.cli.launcher | 0% | 0 | ❌ |
| colossalai.communication | 79% | 0 | ❌ |
| colossalai.context | 95% | 0 | ✔ |
| colossalai.context.process_group_initializer | 99% | 0 | ✔ |
| colossalai.context.random | 91% | 0 | ✔ |
| colossalai.device | 31% | 0 | ❌ |
| colossalai.engine | 85% | 0 | ➖ |
| colossalai.engine.gradient_accumulation | 73% | 0 | ❌ |
| colossalai.engine.gradient_handler | 84% | 0 | ➖ |
| colossalai.engine.schedule | 48% | 0 | ❌ |
| colossalai.fx | 21% | 0 | ❌ |
| colossalai.fx.codegen | 5% | 0 | ❌ |
| colossalai.fx.passes | 53% | 0 | ❌ |
| colossalai.fx.passes.algorithms | 17% | 0 | ❌ |
| colossalai.fx.passes.experimental | 17% | 0 | ❌ |
| colossalai.fx.profiler | 14% | 0 | ❌ |
| colossalai.fx.profiler.experimental | 87% | 0 | ➖ |
| colossalai.fx.profiler.experimental.profiler_function | 63% | 0 | ❌ |
| colossalai.fx.profiler.experimental.profiler_module | 42% | 0 | ❌ |
| colossalai.fx.tracer | 39% | 0 | ❌ |
| colossalai.fx.tracer.bias_addition_patch | 100% | 0 | ✔ |
| colossalai.fx.tracer.bias_addition_patch.patched_bias_addition_function | 62% | 0 | ❌ |
| colossalai.fx.tracer.bias_addition_patch.patched_bias_addition_module | 90% | 0 | ✔ |
| colossalai.fx.tracer.meta_patch | 100% | 0 | ✔ |
| colossalai.fx.tracer.meta_patch.patched_function | 89% | 0 | ➖ |
| colossalai.fx.tracer.meta_patch.patched_module | 95% | 0 | ✔ |
| colossalai.gemini | 85% | 0 | ➖ |
| colossalai.gemini.chunk | 88% | 0 | ➖ |
| colossalai.gemini.memory_tracer | 69% | 0 | ❌ |
| colossalai.gemini.ophooks | 66% | 0 | ❌ |
| colossalai.gemini.paramhooks | 95% | 0 | ✔ |
| colossalai.kernel | 100% | 0 | ✔ |
| colossalai.kernel.cuda_native | 24% | 0 | ❌ |
| colossalai.kernel.jit | 0% | 0 | ❌ |
| colossalai.logging | 72% | 0 | ❌ |
| colossalai.nn | 52% | 0 | ❌ |
| colossalai.nn._ops | 84% | 0 | ➖ |
| colossalai.nn.layer | 67% | 0 | ❌ |
| colossalai.nn.layer.colossalai_layer | 85% | 0 | ➖ |
| colossalai.nn.layer.moe | 69% | 0 | ❌ |
| colossalai.nn.layer.parallel_1d | 69% | 0 | ❌ |
| colossalai.nn.layer.parallel_2d | 65% | 0 | ❌ |
| colossalai.nn.layer.parallel_2p5d | 70% | 0 | ❌ |
| colossalai.nn.layer.parallel_3d | 22% | 0 | ❌ |
| colossalai.nn.layer.parallel_sequence | 38% | 0 | ❌ |
| colossalai.nn.layer.utils | 90% | 0 | ✔ |
| colossalai.nn.layer.vanilla | 60% | 0 | ❌ |
| colossalai.nn.layer.wrapper | 30% | 0 | ❌ |
| colossalai.nn.loss | 74% | 0 | ❌ |
| colossalai.nn.lr_scheduler | 45% | 0 | ❌ |
| colossalai.nn.metric | 54% | 0 | ❌ |
| colossalai.nn.optimizer | 63% | 0 | ❌ |
| colossalai.nn.parallel | 83% | 0 | ➖ |
| colossalai.nn.parallel.layers | 34% | 0 | ❌ |
| colossalai.nn.parallel.layers.cache_embedding | 52% | 0 | ❌ |
| colossalai.pipeline | 48% | 0 | ❌ |
| colossalai.pipeline.middleware | 49% | 0 | ❌ |
| colossalai.pipeline.middleware.adaptor | 93% | 0 | ✔ |
| colossalai.pipeline.rpc | 15% | 0 | ❌ |
| colossalai.registry | 79% | 0 | ❌ |
| colossalai.tensor | 79% | 0 | ❌ |
| colossalai.testing | 90% | 0 | ✔ |
| colossalai.trainer | 68% | 0 | ❌ |
| colossalai.trainer.hooks | 43% | 0 | ❌ |
| colossalai.utils | 58% | 0 | ❌ |
| colossalai.utils.checkpoint | 100% | 0 | ✔ |
| colossalai.utils.checkpoint_io | 95% | 0 | ✔ |
| colossalai.utils.data_sampler | 83% | 0 | ➖ |
| colossalai.utils.model | 82% | 0 | ➖ |
| colossalai.utils.multi_tensor_apply | 78% | 0 | ❌ |
| colossalai.utils.profiler | 0% | 0 | ❌ |
| colossalai.utils.profiler.legacy | 0% | 0 | ❌ |
| colossalai.utils.rank_recorder | 0% | 0 | ❌ |
| colossalai.utils.tensor_detector | 14% | 0 | ❌ |
| colossalai.zero | 95% | 0 | ✔ |
| colossalai.zero.init_ctx | 97% | 0 | ✔ |
| colossalai.zero.shard_utils | 95% | 0 | ✔ |
| colossalai.zero.sharded_model | 60% | 0 | ❌ |
| colossalai.zero.sharded_optim | 83% | 0 | ➖ |
| colossalai.zero.sharded_optim.bookkeeping | 92% | 0 | ✔ |
| colossalai.zero.sharded_param | 97% | 0 | ✔ |
| colossalai.zero.utils | 90% | 0 | ➖ |
| op_builder | 53% | 0 | ❌ |
| tests | 100% | 0 | ✔ |
| tests.components_to_test | 97% | 0 | ✔ |
| tests.components_to_test.utils | 87% | 0 | ➖ |
| tests.test_amp | 97% | 0 | ✔ |
| tests.test_auto_parallel | 100% | 0 | ✔ |
| tests.test_auto_parallel.test_tensor_shard | 39% | 0 | ❌ |
| tests.test_auto_parallel.test_tensor_shard.test_deprecated | 57% | 0 | ❌ |
| tests.test_auto_parallel.test_tensor_shard.test_deprecated.test_deprecated_op_handler | 71% | 0 | ❌ |
| tests.test_auto_parallel.test_tensor_shard.test_gpt | 31% | 0 | ❌ |
| tests.test_auto_parallel.test_tensor_shard.test_metainfo | 44% | 0 | ❌ |
| tests.test_auto_parallel.test_tensor_shard.test_node_handler | 27% | 0 | ❌ |
| tests.test_autochunk | 42% | 0 | ❌ |
| tests.test_autochunk.evoformer | 27% | 0 | ❌ |
| tests.test_comm | 85% | 0 | ➖ |
| tests.test_config | 100% | 0 | ✔ |
| tests.test_context | 77% | 0 | ❌ |
| tests.test_context.configs | 100% | 0 | ✔ |
| tests.test_data | 97% | 0 | ✔ |
| tests.test_data_pipeline_tensor_parallel | 72% | 0 | ❌ |
| tests.test_ddp | 98% | 0 | ✔ |
| tests.test_device | 72% | 0 | ❌ |
| tests.test_engine | 97% | 0 | ✔ |
| tests.test_fx | 96% | 0 | ✔ |
| tests.test_fx.test_ckpt_solvers | 31% | 0 | ❌ |
| tests.test_fx.test_codegen | 29% | 0 | ❌ |
| tests.test_fx.test_meta | 51% | 0 | ❌ |
| tests.test_fx.test_pipeline.test_hf_model | 38% | 0 | ❌ |
| tests.test_fx.test_pipeline.test_timm_model | 38% | 0 | ❌ |
| tests.test_fx.test_pipeline.test_topo | 97% | 0 | ✔ |
| tests.test_fx.test_pipeline.test_torchvision | 41% | 0 | ❌ |
| tests.test_fx.test_profiler | 29% | 0 | ❌ |
| tests.test_fx.test_tracer | 99% | 0 | ✔ |
| tests.test_fx.test_tracer.test_hf_model | 81% | 0 | ➖ |
| tests.test_fx.test_tracer.test_timm_model | 87% | 0 | ➖ |
| tests.test_fx.test_tracer.test_torchaudio_model | 78% | 0 | ❌ |
| tests.test_fx.test_tracer.test_torchrec_model | 93% | 0 | ✔ |
| tests.test_fx.test_tracer.test_torchvision_model | 96% | 0 | ✔ |
| tests.test_gemini | 95% | 0 | ✔ |
| tests.test_gemini.update | 98% | 0 | ✔ |
| tests.test_layers | 53% | 0 | ❌ |
| tests.test_layers.test_1d | 97% | 0 | ✔ |
| tests.test_layers.test_1d.checks_1d | 99% | 0 | ✔ |
| tests.test_layers.test_2d | 91% | 0 | ✔ |
| tests.test_layers.test_2d.checks_2d | 78% | 0 | ❌ |
| tests.test_layers.test_2p5d | 98% | 0 | ✔ |
| tests.test_layers.test_2p5d.checks_2p5d | 100% | 0 | ✔ |
| tests.test_layers.test_3d | 45% | 0 | ❌ |
| tests.test_layers.test_3d.checks_3d | 4% | 0 | ❌ |
| tests.test_layers.test_sequence | 64% | 0 | ❌ |
| tests.test_moe | 97% | 0 | ✔ |
| tests.test_ops | 96% | 0 | ✔ |
| tests.test_optimizer | 96% | 0 | ✔ |
| tests.test_pipeline | 35% | 0 | ❌ |
| tests.test_tensor | 78% | 0 | ❌ |
| tests.test_tensor.common_utils | 75% | 0 | ❌ |
| tests.test_tensor.core | 88% | 0 | ➖ |
| tests.test_tensor.model | 56% | 0 | ❌ |
| tests.test_trainer | 84% | 0 | ➖ |
| tests.test_trainer.test_pipeline | 98% | 0 | ✔ |
| tests.test_utils | 80% | 0 | ❌ |
| tests.test_utils.test_checkpoint | 39% | 0 | ❌ |
| tests.test_utils.test_checkpoint_io | 95% | 0 | ✔ |
| tests.test_zero | 93% | 0 | ✔ |
| tests.test_zero.low_level_zero | 84% | 0 | ➖ |
| Summary | 58% (32941 / 57140) | 0 | ❌ |