oneflow
`nn.graph` compilation takes too long when it is a large module
Summary
- After some modifications to the model implementation, the denoising UNet of Stable Diffusion can be built as an nn.graph.
- It has around 76000 ops.
- Building the op graph alone takes 278.26s.
(GRAPH:GraphToRun_0:GraphToRun) building graph Done! Cost time: 278.26s.
(GRAPH:GraphToRun_0:GraphToRun) start building plan.
I20220930 14:49:50.007160 4077018 nn_graph.cpp:329] Graph name: GraphToRun_0 compile time: 1280.28 seconds.
Code to reproduce bug
Please post a minimal example to repro the bug. GitHub Gist or repo is highly recommended.
System Information
- What is your OneFlow installation (pip, source, dockerhub):
- OS:
- OneFlow version (run `python3 -m oneflow --doctor`):
- Python version:
- CUDA driver version:
- GPU models:
- Other info:
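The branch feat/reduce_task_build_time_tmp:
- adds parallelism to some parts of plan compilation; you can give it a try;
- with GLOG_v = 1 enabled, the glog log contains a per-stage breakdown of the time cost: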
Log file created at: 2022/10/08 15:43:17
Running on machine: oneflow-23
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
I20221008 15:43:17.858882 3794269 env_global_objects_scope.cpp:159] Using rpc backend: local
I20221008 15:43:17.888291 3794269 epoll_comm_network.cpp:63] CommNet:Epoll listening on 0.0.0.0:42263
I20221008 15:43:48.611045 3794269 version.cpp:22] OneFlow git version: N/A
I20221008 15:43:48.611132 3794269 cuda_device_manager_factory.cpp:63] CUDA runtime version: 11.2
I20221008 15:43:48.611156 3794269 cuda_device_manager_factory.cpp:72] cuDNN version: 8.1.1
I20221008 15:43:48.611162 3794269 cuda_device_manager_factory.cpp:85] NCCL version: 2.12.10
I20221008 15:45:10.695839 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_0-InsertPinnedIdentityOpPass
I20221008 15:45:10.695926 3794269 time_util.h:82] Graph name: GraphToRun_0 InsertPinnedIdentityOpPass time elapsed: 0 milliseconds
I20221008 15:45:10.695936 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_0-InsertPinnedIdentityOpPass
I20221008 15:45:10.695942 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_1-EliminateDeadNodesPass
I20221008 15:45:23.749951 3794269 time_util.h:82] Graph name: GraphToRun_0 EliminateDeadNodesPass time elapsed: 13053 milliseconds
I20221008 15:45:23.750011 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_1-EliminateDeadNodesPass
I20221008 15:45:23.750023 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_2-NormalizationExponentialAverageAutoTickPass
I20221008 15:45:23.750061 3794269 time_util.h:82] Graph name: GraphToRun_0 NormalizationExponentialAverageAutoTickPass time elapsed: 0 milliseconds
I20221008 15:45:23.750066 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_2-NormalizationExponentialAverageAutoTickPass
I20221008 15:45:23.750072 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_3-AutoMixedPrecision
I20221008 15:45:23.750082 3794269 time_util.h:82] Graph name: GraphToRun_0 AutoMixedPrecision time elapsed: 0 milliseconds
I20221008 15:45:23.750087 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_3-AutoMixedPrecision
I20221008 15:45:23.750092 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_4-PruneAmpWhiteIdentityOpPass
I20221008 15:45:36.180141 3794269 time_util.h:82] Graph name: GraphToRun_0 PruneAmpWhiteIdentityOpPass time elapsed: 12430 milliseconds
I20221008 15:45:36.180207 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_4-PruneAmpWhiteIdentityOpPass
I20221008 15:45:36.180219 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_5-OptimizerPlacementOptimizationPass
I20221008 15:45:36.180234 3794269 time_util.h:82] Graph name: GraphToRun_0 OptimizerPlacementOptimizationPass time elapsed: 0 milliseconds
I20221008 15:45:36.180239 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_5-OptimizerPlacementOptimizationPass
I20221008 15:45:36.180244 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_6-FuseAddToOutputPass
I20221008 15:45:36.180251 3794269 time_util.h:82] Graph name: GraphToRun_0 FuseAddToOutputPass time elapsed: 0 milliseconds
I20221008 15:45:36.180256 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_6-FuseAddToOutputPass
I20221008 15:45:36.180261 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_7-IRRoundTripBeforeAD
I20221008 15:46:18.780427 3794269 time_util.h:82] Graph name: GraphToRun_0 IRRoundTripBeforeAD time elapsed: 42600 milliseconds
I20221008 15:46:18.780495 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_7-IRRoundTripBeforeAD
I20221008 15:46:18.780532 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_8-DynamicLossScaleSchedulePass
I20221008 15:46:18.780555 3794269 time_util.h:82] Graph name: GraphToRun_0 DynamicLossScaleSchedulePass time elapsed: 0 milliseconds
I20221008 15:46:18.780560 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_8-DynamicLossScaleSchedulePass
I20221008 15:46:18.780565 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_9-AutoTrainStep
I20221008 15:46:18.780572 3794269 time_util.h:82] Graph name: GraphToRun_0 AutoTrainStep time elapsed: 0 milliseconds
I20221008 15:46:18.780577 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_9-AutoTrainStep
I20221008 15:46:18.780581 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_10-AutoLearningRate
I20221008 15:46:18.780601 3794269 time_util.h:82] Graph name: GraphToRun_0 AutoLearningRate time elapsed: 0 milliseconds
I20221008 15:46:18.780607 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_10-AutoLearningRate
I20221008 15:46:18.780613 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_11-QuantAwareTraining
I20221008 15:46:18.780622 3794269 time_util.h:82] Graph name: GraphToRun_0 QuantAwareTraining time elapsed: 0 milliseconds
I20221008 15:46:18.780627 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_11-QuantAwareTraining
I20221008 15:46:18.780630 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_12-GenerateOptimizerOpConfs
I20221008 15:46:18.780637 3794269 time_util.h:82] Graph name: GraphToRun_0 GenerateOptimizerOpConfs time elapsed: 0 milliseconds
I20221008 15:46:18.780642 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_12-GenerateOptimizerOpConfs
I20221008 15:46:18.780647 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_13-PrunePinnedIdentityOpPass
I20221008 15:46:18.780665 3794269 time_util.h:82] Graph name: GraphToRun_0 PrunePinnedIdentityOpPass time elapsed: 0 milliseconds
I20221008 15:46:18.780673 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_13-PrunePinnedIdentityOpPass
I20221008 15:46:18.780678 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_14-ReplaceEmbeddingOps
I20221008 15:46:18.780684 3794269 time_util.h:82] Graph name: GraphToRun_0 ReplaceEmbeddingOps time elapsed: 0 milliseconds
I20221008 15:46:18.780689 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_14-ReplaceEmbeddingOps
I20221008 15:46:18.780694 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_15-SequentialOneEmbeddingOpsPass
I20221008 15:46:18.780719 3794269 time_util.h:82] Graph name: GraphToRun_0 SequentialOneEmbeddingOpsPass time elapsed: 0 milliseconds
I20221008 15:46:18.780725 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_15-SequentialOneEmbeddingOpsPass
I20221008 15:46:18.780730 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_16-FuseEmbeddingShuffleInteractionPass
I20221008 15:46:18.780737 3794269 time_util.h:82] Graph name: GraphToRun_0 FuseEmbeddingShuffleInteractionPass time elapsed: 0 milliseconds
I20221008 15:46:18.780742 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_16-FuseEmbeddingShuffleInteractionPass
I20221008 15:46:18.780747 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_17-FuseBCEReduceMeanFwBwPass
I20221008 15:46:18.780753 3794269 time_util.h:82] Graph name: GraphToRun_0 FuseBCEReduceMeanFwBwPass time elapsed: 0 milliseconds
I20221008 15:46:18.780757 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_17-FuseBCEReduceMeanFwBwPass
I20221008 15:46:18.780767 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_18-AddSspVariableProxy
I20221008 15:46:18.780773 3794269 time_util.h:82] Graph name: GraphToRun_0 AddSspVariableProxy time elapsed: 0 milliseconds
I20221008 15:46:18.780777 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_18-AddSspVariableProxy
I20221008 15:46:18.780782 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_19-CheckpointingPass
I20221008 15:46:18.780789 3794269 time_util.h:82] Graph name: GraphToRun_0 CheckpointingPass time elapsed: 0 milliseconds
I20221008 15:46:18.780793 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_19-CheckpointingPass
I20221008 15:46:18.780798 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_20-CudnnFusedNormalizationAddReluPass
I20221008 15:46:32.742362 3794269 time_util.h:82] Graph name: GraphToRun_0 CudnnFusedNormalizationAddReluPass time elapsed: 13961 milliseconds
I20221008 15:46:32.742429 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_20-CudnnFusedNormalizationAddReluPass
I20221008 15:46:32.742442 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_21-PruneCastToStaticShapeOpsPass
I20221008 15:46:32.742460 3794269 time_util.h:82] Graph name: GraphToRun_0 PruneCastToStaticShapeOpsPass time elapsed: 0 milliseconds
I20221008 15:46:32.742465 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_21-PruneCastToStaticShapeOpsPass
I20221008 15:46:32.742468 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_22-IRRoundTrip
I20221008 15:47:13.906929 3794269 time_util.h:82] Graph name: GraphToRun_0 IRRoundTrip time elapsed: 41164 milliseconds
I20221008 15:47:13.906996 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_22-IRRoundTrip
I20221008 15:47:13.907007 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_23-FuseAddToOutputPass1
I20221008 15:47:13.907027 3794269 time_util.h:82] Graph name: GraphToRun_0 FuseAddToOutputPass time elapsed: 0 milliseconds
I20221008 15:47:13.907033 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_23-FuseAddToOutputPass1
I20221008 15:47:13.907038 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_24-FuseConsecutiveAddPass
I20221008 15:47:28.068310 3794269 time_util.h:82] Graph name: GraphToRun_0 FuseConsecutiveAddPass time elapsed: 14161 milliseconds
I20221008 15:47:28.068367 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_24-FuseConsecutiveAddPass
I20221008 15:47:28.068377 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_25-IndexedSlicesOptimizerRewritePass
I20221008 15:47:28.068398 3794269 time_util.h:82] Graph name: GraphToRun_0 IndexedSlicesOptimizerRewritePass time elapsed: 0 milliseconds
I20221008 15:47:28.068403 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_25-IndexedSlicesOptimizerRewritePass
I20221008 15:47:28.068408 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_26-SplitSparseSoftmaxCrossEntropyOpPass
I20221008 15:47:43.202697 3794269 time_util.h:82] Graph name: GraphToRun_0 SplitSparseSoftmaxCrossEntropyOpPass time elapsed: 15134 milliseconds
I20221008 15:47:43.202769 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_26-SplitSparseSoftmaxCrossEntropyOpPass
I20221008 15:47:43.202781 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_27-DoParallelCastBeforeWideningTypeCast
I20221008 15:47:56.614285 3794269 time_util.h:82] Graph name: GraphToRun_0 DoParallelCastBeforeWideningTypeCast time elapsed: 13411 milliseconds
I20221008 15:47:56.614357 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_27-DoParallelCastBeforeWideningTypeCast
I20221008 15:47:56.614372 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_28-FuseCastScalePass
I20221008 15:47:56.614382 3794269 time_util.h:82] Graph name: GraphToRun_0 FuseCastScalePass time elapsed: 0 milliseconds
I20221008 15:47:56.614387 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_28-FuseCastScalePass
I20221008 15:47:56.614392 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_29-PruneParallelCastOpsPass
I20221008 15:48:11.940276 3794269 time_util.h:82] Graph name: GraphToRun_0 PruneParallelCastOpsPass time elapsed: 15325 milliseconds
I20221008 15:48:11.940340 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_29-PruneParallelCastOpsPass
I20221008 15:48:11.940353 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_30-FuseUpdateOpsPass
I20221008 15:48:11.940371 3794269 time_util.h:82] Graph name: GraphToRun_0 FuseUpdateOpsPass time elapsed: 0 milliseconds
I20221008 15:48:11.940376 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_30-FuseUpdateOpsPass
I20221008 15:48:11.940380 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_31-FuseModelUpdateCastOpsPass
I20221008 15:48:11.940408 3794269 time_util.h:82] Graph name: GraphToRun_0 FuseModelUpdateCastOpsPass time elapsed: 0 milliseconds
I20221008 15:48:11.940414 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_31-FuseModelUpdateCastOpsPass
I20221008 15:48:11.940419 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_32-MultiTensorModelUpdatePass
I20221008 15:48:11.940428 3794269 time_util.h:82] Graph name: GraphToRun_0 MultiTensorModelUpdatePass time elapsed: 0 milliseconds
I20221008 15:48:11.940433 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_32-MultiTensorModelUpdatePass
I20221008 15:48:11.940438 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_33-FixPipelineStageIdPass
I20221008 15:48:11.940445 3794269 time_util.h:82] Graph name: GraphToRun_0 FixPipelineStageIdPass time elapsed: 0 milliseconds
I20221008 15:48:11.940450 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_33-FixPipelineStageIdPass
I20221008 15:48:11.940454 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_34-PipelineBufferPass
I20221008 15:48:11.940461 3794269 time_util.h:82] Graph name: GraphToRun_0 PipelineBufferPass time elapsed: 0 milliseconds
I20221008 15:48:11.940466 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_34-PipelineBufferPass
I20221008 15:48:11.940471 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_35-AutoParallelPass
I20221008 15:48:11.940479 3794269 time_util.h:82] Graph name: GraphToRun_0 AutoParallelPass time elapsed: 0 milliseconds
I20221008 15:48:11.940485 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_35-AutoParallelPass
I20221008 15:48:11.940490 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_36-DumpVariableInfoPass
I20221008 15:48:11.940500 3794269 time_util.h:82] Graph name: GraphToRun_0 DumpVariableInfoPass time elapsed: 0 milliseconds
I20221008 15:48:11.940505 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_36-DumpVariableInfoPass
I20221008 15:48:11.940511 3794269 job_build_and_infer_ctx.cpp:946] GraphToRun_0 start compiling with pass pass_cnt_37-DumpBlobParallelConfPass
I20221008 15:48:26.041432 3794269 time_util.h:82] Graph name: GraphToRun_0 DumpBlobParallelConfPass time elapsed: 14100 milliseconds
I20221008 15:48:26.041503 3794269 job_build_and_infer_ctx.cpp:962] GraphToRun_0 finish compiling with pass pass_cnt_37-DumpBlobParallelConfPass
I20221008 15:48:26.041517 3794269 time_util.h:82] Graph name: GraphToRun_0 CompilePasses time elapsed: 195345 milliseconds
I20221008 15:48:26.467373 3794269 time_util.h:82] Graph name: GraphToRun_0 CheckJob time elapsed: 425 milliseconds
I20221008 15:48:36.643760 3794269 time_util.h:82] Graph name: GraphToRun_0 RegisterFreeEagerTensorsToVariableOpNames time elapsed: 0 milliseconds
I20221008 15:48:46.785388 3794269 time_util.h:82] Graph name: GraphToRun_0 RegisterNewVariableOpInJobPass time elapsed: 10141 milliseconds
I20221008 15:48:56.796336 3794269 time_util.h:82] Graph name: GraphToRun_0 DeleteOutdatedVariableInVariableTensorMgr time elapsed: 10010 milliseconds
I20221008 15:49:08.239645 3794269 time_util.h:82] Graph name: GraphToRun_0 DumpBlobParallelConfPass time elapsed: 11318 milliseconds
I20221008 15:49:20.262473 3794269 time_util.h:82] Graph name: GraphToRun_0 GroupBoxingByDstParallel time elapsed: 12022 milliseconds
I20221008 15:49:31.422750 3794269 time_util.h:82] Graph name: GraphToRun_0 BoxingWithMiddleNodes time elapsed: 11160 milliseconds
I20221008 15:49:43.810060 3794269 time_util.h:82] Graph name: GraphToRun_0 SetCtrlInOpName4VariableOp time elapsed: 12387 milliseconds
I20221008 15:49:56.610219 3794269 time_util.h:82] Graph name: GraphToRun_0 AutoPrependTick time elapsed: 12800 milliseconds
I20221008 15:50:08.079653 3794269 time_util.h:82] Graph name: GraphToRun_0 AddTickForTimeShape time elapsed: 11469 milliseconds
I20221008 15:50:23.020511 3794269 time_util.h:82] Graph name: GraphToRun_0 MultiClientAutoSourceAndSinkTick time elapsed: 14940 milliseconds
I20221008 15:50:34.570312 3794269 time_util.h:82] Graph name: GraphToRun_0 MultiClientAutoInterfaceCriticalSectionTick time elapsed: 11549 milliseconds
I20221008 15:50:34.572420 3794269 time_util.h:82] Graph name: GraphToRun_0 SystemOpFillJobNamePass time elapsed: 2 milliseconds
I20221008 15:50:46.698297 3794269 time_util.h:82] Graph name: GraphToRun_0 DumpBlobParallelConfPass time elapsed: 12125 milliseconds
I20221008 15:50:58.004686 3794269 time_util.h:82] Graph name: GraphToRun_0 CheckOpGraph time elapsed: 11306 milliseconds
I20221008 15:50:58.004750 3794269 time_util.h:82] Graph name: GraphToRun_0 Complete job time elapsed: 121208 milliseconds
I20221008 15:51:06.218361 3794269 time_util.h:82] Graph name: GraphToRun_0 NewOpGraph time elapsed: 8213 milliseconds
I20221008 15:51:06.218456 3794269 time_util.h:82] Graph name: GraphToRun_0 LogOptimizedJob time elapsed: 0 milliseconds
I20221008 15:51:07.809617 3794269 time_util.h:82] Graph name: GraphToRun_0 NewTaskGraph time elapsed: 1591 milliseconds
I20221008 15:51:08.159065 3794269 time_util.h:82] Graph name: GraphToRun_0 ProduceAllRegstsAndBindEdges time elapsed: 349 milliseconds
I20221008 15:51:08.298898 3794269 time_util.h:82] Graph name: GraphToRun_0 ConsumeAllRegsts time elapsed: 139 milliseconds
I20221008 15:51:08.311038 3794269 time_util.h:82] Graph name: GraphToRun_0 PinConsumedRegst time elapsed: 12 milliseconds
I20221008 15:51:08.803300 3794269 time_util.h:82] Graph name: GraphToRun_0 TaskNode::BuildExecGraph time elapsed: 492 milliseconds
I20221008 15:51:08.927374 3794269 time_util.h:82] Graph name: GraphToRun_0 TaskNode::CreateInferList time elapsed: 124 milliseconds
I20221008 15:51:10.594650 3794269 time_util.h:82] Graph name: GraphToRun_0 TaskNode::InferUserRegst time elapsed: 1667 milliseconds
I20221008 15:51:10.599665 3794269 time_util.h:82] Graph name: GraphToRun_0 TaskNode::InferOtherRegst time elapsed: 5 milliseconds
I20221008 15:51:10.776497 3794269 time_util.h:82] Graph name: GraphToRun_0 RemoveEmptyRegsts time elapsed: 176 milliseconds
I20221008 15:51:12.162194 3794269 time_util.h:82] Graph name: GraphToRun_0 InferTimeShapeIfMeaningful time elapsed: 1385 milliseconds
I20221008 15:51:12.585454 3794269 time_util.h:82] Graph name: GraphToRun_0 MergeChainAndAddOrderingCtrlEdgeInSameChain time elapsed: 423 milliseconds
I20221008 15:51:15.808538 3794269 time_util.h:82] Graph name: GraphToRun_0 EnableInplaceMemSharing time elapsed: 3223 milliseconds
I20221008 15:51:15.990125 3794269 time_util.h:82] Graph name: GraphToRun_0 CheckRegstLbiValid time elapsed: 181 milliseconds
I20221008 15:51:17.276623 3794269 time_util.h:82] Graph name: GraphToRun_0 AddTaskIntoPlan time elapsed: 1286 milliseconds
I20221008 15:51:18.063280 3794269 time_util.h:82] GenMemChainTasksAndRegsts time elapsed: 780 milliseconds
I20221008 15:51:18.164120 3794269 time_util.h:82] GenRegstDescId2RegstDesc time elapsed: 100 milliseconds
I20221008 15:51:18.164168 3794269 time_util.h:82] InitForEachMemChain time elapsed: 0 milliseconds
I20221008 15:51:25.805889 3794269 time_util.h:82] GenRegstAllocFreeTimeLineAndRegstMutualExclusions time elapsed: 7641 milliseconds
I20221008 16:11:47.020560 3794269 time_util.h:82] SelectAlgorithmGenMemBlockOffset4Regsts time elapsed: 1221214 milliseconds
I20221008 16:11:47.161007 3794269 time_util.h:82] ChooseBestOneForEachMemChain time elapsed: 140 milliseconds
I20221008 16:11:47.381507 3794269 time_util.h:82] Graph name: GraphToRun_0 InferMemBlockId4MemReusedRegst time elapsed: 1230104 milliseconds
I20221008 16:11:47.437754 3794269 time_util.h:82] Graph name: GraphToRun_0 SetUniqueMemBlockId4UnreusedMemRegst time elapsed: 56 milliseconds
I20221008 16:11:47.537374 3794269 time_util.h:82] Graph name: GraphToRun_0 Compile plan time elapsed: 1249532 milliseconds
I20221008 16:11:48.594246 3794269 time_util.h:82] Graph name: GraphToRun_0 Generate MemBlock and Chunk time elapsed: 1056 milliseconds
I20221008 16:11:48.694079 3794269 time_util.h:82] Graph name: GraphToRun_0 GenRegisterHint time elapsed: 99 milliseconds
I20221008 16:11:48.696100 3794269 time_util.h:82] Graph name: GraphToRun_0 GenCollectiveBoxingPlan time elapsed: 2 milliseconds
I20221008 16:11:48.743595 3794269 time_util.h:82] Graph name: GraphToRun_0 DumpCtrlRegstInfoToPlan time elapsed: 47 milliseconds
I20221008 16:11:48.918435 3794269 plan_util.cpp:963]
Graph name GraphToRun_0 in Rank: 0, Device: 0 needs to allocate [ 3044.08 MiB ] device memory.
In general, Chunk id: 0 memory is [ 1455.98 MiB ] with mem_block_num = 1
Unreused memory not eager var is [ 0.79104 MiB ] with mem_block_num = 20639
Eager Variable Tensor total memory is [ 1587.31 MiB ] with mem_block_num = 650
I20221008 16:11:49.084581 3794269 time_util.h:82] Graph name: GraphToRun_0 Memory and Plan Log time elapsed: 340 milliseconds
I20221008 16:11:56.763695 3794269 time_util.h:82] Graph name: GraphToRun_0 AddOpAttrtoPlan time elapsed: 7679 milliseconds
I20221008 16:12:01.094430 3794269 time_util.h:82] Graph name: GraphToRun_0 ReleaseTaskGraph time elapsed: 4330 milliseconds
I20221008 16:12:03.018481 3794269 time_util.h:82] Graph name: GraphToRun_0 PopulateOpAttribute time elapsed: 1924 milliseconds
I20221008 16:12:03.018572 3794269 time_util.h:82] Graph name: GraphToRun_0 NewRuntimeBuffers time elapsed: 0 milliseconds
I20221008 16:12:03.022179 3794269 time_util.h:82] Graph name: GraphToRun_0 GetVariableRealBlobAfterSyncPlan time elapsed: 3 milliseconds
I20221008 16:12:03.166872 3794269 time_util.h:82] Graph name: GraphToRun_0 VM::ShrinkAllMem time elapsed: 144 milliseconds
I20221008 16:12:04.048031 3794269 thread_manager.cpp:53] Actor thread: 524355 created.
I20221008 16:12:04.048460 3794269 thread_manager.cpp:53] Actor thread: 524353 created.
I20221008 16:12:04.048853 3794269 thread_manager.cpp:53] Actor thread: 524354 created.
I20221008 16:12:04.049227 3794269 thread_manager.cpp:53] Actor thread: 524356 created.
I20221008 16:12:04.049288 3794269 thread_manager.cpp:53] Actor thread: 524352 created.
I20221008 16:12:04.049327 3794269 thread_manager.cpp:53] Actor thread: 524323 created.
I20221008 16:12:04.049434 3794269 thread_manager.cpp:53] Actor thread: 524324 created.
I20221008 16:12:04.049487 3794269 thread_manager.cpp:53] Actor thread: 524326 created.
I20221008 16:12:04.049583 3794269 thread_manager.cpp:53] Actor thread: 524325 created.
I20221008 16:12:04.049676 3794269 thread_manager.cpp:53] Actor thread: 524328 created.
I20221008 16:12:04.069468 3794269 thread_manager.cpp:53] Actor thread: 1048577 created.
I20221008 16:12:04.069542 3794269 thread_manager.cpp:53] Actor thread: 524321 created.
I20221008 16:12:04.069592 3794269 thread_manager.cpp:53] Actor thread: 524300 created.
I20221008 16:12:04.069633 3794269 thread_manager.cpp:53] Actor thread: 524302 created.
I20221008 16:12:04.069686 3794269 thread_manager.cpp:53] Actor thread: 524332 created.
I20221008 16:12:04.069741 3794269 thread_manager.cpp:53] Actor thread: 524303 created.
I20221008 16:12:04.069869 3794269 thread_manager.cpp:53] Actor thread: 524304 created.
I20221008 16:12:04.069972 3794269 thread_manager.cpp:53] Actor thread: 524301 created.
I20221008 16:12:04.070124 3794269 thread_manager.cpp:53] Actor thread: 524308 created.
I20221008 16:12:04.070259 3794269 thread_manager.cpp:53] Actor thread: 524337 created.
I20221008 16:12:04.070330 3794269 thread_manager.cpp:53] Actor thread: 524316 created.
I20221008 16:12:04.070412 3794269 thread_manager.cpp:53] Actor thread: 524334 created.
I20221008 16:12:04.070582 3794269 thread_manager.cpp:53] Actor thread: 524305 created.
I20221008 16:12:04.070791 3794269 thread_manager.cpp:53] Actor thread: 524313 created.
I20221008 16:12:04.070932 3794269 thread_manager.cpp:53] Actor thread: 524343 created.
I20221008 16:12:04.071079 3794269 thread_manager.cpp:53] Actor thread: 524312 created.
I20221008 16:12:04.071228 3794269 thread_manager.cpp:53] Actor thread: 524289 created.
I20221008 16:12:04.071377 3794269 thread_manager.cpp:53] Actor thread: 524348 created.
I20221008 16:12:04.071509 3794269 thread_manager.cpp:53] Actor thread: 524317 created.
I20221008 16:12:04.071645 3794269 thread_manager.cpp:53] Actor thread: 524288 created.
I20221008 16:12:04.071774 3794269 thread_manager.cpp:53] Actor thread: 524347 created.
I20221008 16:12:04.071902 3794269 thread_manager.cpp:53] Actor thread: 524291 created.
I20221008 16:12:04.072018 3794269 thread_manager.cpp:53] Actor thread: 524350 created.
I20221008 16:12:04.072103 3794269 thread_manager.cpp:53] Actor thread: 524292 created.
I20221008 16:12:04.072157 3794269 thread_manager.cpp:53] Actor thread: 524351 created.
I20221008 16:12:04.072230 3794269 thread_manager.cpp:53] Actor thread: 524293 created.
I20221008 16:12:04.072293 3794269 thread_manager.cpp:53] Actor thread: 524335 created.
I20221008 16:12:04.072358 3794269 thread_manager.cpp:53] Actor thread: 524306 created.
I20221008 16:12:04.072414 3794269 thread_manager.cpp:53] Actor thread: 524294 created.
I20221008 16:12:04.072468 3794269 thread_manager.cpp:53] Actor thread: 524336 created.
I20221008 16:12:04.072525 3794269 thread_manager.cpp:53] Actor thread: 524307 created.
I20221008 16:12:04.072584 3794269 thread_manager.cpp:53] Actor thread: 524309 created.
I20221008 16:12:04.072690 3794269 thread_manager.cpp:53] Actor thread: 524311 created.
I20221008 16:12:04.072780 3794269 thread_manager.cpp:53] Actor thread: 524310 created.
I20221008 16:12:04.072881 3794269 thread_manager.cpp:53] Actor thread: 524338 created.
I20221008 16:12:04.072970 3794269 thread_manager.cpp:53] Actor thread: 524339 created.
I20221008 16:12:04.073032 3794269 thread_manager.cpp:53] Actor thread: 524342 created.
I20221008 16:12:04.073108 3794269 thread_manager.cpp:53] Actor thread: 524333 created.
I20221008 16:12:04.073163 3794269 thread_manager.cpp:53] Actor thread: 524331 created.
I20221008 16:12:04.073227 3794269 thread_manager.cpp:53] Actor thread: 524329 created.
I20221008 16:12:04.073309 3794269 thread_manager.cpp:53] Actor thread: 524330 created.
I20221008 16:12:04.073415 3794269 thread_manager.cpp:53] Actor thread: 524298 created.
I20221008 16:12:04.073479 3794269 thread_manager.cpp:53] Actor thread: 524296 created.
I20221008 16:12:04.073544 3794269 thread_manager.cpp:53] Actor thread: 524295 created.
I20221008 16:12:04.073655 3794269 thread_manager.cpp:53] Actor thread: 524315 created.
I20221008 16:12:04.073715 3794269 thread_manager.cpp:53] Actor thread: 524318 created.
I20221008 16:12:04.073772 3794269 thread_manager.cpp:53] Actor thread: 524299 created.
I20221008 16:12:04.073841 3794269 thread_manager.cpp:53] Actor thread: 524319 created.
I20221008 16:12:04.073900 3794269 thread_manager.cpp:53] Actor thread: 524297 created.
I20221008 16:12:04.073957 3794269 thread_manager.cpp:53] Actor thread: 524314 created.
I20221008 16:12:04.074015 3794269 thread_manager.cpp:53] Actor thread: 524344 created.
I20221008 16:12:04.074071 3794269 thread_manager.cpp:53] Actor thread: 524341 created.
I20221008 16:12:04.074137 3794269 thread_manager.cpp:53] Actor thread: 524345 created.
I20221008 16:12:04.074199 3794269 thread_manager.cpp:53] Actor thread: 524346 created.
I20221008 16:12:04.074323 3794269 thread_manager.cpp:53] Actor thread: 524349 created.
I20221008 16:12:04.074393 3794269 thread_manager.cpp:53] Actor thread: 524290 created.
I20221008 16:12:04.074591 3794269 thread_manager.cpp:53] Actor thread: 524340 created.
I20221008 16:12:04.076576 3794269 thread_manager.cpp:53] Actor thread: 1048576 created.
I20221008 16:12:04.076651 3794269 thread_manager.cpp:53] Actor thread: 524320 created.
I20221008 16:12:04.076714 3794269 thread_manager.cpp:53] Actor thread: 524327 created.
I20221008 16:12:04.076756 3794269 thread_manager.cpp:53] Actor thread: 524322 created.
I20221008 16:12:31.566728 3794269 time_util.h:82] Graph name: GraphToRun_0 Runtime Init time elapsed: 28399 milliseconds
I20221008 16:12:54.983990 3794269 nn_graph.cpp:81] Graph destructor Try to close c nn graph name GraphToRun_0.
I20221008 16:12:54.984050 3794269 nn_graph.cpp:87] Try to close c nn graph name GraphToRun_0.
I20221008 16:13:08.155683 3794269 thread_manager.cpp:67] Actor thread: 524355 finished when the graph is destructed.
I20221008 16:13:08.155972 3794269 thread_manager.cpp:67] Actor thread: 524353 finished when the graph is destructed.
I20221008 16:13:08.156224 3794269 thread_manager.cpp:67] Actor thread: 524356 finished when the graph is destructed.
I20221008 16:13:08.156456 3794269 nn_graph.cpp:91] Finish close c nn graph name GraphToRun_0.
I20221008 16:13:13.244390 3794269 multi_client_session_context.cpp:146] Try to delete multi client session context.
I20221008 16:13:13.244704 3794269 thread_manager.cpp:28] Actor thread: 524322 finished when process exits.
I20221008 16:13:13.263633 3794269 thread_manager.cpp:28] Actor thread: 524327 finished when process exits.
I20221008 16:13:13.283440 3794269 thread_manager.cpp:28] Actor thread: 524320 finished when process exits.
I20221008 16:13:13.319746 3794269 thread_manager.cpp:28] Actor thread: 1048576 finished when process exits.
I20221008 16:13:13.321316 3794269 thread_manager.cpp:28] Actor thread: 524340 finished when process exits.
I20221008 16:13:13.321586 3794269 thread_manager.cpp:28] Actor thread: 524290 finished when process exits.
I20221008 16:13:13.321826 3794269 thread_manager.cpp:28] Actor thread: 524349 finished when process exits.
I20221008 16:13:13.322064 3794269 thread_manager.cpp:28] Actor thread: 524346 finished when process exits.
I20221008 16:13:13.323678 3794269 thread_manager.cpp:28] Actor thread: 524345 finished when process exits.
I20221008 16:13:13.351629 3794269 thread_manager.cpp:28] Actor thread: 524341 finished when process exits.
I20221008 16:13:13.353672 3794269 thread_manager.cpp:28] Actor thread: 524344 finished when process exits.
I20221008 16:13:13.355357 3794269 thread_manager.cpp:28] Actor thread: 524314 finished when process exits.
I20221008 16:13:13.384094 3794269 thread_manager.cpp:28] Actor thread: 524317 finished when process exits.
I20221008 16:13:13.384302 3794269 thread_manager.cpp:28] Actor thread: 524348 finished when process exits.
I20221008 16:13:13.408691 3794269 thread_manager.cpp:28] Actor thread: 524289 finished when process exits.
I20221008 16:13:13.437026 3794269 thread_manager.cpp:28] Actor thread: 524312 finished when process exits.
I20221008 16:13:13.460186 3794269 thread_manager.cpp:28] Actor thread: 524343 finished when process exits.
I20221008 16:13:13.482867 3794269 thread_manager.cpp:28] Actor thread: 524313 finished when process exits.
I20221008 16:13:13.506258 3794269 thread_manager.cpp:28] Actor thread: 524305 finished when process exits.
I20221008 16:13:13.506515 3794269 thread_manager.cpp:28] Actor thread: 524334 finished when process exits.
I20221008 16:13:13.529537 3794269 thread_manager.cpp:28] Actor thread: 524316 finished when process exits.
I20221008 16:13:13.551766 3794269 thread_manager.cpp:28] Actor thread: 524337 finished when process exits.
I20221008 16:13:13.574903 3794269 thread_manager.cpp:28] Actor thread: 524308 finished when process exits.
I20221008 16:13:13.597795 3794269 thread_manager.cpp:28] Actor thread: 524301 finished when process exits.
I20221008 16:13:13.621239 3794269 thread_manager.cpp:28] Actor thread: 524304 finished when process exits.
I20221008 16:13:13.643592 3794269 thread_manager.cpp:28] Actor thread: 524303 finished when process exits.
I20221008 16:13:13.666524 3794269 thread_manager.cpp:28] Actor thread: 524332 finished when process exits.
I20221008 16:13:13.688891 3794269 thread_manager.cpp:28] Actor thread: 524302 finished when process exits.
I20221008 16:13:13.711377 3794269 thread_manager.cpp:28] Actor thread: 524328 finished when process exits.
I20221008 16:13:13.733701 3794269 thread_manager.cpp:28] Actor thread: 524297 finished when process exits.
I20221008 16:13:13.756407 3794269 thread_manager.cpp:28] Actor thread: 524296 finished when process exits.
I20221008 16:13:13.778970 3794269 thread_manager.cpp:28] Actor thread: 524326 finished when process exits.
I20221008 16:13:13.801225 3794269 thread_manager.cpp:28] Actor thread: 524300 finished when process exits.
I20221008 16:13:13.824275 3794269 thread_manager.cpp:28] Actor thread: 524352 finished when process exits.
I20221008 16:13:13.847647 3794269 thread_manager.cpp:28] Actor thread: 524293 finished when process exits.
I20221008 16:13:13.869405 3794269 thread_manager.cpp:28] Actor thread: 524323 finished when process exits.
I20221008 16:13:13.891193 3794269 thread_manager.cpp:28] Actor thread: 524294 finished when process exits.
I20221008 16:13:13.891427 3794269 thread_manager.cpp:28] Actor thread: 524324 finished when process exits.
I20221008 16:13:13.914268 3794269 thread_manager.cpp:28] Actor thread: 524354 finished when process exits.
I20221008 16:13:13.936777 3794269 thread_manager.cpp:28] Actor thread: 524295 finished when process exits.
I20221008 16:13:13.960007 3794269 thread_manager.cpp:28] Actor thread: 524325 finished when process exits.
I20221008 16:13:14.001765 3794269 thread_manager.cpp:28] Actor thread: 1048577 finished when process exits.
I20221008 16:13:14.024989 3794269 thread_manager.cpp:28] Actor thread: 524321 finished when process exits.
I20221008 16:13:14.047371 3794269 thread_manager.cpp:28] Actor thread: 524288 finished when process exits.
I20221008 16:13:14.079447 3794269 thread_manager.cpp:28] Actor thread: 524347 finished when process exits.
I20221008 16:13:14.099679 3794269 thread_manager.cpp:28] Actor thread: 524291 finished when process exits.
I20221008 16:13:14.121587 3794269 thread_manager.cpp:28] Actor thread: 524350 finished when process exits.
I20221008 16:13:14.144137 3794269 thread_manager.cpp:28] Actor thread: 524292 finished when process exits.
I20221008 16:13:14.166930 3794269 thread_manager.cpp:28] Actor thread: 524351 finished when process exits.
I20221008 16:13:14.189461 3794269 thread_manager.cpp:28] Actor thread: 524335 finished when process exits.
I20221008 16:13:14.212334 3794269 thread_manager.cpp:28] Actor thread: 524306 finished when process exits.
I20221008 16:13:14.235337 3794269 thread_manager.cpp:28] Actor thread: 524336 finished when process exits.
I20221008 16:13:14.258574 3794269 thread_manager.cpp:28] Actor thread: 524307 finished when process exits.
I20221008 16:13:14.282321 3794269 thread_manager.cpp:28] Actor thread: 524309 finished when process exits.
I20221008 16:13:14.306492 3794269 thread_manager.cpp:28] Actor thread: 524311 finished when process exits.
I20221008 16:13:14.329731 3794269 thread_manager.cpp:28] Actor thread: 524310 finished when process exits.
I20221008 16:13:14.352705 3794269 thread_manager.cpp:28] Actor thread: 524338 finished when process exits.
I20221008 16:13:14.375656 3794269 thread_manager.cpp:28] Actor thread: 524339 finished when process exits.
I20221008 16:13:14.398633 3794269 thread_manager.cpp:28] Actor thread: 524342 finished when process exits.
I20221008 16:13:14.421838 3794269 thread_manager.cpp:28] Actor thread: 524333 finished when process exits.
I20221008 16:13:14.444658 3794269 thread_manager.cpp:28] Actor thread: 524331 finished when process exits.
I20221008 16:13:14.467628 3794269 thread_manager.cpp:28] Actor thread: 524329 finished when process exits.
I20221008 16:13:14.490553 3794269 thread_manager.cpp:28] Actor thread: 524330 finished when process exits.
I20221008 16:13:14.514017 3794269 thread_manager.cpp:28] Actor thread: 524298 finished when process exits.
I20221008 16:13:14.537887 3794269 thread_manager.cpp:28] Actor thread: 524315 finished when process exits.
I20221008 16:13:14.560840 3794269 thread_manager.cpp:28] Actor thread: 524318 finished when process exits.
I20221008 16:13:14.584080 3794269 thread_manager.cpp:28] Actor thread: 524299 finished when process exits.
I20221008 16:13:14.607300 3794269 thread_manager.cpp:28] Actor thread: 524319 finished when process exits.
I20221008 16:13:21.786613 3794269 multi_client_session_context.cpp:172] Finish delete multi client session context.
I20221008 16:13:22.088157 3794269 epoll_comm_network.cpp:89] CommNet Thread 0 finish
I20221008 16:13:22.088418 3794269 epoll_comm_network.cpp:89] CommNet Thread 1 finish
I20221008 16:13:22.088523 3794269 epoll_comm_network.cpp:89] CommNet Thread 2 finish
I20221008 16:13:22.088591 3794269 epoll_comm_network.cpp:89] CommNet Thread 3 finish
I20221008 15:51:18.164168 3794269 time_util.h:82] InitForEachMemChain time elapsed: 0 milliseconds
I20221008 15:51:25.805889 3794269 time_util.h:82] GenRegstAllocFreeTimeLineAndRegstMutualExclusions time elapsed: 7641 milliseconds
I20221008 16:11:47.020560 3794269 time_util.h:82] SelectAlgorithmGenMemBlockOffset4Regsts time elapsed: 1221214 milliseconds
I20221008 16:11:47.161007 3794269 time_util.h:82] ChooseBestOneForEachMemChain time elapsed: 140 milliseconds
I20221008 16:11:47.381507 3794269 time_util.h:82] Graph name: GraphToRun_0 InferMemBlockId4MemReusedRegst time elapsed: 1230104 milliseconds
This is the most expensive step: SelectAlgorithmGenMemBlockOffset4Regsts takes 1221 s (about 20 minutes).
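To spot hot steps like this one without reading the whole log, the `time_util.h` lines can be ranked by elapsed time. The following is a minimal sketch, assuming the log lines keep the `<label> time elapsed: <N> milliseconds` shape shown above; `rank_elapsed` and `ELAPSED_RE` are hypothetical helper names, not part of OneFlow.

```python
import re

# Matches glog lines emitted from time_util.h, for example:
#   I20221008 16:11:47.161007 3794269 time_util.h:82] <label> time elapsed: 140 milliseconds
ELAPSED_RE = re.compile(
    r"time_util\.h:\d+\] (?P<label>.+?) time elapsed: (?P<ms>\d+) milliseconds"
)


def rank_elapsed(lines):
    """Return (label, milliseconds) pairs sorted from slowest to fastest."""
    found = []
    for line in lines:
        match = ELAPSED_RE.search(line)
        if match:
            found.append((match.group("label"), int(match.group("ms"))))
    return sorted(found, key=lambda item: item[1], reverse=True)


# The five lines quoted above, copied from the log.
sample = [
    "I20221008 15:51:18.164168 3794269 time_util.h:82] InitForEachMemChain time elapsed: 0 milliseconds",
    "I20221008 15:51:25.805889 3794269 time_util.h:82] GenRegstAllocFreeTimeLineAndRegstMutualExclusions time elapsed: 7641 milliseconds",
    "I20221008 16:11:47.020560 3794269 time_util.h:82] SelectAlgorithmGenMemBlockOffset4Regsts time elapsed: 1221214 milliseconds",
    "I20221008 16:11:47.161007 3794269 time_util.h:82] ChooseBestOneForEachMemChain time elapsed: 140 milliseconds",
    "I20221008 16:11:47.381507 3794269 time_util.h:82] Graph name: GraphToRun_0 InferMemBlockId4MemReusedRegst time elapsed: 1230104 milliseconds",
]

ranked = rank_elapsed(sample)
for label, ms in ranked:
    print(f"{ms / 1000.0:10.3f} s  {label}")
```

The enclosing InferMemBlockId4MemReusedRegst total comes out first, followed by SelectAlgorithmGenMemBlockOffset4Regsts, which accounts for almost all of it.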
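How many ops are there in total? The algorithm here is O(n log n), which is already a fairly acceptable speed, and it is also multithreaded. Could you check the CPU utilization during those 20 minutes?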
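One rough way to answer the CPU-utilization question from inside the Python process is to compare process CPU time with wall-clock time around the slow call. This is only a sketch: `cpu_utilization` is a hypothetical helper, and the workload below is a stand-in, since the exact call that triggers graph compilation depends on your script. `time.process_time()` sums user and system CPU time for the whole process, so a ratio near 1 suggests a single busy core, while a ratio near the core count suggests the multithreaded path is actually being used.

```python
import time


def cpu_utilization(fn):
    """Run fn() and return (result, wall_seconds, cpu_seconds, cpu/wall ratio)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    result = fn()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    ratio = cpu_seconds / wall_seconds if wall_seconds > 0 else 0.0
    return result, wall_seconds, cpu_seconds, ratio


def stand_in_workload():
    # Placeholder for the call that compiles the graph in your own script.
    return sum(i * i for i in range(200_000))


result, wall, cpu, ratio = cpu_utilization(stand_in_workload)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s busy cores ~ {ratio:.2f}")
```

An external tool such as `top` gives the same information over time; the in-process ratio is just a quick single number for the whole phase.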
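The recently added memory compression algorithm has no advantage here either; it iterates from zero, so it would be even slower. One simple optimization is to run only the algorithm that sorts by memory size. One more question: the memory compression algorithm has already been merged, so does this result include its running time?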
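For readers unfamiliar with the idea of a size-sorted allocation, here is a generic greedy-by-size sketch. It is an illustration of the general technique only, not OneFlow's implementation: the `Buffer` tuple layout, `assign_offsets_by_size`, and the example buffers are all assumptions made for this example, and the loop is written for clarity rather than for the O(n log n) cost mentioned above.

```python
from typing import Dict, List, Tuple

# (name, size_in_bytes, first_use_step, last_use_step); lifetimes are inclusive.
Buffer = Tuple[str, int, int, int]


def assign_offsets_by_size(buffers: List[Buffer]) -> Tuple[Dict[str, int], int]:
    """Place buffers largest-first at the lowest offset that avoids live conflicts."""
    placed: List[Tuple[int, int, int, int]] = []  # (offset, size, start, end)
    offsets: Dict[str, int] = {}
    total = 0
    # Largest buffers first; ties are broken by name to keep the result deterministic.
    for name, size, start, end in sorted(buffers, key=lambda b: (-b[1], b[0])):
        # Address ranges already taken by buffers whose lifetime overlaps this one.
        busy = sorted(
            (off, off + sz)
            for off, sz, s, e in placed
            if not (e < start or end < s)
        )
        offset = 0
        for low, high in busy:
            if offset + size <= low:
                break  # the buffer fits in the gap before this range
            offset = max(offset, high)
        offsets[name] = offset
        placed.append((offset, size, start, end))
        total = max(total, offset + size)
    return offsets, total


example = [
    ("a", 100, 0, 2),
    ("b", 60, 1, 3),
    ("c", 40, 3, 3),
    ("d", 100, 4, 5),
]
offsets, total = assign_offsets_by_size(example)
print(offsets, total)
```

In this example the four buffers total 300 bytes, but because `a`, `c`, and `d` are never live at the same time they share offset 0, and the plan needs 160 bytes.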
Could you try merging this branch? feat-speed_up-mem_reuse
Log from the feat-speed_up-mem_reuse branch:
Log file created at: 2022/10/09 21:01:29
Running on machine: oneflow-23
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
I20221009 21:01:29.014497 2343139 env_global_objects_scope.cpp:159] Using rpc backend: local
I20221009 21:01:29.081584 2343139 epoll_comm_network.cpp:63] CommNet:Epoll listening on 0.0.0.0:44137
I20221009 21:02:02.156193 2343139 version.cpp:22] OneFlow git version: N/A
I20221009 21:02:02.156265 2343139 cuda_device_manager_factory.cpp:63] CUDA runtime version: 11.2
I20221009 21:02:02.156282 2343139 cuda_device_manager_factory.cpp:72] cuDNN version: 8.1.1
I20221009 21:02:02.156289 2343139 cuda_device_manager_factory.cpp:85] NCCL version: 2.12.10
I20221009 21:03:33.203392 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_0-InsertPinnedIdentityOpPass
I20221009 21:03:33.203474 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_0-InsertPinnedIdentityOpPass
I20221009 21:03:33.203485 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_1-EliminateDeadNodesPass
I20221009 21:03:45.834390 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_1-EliminateDeadNodesPass
I20221009 21:03:45.834470 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_2-NormalizationExponentialAverageAutoTickPass
I20221009 21:03:45.834491 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_2-NormalizationExponentialAverageAutoTickPass
I20221009 21:03:45.834498 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_3-AutoMixedPrecision
I20221009 21:03:45.834507 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_3-AutoMixedPrecision
I20221009 21:03:45.834512 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_4-PruneAmpWhiteIdentityOpPass
I20221009 21:03:58.219841 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_4-PruneAmpWhiteIdentityOpPass
I20221009 21:03:58.219921 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_5-OptimizerPlacementOptimizationPass
I20221009 21:03:58.219933 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_5-OptimizerPlacementOptimizationPass
I20221009 21:03:58.219939 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_6-FuseAddToOutputPass
I20221009 21:03:58.219945 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_6-FuseAddToOutputPass
I20221009 21:03:58.219950 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_7-IRRoundTripBeforeAD
I20221009 21:04:39.700385 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_7-IRRoundTripBeforeAD
I20221009 21:04:39.700472 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_8-DynamicLossScaleSchedulePass
I20221009 21:04:39.700493 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_8-DynamicLossScaleSchedulePass
I20221009 21:04:39.700500 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_9-AutoTrainStep
I20221009 21:04:39.700507 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_9-AutoTrainStep
I20221009 21:04:39.700512 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_10-AutoLearningRate
I20221009 21:04:39.700520 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_10-AutoLearningRate
I20221009 21:04:39.700525 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_11-QuantAwareTraining
I20221009 21:04:39.700532 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_11-QuantAwareTraining
I20221009 21:04:39.700549 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_12-GenerateOptimizerOpConfs
I20221009 21:04:39.700556 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_12-GenerateOptimizerOpConfs
I20221009 21:04:39.700562 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_13-PrunePinnedIdentityOpPass
I20221009 21:04:39.700570 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_13-PrunePinnedIdentityOpPass
I20221009 21:04:39.700577 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_14-ReplaceEmbeddingOps
I20221009 21:04:39.700582 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_14-ReplaceEmbeddingOps
I20221009 21:04:39.700588 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_15-SequentialOneEmbeddingOpsPass
I20221009 21:04:39.700598 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_15-SequentialOneEmbeddingOpsPass
I20221009 21:04:39.700604 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_16-FuseEmbeddingShuffleInteractionPass
I20221009 21:04:39.700611 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_16-FuseEmbeddingShuffleInteractionPass
I20221009 21:04:39.700618 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_17-FuseBCEReduceMeanFwBwPass
I20221009 21:04:39.700623 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_17-FuseBCEReduceMeanFwBwPass
I20221009 21:04:39.700630 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_18-AddSspVariableProxy
I20221009 21:04:39.700636 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_18-AddSspVariableProxy
I20221009 21:04:39.700642 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_19-CheckpointingPass
I20221009 21:04:39.700649 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_19-CheckpointingPass
I20221009 21:04:39.700655 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_20-CudnnFusedNormalizationAddReluPass
I20221009 21:04:53.579181 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_20-CudnnFusedNormalizationAddReluPass
I20221009 21:04:53.579264 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_21-PruneCastToStaticShapeOpsPass
I20221009 21:04:53.579284 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_21-PruneCastToStaticShapeOpsPass
I20221009 21:04:53.579289 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_22-IRRoundTrip
I20221009 21:05:34.688592 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_22-IRRoundTrip
I20221009 21:05:34.688678 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_23-FuseAddToOutputPass1
I20221009 21:05:34.688697 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_23-FuseAddToOutputPass1
I20221009 21:05:34.688704 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_24-FuseConsecutiveAddPass
I20221009 21:05:48.461259 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_24-FuseConsecutiveAddPass
I20221009 21:05:48.461335 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_25-IndexedSlicesOptimizerRewritePass
I20221009 21:05:48.461355 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_25-IndexedSlicesOptimizerRewritePass
I20221009 21:05:48.461360 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_26-SplitSparseSoftmaxCrossEntropyOpPass
I20221009 21:06:03.416482 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_26-SplitSparseSoftmaxCrossEntropyOpPass
I20221009 21:06:03.416569 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_27-DoParallelCastBeforeWideningTypeCast
I20221009 21:06:16.610769 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_27-DoParallelCastBeforeWideningTypeCast
I20221009 21:06:16.610846 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_28-FuseCastScalePass
I20221009 21:06:16.610857 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_28-FuseCastScalePass
I20221009 21:06:16.610864 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_29-PruneParallelCastOpsPass
I20221009 21:06:31.640833 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_29-PruneParallelCastOpsPass
I20221009 21:06:31.640915 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_30-FuseUpdateOpsPass
I20221009 21:06:31.640934 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_30-FuseUpdateOpsPass
I20221009 21:06:31.640940 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_31-FuseModelUpdateCastOpsPass
I20221009 21:06:31.640969 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_31-FuseModelUpdateCastOpsPass
I20221009 21:06:31.640975 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_32-MultiTensorModelUpdatePass
I20221009 21:06:31.640982 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_32-MultiTensorModelUpdatePass
I20221009 21:06:31.640990 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_33-FixPipelineStageIdPass
I20221009 21:06:31.640997 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_33-FixPipelineStageIdPass
I20221009 21:06:31.641003 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_34-PipelineBufferPass
I20221009 21:06:31.641011 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_34-PipelineBufferPass
I20221009 21:06:31.641017 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_35-AutoParallelPass
I20221009 21:06:31.641026 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_35-AutoParallelPass
I20221009 21:06:31.641033 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_36-DumpVariableInfoPass
I20221009 21:06:31.641044 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_36-DumpVariableInfoPass
I20221009 21:06:31.641050 2343139 job_build_and_infer_ctx.cpp:942] GraphToRun_0 start compiling with pass pass_cnt_37-DumpBlobParallelConfPass
I20221009 21:06:45.282546 2343139 job_build_and_infer_ctx.cpp:956] GraphToRun_0 finish compiling with pass pass_cnt_37-DumpBlobParallelConfPass
I20221009 21:16:09.392753 2343139 nn_graph.cpp:329] Graph name: GraphToRun_0 compile time: 412.322 seconds.
I20221009 21:16:09.802516 2343139 plan_util.cpp:965]
Graph name GraphToRun_0 in Rank: 0, Device: 0 needs to allocate [ 3044.08 MiB ] device memory.
In general, Chunk id: 0 memory is [ 1455.98 MiB ] with mem_block_num = 1
Unreused memory not eager var is [ 0.79104 MiB ] with mem_block_num = 20570
Eager Variable Tensor total memory is [ 1587.31 MiB ] with mem_block_num = 650
I20221009 21:16:12.059098 2343139 thread_manager.cpp:53] Actor thread: 524354 created.
I20221009 21:16:12.059180 2343139 thread_manager.cpp:53] Actor thread: 524356 created.
I20221009 21:16:12.059247 2343139 thread_manager.cpp:53] Actor thread: 524355 created.
I20221009 21:16:12.059283 2343139 thread_manager.cpp:53] Actor thread: 524352 created.
I20221009 21:16:12.059365 2343139 thread_manager.cpp:53] Actor thread: 524353 created.
I20221009 21:16:12.059435 2343139 thread_manager.cpp:53] Actor thread: 524311 created.
I20221009 21:16:12.059494 2343139 thread_manager.cpp:53] Actor thread: 524344 created.
I20221009 21:16:12.059577 2343139 thread_manager.cpp:53] Actor thread: 524297 created.
I20221009 21:16:12.059634 2343139 thread_manager.cpp:53] Actor thread: 524345 created.
I20221009 21:16:12.059705 2343139 thread_manager.cpp:53] Actor thread: 524349 created.
I20221009 21:16:12.059819 2343139 thread_manager.cpp:53] Actor thread: 524347 created.
I20221009 21:16:12.068763 2343139 thread_manager.cpp:53] Actor thread: 1048577 created.
I20221009 21:16:12.068847 2343139 thread_manager.cpp:53] Actor thread: 524321 created.
I20221009 21:16:12.068892 2343139 thread_manager.cpp:53] Actor thread: 524305 created.
I20221009 21:16:12.068945 2343139 thread_manager.cpp:53] Actor thread: 524306 created.
I20221009 21:16:12.068990 2343139 thread_manager.cpp:53] Actor thread: 524335 created.
I20221009 21:16:12.069082 2343139 thread_manager.cpp:53] Actor thread: 524299 created.
I20221009 21:16:12.069140 2343139 thread_manager.cpp:53] Actor thread: 524333 created.
I20221009 21:16:12.069226 2343139 thread_manager.cpp:53] Actor thread: 524340 created.
I20221009 21:16:12.069296 2343139 thread_manager.cpp:53] Actor thread: 524310 created.
I20221009 21:16:12.069404 2343139 thread_manager.cpp:53] Actor thread: 524323 created.
I20221009 21:16:12.069468 2343139 thread_manager.cpp:53] Actor thread: 524294 created.
I20221009 21:16:12.069515 2343139 thread_manager.cpp:53] Actor thread: 524312 created.
I20221009 21:16:12.069562 2343139 thread_manager.cpp:53] Actor thread: 524314 created.
I20221009 21:16:12.069633 2343139 thread_manager.cpp:53] Actor thread: 524302 created.
I20221009 21:16:12.069697 2343139 thread_manager.cpp:53] Actor thread: 524318 created.
I20221009 21:16:12.069748 2343139 thread_manager.cpp:53] Actor thread: 524289 created.
I20221009 21:16:12.069852 2343139 thread_manager.cpp:53] Actor thread: 524348 created.
I20221009 21:16:12.069911 2343139 thread_manager.cpp:53] Actor thread: 524301 created.
I20221009 21:16:12.069991 2343139 thread_manager.cpp:53] Actor thread: 524300 created.
I20221009 21:16:12.070050 2343139 thread_manager.cpp:53] Actor thread: 524288 created.
I20221009 21:16:12.070117 2343139 thread_manager.cpp:53] Actor thread: 524346 created.
I20221009 21:16:12.070184 2343139 thread_manager.cpp:53] Actor thread: 524307 created.
I20221009 21:16:12.070261 2343139 thread_manager.cpp:53] Actor thread: 524337 created.
I20221009 21:16:12.070334 2343139 thread_manager.cpp:53] Actor thread: 524308 created.
I20221009 21:16:12.070384 2343139 thread_manager.cpp:53] Actor thread: 524327 created.
I20221009 21:16:12.070447 2343139 thread_manager.cpp:53] Actor thread: 524298 created.
I20221009 21:16:12.070494 2343139 thread_manager.cpp:53] Actor thread: 524324 created.
I20221009 21:16:12.070559 2343139 thread_manager.cpp:53] Actor thread: 524332 created.
I20221009 21:16:12.070612 2343139 thread_manager.cpp:53] Actor thread: 524322 created.
I20221009 21:16:12.070673 2343139 thread_manager.cpp:53] Actor thread: 524338 created.
I20221009 21:16:12.070732 2343139 thread_manager.cpp:53] Actor thread: 524309 created.
I20221009 21:16:12.070824 2343139 thread_manager.cpp:53] Actor thread: 524319 created.
I20221009 21:16:12.070914 2343139 thread_manager.cpp:53] Actor thread: 524331 created.
I20221009 21:16:12.072542 2343139 thread_manager.cpp:53] Actor thread: 1048576 created.
I20221009 21:16:12.072608 2343139 thread_manager.cpp:53] Actor thread: 524320 created.
I20221009 21:16:12.072649 2343139 thread_manager.cpp:53] Actor thread: 524329 created.
I20221009 21:16:12.072692 2343139 thread_manager.cpp:53] Actor thread: 524325 created.
I20221009 21:16:12.072751 2343139 thread_manager.cpp:53] Actor thread: 524351 created.
I20221009 21:16:12.072819 2343139 thread_manager.cpp:53] Actor thread: 524292 created.
I20221009 21:16:12.072880 2343139 thread_manager.cpp:53] Actor thread: 524334 created.
I20221009 21:16:12.072976 2343139 thread_manager.cpp:53] Actor thread: 524291 created.
I20221009 21:16:12.073060 2343139 thread_manager.cpp:53] Actor thread: 524350 created.
I20221009 21:16:12.073120 2343139 thread_manager.cpp:53] Actor thread: 524328 created.
I20221009 21:16:12.073175 2343139 thread_manager.cpp:53] Actor thread: 524290 created.
I20221009 21:16:12.073261 2343139 thread_manager.cpp:53] Actor thread: 524330 created.
I20221009 21:16:12.073323 2343139 thread_manager.cpp:53] Actor thread: 524303 created.
I20221009 21:16:12.073392 2343139 thread_manager.cpp:53] Actor thread: 524341 created.
I20221009 21:16:12.073460 2343139 thread_manager.cpp:53] Actor thread: 524296 created.
I20221009 21:16:12.073530 2343139 thread_manager.cpp:53] Actor thread: 524304 created.
I20221009 21:16:12.073590 2343139 thread_manager.cpp:53] Actor thread: 524293 created.
I20221009 21:16:12.073654 2343139 thread_manager.cpp:53] Actor thread: 524313 created.
I20221009 21:16:12.073704 2343139 thread_manager.cpp:53] Actor thread: 524316 created.
I20221009 21:16:12.073776 2343139 thread_manager.cpp:53] Actor thread: 524317 created.
I20221009 21:16:12.073855 2343139 thread_manager.cpp:53] Actor thread: 524315 created.
I20221009 21:16:12.074016 2343139 thread_manager.cpp:53] Actor thread: 524343 created.
I20221009 21:16:12.074095 2343139 thread_manager.cpp:53] Actor thread: 524342 created.
I20221009 21:16:12.074231 2343139 thread_manager.cpp:53] Actor thread: 524326 created.
I20221009 21:16:12.074891 2343139 thread_manager.cpp:53] Actor thread: 524336 created.
I20221009 21:16:12.075055 2343139 thread_manager.cpp:53] Actor thread: 524339 created.
I20221009 21:16:12.075217 2343139 thread_manager.cpp:53] Actor thread: 524295 created.
I20221009 21:16:58.847649 2343139 nn_graph.cpp:76] Graph destructor Try to close c nn graph name GraphToRun_0.
I20221009 21:16:58.847715 2343139 nn_graph.cpp:82] Try to close c nn graph name GraphToRun_0.
I20221009 21:17:06.231520 2343139 thread_manager.cpp:67] Actor thread: 524356 finished when the graph is destructed.
I20221009 21:17:06.231763 2343139 thread_manager.cpp:67] Actor thread: 524355 finished when the graph is destructed.
I20221009 21:17:06.231966 2343139 thread_manager.cpp:67] Actor thread: 524353 finished when the graph is destructed.
I20221009 21:17:06.232045 2343139 nn_graph.cpp:86] Finish close c nn graph name GraphToRun_0.
I20221009 21:17:11.145151 2343139 multi_client_session_context.cpp:146] Try to delete multi client session context.
I20221009 21:17:11.145682 2343139 thread_manager.cpp:28] Actor thread: 524295 finished when process exits.
I20221009 21:17:11.189685 2343139 thread_manager.cpp:28] Actor thread: 524339 finished when process exits.
I20221009 21:17:11.207367 2343139 thread_manager.cpp:28] Actor thread: 524336 finished when process exits.
I20221009 21:17:11.208568 2343139 thread_manager.cpp:28] Actor thread: 524326 finished when process exits.
I20221009 21:17:11.208887 2343139 thread_manager.cpp:28] Actor thread: 524342 finished when process exits.
I20221009 21:17:11.210000 2343139 thread_manager.cpp:28] Actor thread: 524343 finished when process exits.
I20221009 21:17:11.211112 2343139 thread_manager.cpp:28] Actor thread: 524315 finished when process exits.
I20221009 21:17:11.212196 2343139 thread_manager.cpp:28] Actor thread: 524317 finished when process exits.
I20221009 21:17:11.213570 2343139 thread_manager.cpp:28] Actor thread: 524316 finished when process exits.
I20221009 21:17:11.214954 2343139 thread_manager.cpp:28] Actor thread: 524313 finished when process exits.
I20221009 21:17:11.216265 2343139 thread_manager.cpp:28] Actor thread: 524293 finished when process exits.
I20221009 21:17:11.217376 2343139 thread_manager.cpp:28] Actor thread: 524304 finished when process exits.
I20221009 21:17:11.219115 2343139 thread_manager.cpp:28] Actor thread: 524301 finished when process exits.
I20221009 21:17:11.237397 2343139 thread_manager.cpp:28] Actor thread: 524348 finished when process exits.
I20221009 21:17:11.256284 2343139 thread_manager.cpp:28] Actor thread: 524289 finished when process exits.
I20221009 21:17:11.256831 2343139 thread_manager.cpp:28] Actor thread: 524302 finished when process exits.
I20221009 21:17:11.275347 2343139 thread_manager.cpp:28] Actor thread: 524314 finished when process exits.
I20221009 21:17:11.293922 2343139 thread_manager.cpp:28] Actor thread: 524312 finished when process exits.
I20221009 21:17:11.315598 2343139 thread_manager.cpp:28] Actor thread: 524310 finished when process exits.
I20221009 21:17:11.334412 2343139 thread_manager.cpp:28] Actor thread: 524333 finished when process exits.
I20221009 21:17:11.353454 2343139 thread_manager.cpp:28] Actor thread: 524299 finished when process exits.
I20221009 21:17:11.372211 2343139 thread_manager.cpp:28] Actor thread: 524335 finished when process exits.
I20221009 21:17:11.390949 2343139 thread_manager.cpp:28] Actor thread: 524306 finished when process exits.
I20221009 21:17:11.409577 2343139 thread_manager.cpp:28] Actor thread: 524305 finished when process exits.
I20221009 21:17:11.428203 2343139 thread_manager.cpp:28] Actor thread: 524354 finished when process exits.
I20221009 21:17:11.446946 2343139 thread_manager.cpp:28] Actor thread: 524297 finished when process exits.
I20221009 21:17:11.465775 2343139 thread_manager.cpp:28] Actor thread: 524294 finished when process exits.
I20221009 21:17:11.484766 2343139 thread_manager.cpp:28] Actor thread: 524323 finished when process exits.
I20221009 21:17:11.501847 2343139 thread_manager.cpp:28] Actor thread: 524352 finished when process exits.
I20221009 21:17:11.520665 2343139 thread_manager.cpp:28] Actor thread: 524340 finished when process exits.
I20221009 21:17:11.539834 2343139 thread_manager.cpp:28] Actor thread: 524311 finished when process exits.
I20221009 21:17:11.558683 2343139 thread_manager.cpp:28] Actor thread: 524303 finished when process exits.
I20221009 21:17:11.575945 2343139 thread_manager.cpp:28] Actor thread: 524344 finished when process exits.
I20221009 21:17:11.594913 2343139 thread_manager.cpp:28] Actor thread: 524296 finished when process exits.
I20221009 21:17:11.613610 2343139 thread_manager.cpp:28] Actor thread: 524349 finished when process exits.
I20221009 21:17:11.632246 2343139 thread_manager.cpp:28] Actor thread: 524290 finished when process exits.
I20221009 21:17:11.651294 2343139 thread_manager.cpp:28] Actor thread: 524345 finished when process exits.
I20221009 21:17:11.670347 2343139 thread_manager.cpp:28] Actor thread: 524318 finished when process exits.
I20221009 21:17:11.689515 2343139 thread_manager.cpp:28] Actor thread: 524347 finished when process exits.
I20221009 21:17:11.708652 2343139 thread_manager.cpp:28] Actor thread: 524288 finished when process exits.
I20221009 21:17:11.747866 2343139 thread_manager.cpp:28] Actor thread: 1048577 finished when process exits.
I20221009 21:17:11.766500 2343139 thread_manager.cpp:28] Actor thread: 524321 finished when process exits.
I20221009 21:17:11.785259 2343139 thread_manager.cpp:28] Actor thread: 524300 finished when process exits.
I20221009 21:17:11.816280 2343139 thread_manager.cpp:28] Actor thread: 524346 finished when process exits.
I20221009 21:17:11.833281 2343139 thread_manager.cpp:28] Actor thread: 524307 finished when process exits.
I20221009 21:17:11.852285 2343139 thread_manager.cpp:28] Actor thread: 524337 finished when process exits.
I20221009 21:17:11.871312 2343139 thread_manager.cpp:28] Actor thread: 524308 finished when process exits.
I20221009 21:17:11.890344 2343139 thread_manager.cpp:28] Actor thread: 524327 finished when process exits.
I20221009 21:17:11.909389 2343139 thread_manager.cpp:28] Actor thread: 524298 finished when process exits.
I20221009 21:17:11.928386 2343139 thread_manager.cpp:28] Actor thread: 524324 finished when process exits.
I20221009 21:17:11.947249 2343139 thread_manager.cpp:28] Actor thread: 524332 finished when process exits.
I20221009 21:17:11.965997 2343139 thread_manager.cpp:28] Actor thread: 524322 finished when process exits.
I20221009 21:17:11.985054 2343139 thread_manager.cpp:28] Actor thread: 524338 finished when process exits.
I20221009 21:17:12.003787 2343139 thread_manager.cpp:28] Actor thread: 524309 finished when process exits.
I20221009 21:17:12.022555 2343139 thread_manager.cpp:28] Actor thread: 524319 finished when process exits.
I20221009 21:17:12.041574 2343139 thread_manager.cpp:28] Actor thread: 524331 finished when process exits.
I20221009 21:17:17.492251 2343139 thread_manager.cpp:28] Actor thread: 1048576 finished when process exits.
I20221009 21:17:17.510497 2343139 thread_manager.cpp:28] Actor thread: 524320 finished when process exits.
I20221009 21:17:17.529810 2343139 thread_manager.cpp:28] Actor thread: 524329 finished when process exits.
I20221009 21:17:17.530349 2343139 thread_manager.cpp:28] Actor thread: 524325 finished when process exits.
I20221009 21:17:17.530630 2343139 thread_manager.cpp:28] Actor thread: 524351 finished when process exits.
I20221009 21:17:17.549371 2343139 thread_manager.cpp:28] Actor thread: 524292 finished when process exits.
I20221009 21:17:17.568557 2343139 thread_manager.cpp:28] Actor thread: 524334 finished when process exits.
I20221009 21:17:17.593268 2343139 thread_manager.cpp:28] Actor thread: 524291 finished when process exits.
I20221009 21:17:17.593894 2343139 thread_manager.cpp:28] Actor thread: 524350 finished when process exits.
I20221009 21:17:17.615129 2343139 thread_manager.cpp:28] Actor thread: 524328 finished when process exits.
I20221009 21:17:17.634258 2343139 thread_manager.cpp:28] Actor thread: 524330 finished when process exits.
I20221009 21:17:17.653514 2343139 thread_manager.cpp:28] Actor thread: 524341 finished when process exits.
I20221009 21:17:26.989596 2343139 multi_client_session_context.cpp:172] Finish delete multi client session context.
I20221009 21:17:27.393671 2343139 epoll_comm_network.cpp:89] CommNet Thread 0 finish
I20221009 21:17:27.393947 2343139 epoll_comm_network.cpp:89] CommNet Thread 1 finish
I20221009 21:17:27.394043 2343139 epoll_comm_network.cpp:89] CommNet Thread 2 finish
I20221009 21:17:27.394114 2343139 epoll_comm_network.cpp:89] CommNet Thread 3 finish
It looks like the time can be cut to roughly 1/3?
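Seems so.
I optimized it a bit more and it is faster now: 167.566 seconds, run on machine 16. That brings it down to 2/5 again. I will keep optimizing over the next couple of days; this part of the algorithm should be able to go faster.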
Graph name: GraphToRun_0 Compile plan time elapsed: 167566 milliseconds
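This log is too long; it would be better to explain it in text.
Optimized it once more, and it is now extremely fast. I ran it twice on machine 16, and the total cost is close to 42 seconds, of which the optimized memory-reuse algorithm has been compressed to under 25 seconds.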
I20221011 01:59:19.395260 2967884 time_util.h:82] GenRegstAllocFreeTimeLineAndRegstMutualExclusions time elapsed: 258 milliseconds
I20221011 01:59:42.932510 2967884 time_util.h:82] SelectAlgorithmGenMemBlockOffset4Regsts time elapsed: 23537 milliseconds
I20221011 01:59:42.962549 2967884 time_util.h:82] ChooseBestOneForEachMemChain time elapsed: 30 milliseconds
I20221011 01:50:20.533880 2942685 time_util.h:82] Graph name: GraphToRun_0 InferMemBlockId4MemReusedRegst time elapsed: 24774 milliseconds
I20221011 01:50:20.665287 2942685 time_util.h:82] Graph name: GraphToRun_0 Compile plan time elapsed: 42092 milliseconds
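This PR mainly optimizes InferMemBlockId4MemReusedRegst, which consists of three steps: GenRegstAllocFreeTimeLineAndRegstMutualExclusions, SelectAlgorithmGenMemBlockOffset4Regsts, and ChooseBestOneForEachMemChain.
Compared with the pre-optimization log provided by Xiaoyu (啸宇), the running time of this part dropped from 20 minutes to 25 seconds. Within it, GenRegstAllocFreeTimeLineAndRegstMutualExclusions dropped from 7.6 seconds to 0.26 seconds (a small optimization), SelectAlgorithmGenMemBlockOffset4Regsts dropped from 20 minutes to 24 seconds (the bulk of the gain), and ChooseBestOneForEachMemChain dropped from 140 milliseconds to 30 milliseconds (probably not very significant).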
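To make the three steps named above easier to picture, here is a minimal Python sketch of lifetime-based memory reuse. It is not OneFlow's implementation: the function names, the `(alloc_step, free_step, size)` input format, and the two candidate orderings are all assumptions made for illustration, and the mapping to the real steps is inferred only from their names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: register name -> (alloc_step, free_step, size_in_bytes).
Lifetimes = Dict[str, Tuple[int, int, int]]


def mutual_exclusions(regsts: Lifetimes) -> Dict[str, Set[str]]:
    """Step 1 (assumed meaning): registers whose lifetimes overlap exclude each other."""
    excl: Dict[str, Set[str]] = {name: set() for name in regsts}
    names = list(regsts)
    for i, a in enumerate(names):
        a_alloc, a_free, _ = regsts[a]
        for b in names[i + 1:]:
            b_alloc, b_free, _ = regsts[b]
            # Half-open intervals [alloc, free) overlap.
            if a_alloc < b_free and b_alloc < a_free:
                excl[a].add(b)
                excl[b].add(a)
    return excl


def first_fit_offsets(
    regsts: Lifetimes, excl: Dict[str, Set[str]], order: List[str]
) -> Tuple[Dict[str, int], int]:
    """Step 2 (assumed meaning): give each register the lowest offset that
    does not collide with an already placed, mutually exclusive register."""
    offsets: Dict[str, int] = {}
    total = 0
    for name in order:
        size = regsts[name][2]
        busy = sorted(
            (offsets[o], offsets[o] + regsts[o][2]) for o in excl[name] if o in offsets
        )
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # the gap before this busy range is large enough
            offset = max(offset, end)
        offsets[name] = offset
        total = max(total, offset + size)
    return offsets, total


def choose_best(regsts: Lifetimes) -> Tuple[str, Dict[str, int], int]:
    """Step 3 (assumed meaning): try several orderings, keep the smallest block."""
    excl = mutual_exclusions(regsts)
    orders = {
        "by_alloc_time": sorted(regsts, key=lambda n: regsts[n][0]),
        "by_size_desc": sorted(regsts, key=lambda n: -regsts[n][2]),
    }
    results = {k: first_fit_offsets(regsts, excl, o) for k, o in orders.items()}
    best = min(results, key=lambda k: results[k][1])
    return best, results[best][0], results[best][1]
```

In this toy version the pairwise overlap check is quadratic in the number of registers, which hints at why this kind of pass can become expensive on a graph with tens of thousands of ops.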

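The plan compile time (which includes the InferMemBlockId4MemReusedRegst time above) dropped from 20 minutes to 42 seconds, of which 17 seconds come from other algorithms. (The comparison log was provided by Shenghang (晟航) three days ago.)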
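The breakdown above can be reproduced from the `time_util.h` lines quoted in this thread. The following is a small sketch, assuming the log format shown here stays the same; the regular expression and the `parse_elapsed` helper are my own and not part of OneFlow.

```python
import re

# Matches glog lines emitted from time_util.h, with an optional
# "Graph name: <name> " prefix in front of the step name.
PATTERN = re.compile(
    r"time_util\.h:\d+\] (?:Graph name: \S+ )?(?P<step>.+?) "
    r"time elapsed: (?P<ms>\d+) milliseconds"
)


def parse_elapsed(log_text):
    """Return {step name: elapsed milliseconds} for every matching line."""
    timings = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            timings[match.group("step")] = int(match.group("ms"))
    return timings


# Lines copied from the comment above.
LOG = """\
I20221011 01:59:19.395260 2967884 time_util.h:82] GenRegstAllocFreeTimeLineAndRegstMutualExclusions time elapsed: 258 milliseconds
I20221011 01:59:42.932510 2967884 time_util.h:82] SelectAlgorithmGenMemBlockOffset4Regsts time elapsed: 23537 milliseconds
I20221011 01:59:42.962549 2967884 time_util.h:82] ChooseBestOneForEachMemChain time elapsed: 30 milliseconds
I20221011 01:50:20.533880 2942685 time_util.h:82] Graph name: GraphToRun_0 InferMemBlockId4MemReusedRegst time elapsed: 24774 milliseconds
I20221011 01:50:20.665287 2942685 time_util.h:82] Graph name: GraphToRun_0 Compile plan time elapsed: 42092 milliseconds
"""

timings = parse_elapsed(LOG)
total_ms = timings["Compile plan"]
mem_reuse_ms = timings["InferMemBlockId4MemReusedRegst"]
other_ms = total_ms - mem_reuse_ms  # time spent outside the memory-reuse pass

print(f"memory reuse: {mem_reuse_ms} ms, other: {other_ms} ms, total: {total_ms} ms")
```

With these numbers, 42092 - 24774 = 17318 ms, which matches the roughly 17 seconds attributed to other algorithms. The timestamps and thread IDs suggest the quoted lines come from two separate runs, so the per-step figures should be read as approximate.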

Attached image (the AI's rendering of "a photo of an astronaut riding a horse on mars"):

Modified it once more; the compile time changed slightly, but not by much. The final result is as follows, about 20 seconds.
GenRegstAllocFreeTimeLineAndRegstMutualExclusions took 162 ms
SelectAlgorithmGenMemBlockOffset4Regsts took 19669 ms
choose best one took 154 ms
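I spent two days on this and tried various things, such as converting most of the HashMap and HashSet usages to vector, but the time was about the same, so I gave up on that. Effectively, nothing changed today.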
Fixed in:
- https://github.com/Oneflow-Inc/oneflow/pull/9210
- https://github.com/Oneflow-Inc/oneflow/pull/9235
- https://github.com/Oneflow-Inc/oneflow/pull/9281
- https://github.com/Oneflow-Inc/oneflow/pull/9245