
llama training speed

Open dukGuo opened this issue 6 months ago • 1 comment

Training the t2s model is very slow, about 0.09 it/s. I am using 8× RTX A6000 GPUs with a batch size of 16. Is this training speed normal?
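For reference, the reported numbers can be turned into a rough samples-per-second figure. This is a minimal sketch assuming the batch size of 16 is per GPU and all 8 GPUs step in lockstep under DDP (the post does not say whether 16 is per-GPU or global):

```python
# Rough throughput estimate from the reported numbers.
# Assumption: batch size 16 is per-GPU, so one DDP iteration
# consumes per_gpu_batch * n_gpus samples globally.
def samples_per_second(it_per_s: float, per_gpu_batch: int, n_gpus: int) -> float:
    """Global samples processed per second for one synchronized DDP step."""
    return it_per_s * per_gpu_batch * n_gpus

throughput = samples_per_second(0.09, 16, 8)
print(f"{throughput:.1f} samples/s")  # prints: 11.5 samples/s
```

If the batch size were global instead, the figure would be 8× lower, so it is worth stating which one is meant when comparing against other setups.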

I ran the Lightning profiler on it; backward and step take the longest by far. [profiler summary screenshot]

Here are the backward and step results from the advanced profiler:
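Lightning's `AdvancedProfiler` produces tables like the ones below by running each hook under Python's built-in `cProfile`. A minimal standalone sketch of the same idea, using only the standard library (`profile_action` and `dummy_step` are illustrative names, not Lightning API):

```python
import cProfile
import io
import pstats

def profile_action(fn, *args):
    """Run fn under cProfile and return (result, stats sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args)
    profiler.disable()
    # Render the top entries by cumulative time, like the dumps below.
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()

def dummy_step():
    # Stand-in for a training hook; any Python callable works here.
    return sum(i * i for i in range(100_000))

result, report = profile_action(dummy_step)
print(report)
```

One caveat when reading such dumps: cProfile only measures CPU-side Python frames, so wall-clock time spent waiting on asynchronous GPU work can end up attributed to whichever frame happens to be active, which may be why nearly all of the time below lands on the profiler's own `enable` call rather than on model code.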

Profile stats for: [Strategy]DDPStrategy.backward rank: 0
         190 function calls (185 primitive calls) in 43.795 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.000    0.000   43.795    8.759 strategy.py:191(backward)
        5    0.000    0.000   43.794    8.759 precision.py:45(pre_backward)
        5    0.000    0.000   43.794    8.759 call.py:193(_call_callback_hooks)
        5    0.000    0.000   43.794    8.759 contextlib.py:130(__enter__)
        5    0.000    0.000   43.794    8.759 {built-in method builtins.next}
        5    0.000    0.000   43.794    8.759 profiler.py:55(profile)
        5    0.000    0.000   43.794    8.759 advanced.py:73(start)
        5   43.794    8.759   43.794    8.759 {method 'enable' of '_lsprof.Profiler' objects}
        5    0.000    0.000    0.000    0.000 module.py:1711(__setattr__)
        5    0.000    0.000    0.000    0.000 ddp.py:310(pre_backward)
    20/15    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        5    0.000    0.000    0.000    0.000 parameter.py:8(__instancecheck__)
        5    0.000    0.000    0.000    0.000 __init__.py:1455(debug)
        5    0.000    0.000    0.000    0.000 contextlib.py:279(helper)
        5    0.000    0.000    0.000    0.000 module.py:213(trainer)
        5    0.000    0.000    0.000    0.000 strategy.py:93(precision_plugin)
        5    0.000    0.000    0.000    0.000 __init__.py:1724(isEnabledFor)
        5    0.000    0.000    0.000    0.000 contextlib.py:102(__init__)
       10    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        5    0.000    0.000    0.000    0.000 trainer.py:1173(lightning_module)
        5    0.000    0.000    0.000    0.000 strategy.py:351(model)
       25    0.000    0.000    0.000    0.000 strategy.py:360(lightning_module)
        5    0.000    0.000    0.000    0.000 callback.py:32(state_key)
       15    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        5    0.000    0.000    0.000    0.000 module.py:293(automatic_optimization)
        5    0.000    0.000    0.000    0.000 trainer.py:1120(strategy)
        5    0.000    0.000    0.000    0.000 {function _ParameterMeta.__instancecheck__ at 0x74c4e361f9a0}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.callable}


Profile stats for: [LightningModule]TextToSemantic.optimizer_step rank: 0
         460 function calls (455 primitive calls) in 59.212 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.000    0.000   59.212   11.842 module.py:1275(optimizer_step)
        5    0.000    0.000   59.212   11.842 optimizer.py:84(step)
        5    0.000    0.000   59.212   11.842 ddp.py:253(optimizer_step)
        5    0.000    0.000   59.212   11.842 strategy.py:219(optimizer_step)
        5    0.000    0.000   59.212   11.842 precision.py:112(optimizer_step)
        5    0.000    0.000   59.212   11.842 lr_scheduler.py:70(wrapper)
        5    0.000    0.000   59.212   11.842 optimizer.py:374(wrapper)
        5    0.000    0.000   59.212   11.842 optimizer.py:58(_use_grad)
        5    0.000    0.000   59.212   11.842 adamw.py:152(step)
        5    0.000    0.000   59.211   11.842 precision.py:95(_wrap_closure)
        5    0.000    0.000   59.211   11.842 automatic.py:142(__call__)
        5    0.000    0.000   59.211   11.842 _contextlib.py:112(decorate_context)
        5    0.000    0.000   59.211   11.842 automatic.py:126(closure)
        5    0.000    0.000   59.211   11.842 automatic.py:305(_training_step)
        5    0.000    0.000   59.211   11.842 call.py:302(_call_strategy_hook)
        5    0.000    0.000   59.211   11.842 contextlib.py:130(__enter__)
        5    0.000    0.000   59.211   11.842 {built-in method builtins.next}
        5    0.000    0.000   59.211   11.842 profiler.py:55(profile)
        5    0.000    0.000   59.211   11.842 advanced.py:73(start)
        5   59.211   11.842   59.211   11.842 {method 'enable' of '_lsprof.Profiler' objects}
        5    0.000    0.000    0.000    0.000 optimizer.py:327(_cuda_graph_capture_health_check)
        5    0.000    0.000    0.000    0.000 profiler.py:604(__enter__)
        5    0.000    0.000    0.000    0.000 _ops.py:846(__call__)
        5    0.000    0.000    0.000    0.000 {built-in method torch._ops.profiler._record_function_enter_new}
        5    0.000    0.000    0.000    0.000 __init__.py:105(is_available)
        5    0.000    0.000    0.000    0.000 __init__.py:101(_nvml_based_avail)
        5    0.000    0.000    0.000    0.000 os.py:772(getenv)
        5    0.000    0.000    0.000    0.000 _collections_abc.py:821(get)
        5    0.000    0.000    0.000    0.000 os.py:675(__getitem__)
        5    0.000    0.000    0.000    0.000 graphs.py:23(is_current_stream_capturing)
        5    0.000    0.000    0.000    0.000 module.py:1711(__setattr__)
        5    0.000    0.000    0.000    0.000 _utils.py:851(is_compiling)
        5    0.000    0.000    0.000    0.000 {built-in method torch._C._cuda_isCurrentStreamCapturing}
        5    0.000    0.000    0.000    0.000 profiler.py:593(__init__)
        5    0.000    0.000    0.000    0.000 grad_mode.py:183(__init__)
       10    0.000    0.000    0.000    0.000 grad_mode.py:134(__enter__)
        5    0.000    0.000    0.000    0.000 __init__.py:96(_is_compiled)
    25/20    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       10    0.000    0.000    0.000    0.000 _contextlib.py:149(__new__)
        5    0.000    0.000    0.000    0.000 __init__.py:33(is_built)
       15    0.000    0.000    0.000    0.000 {built-in method torch._C._set_grad_enabled}
        5    0.000    0.000    0.000    0.000 os.py:755(encode)
        5    0.000    0.000    0.000    0.000 __init__.py:157(is_compiling)
        5    0.000    0.000    0.000    0.000 contextlib.py:279(helper)
        5    0.000    0.000    0.000    0.000 os.py:759(decode)
        5    0.000    0.000    0.000    0.000 parameter.py:8(__instancecheck__)
       20    0.000    0.000    0.000    0.000 {built-in method torch.is_grad_enabled}
        5    0.000    0.000    0.000    0.000 _contextlib.py:141(clone)
        5    0.000    0.000    0.000    0.000 contextlib.py:102(__init__)
        5    0.000    0.000    0.000    0.000 typing.py:306(inner)
        5    0.000    0.000    0.000    0.000 trainer.py:1173(lightning_module)
       15    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        5    0.000    0.000    0.000    0.000 __init__.py:1455(debug)
        5    0.000    0.000    0.000    0.000 decorators.py:148(graph_break)
       15    0.000    0.000    0.000    0.000 trainer.py:1120(strategy)
        5    0.000    0.000    0.000    0.000 strategy.py:93(precision_plugin)
       15    0.000    0.000    0.000    0.000 {method 'values' of 'collections.OrderedDict' objects}
        5    0.000    0.000    0.000    0.000 {method 'encode' of 'str' objects}
       10    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x744500}
        5    0.000    0.000    0.000    0.000 {method 'decode' of 'bytes' objects}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        5    0.000    0.000    0.000    0.000 __init__.py:1724(isEnabledFor)
       15    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        5    0.000    0.000    0.000    0.000 typing.py:1737(cast)
        5    0.000    0.000    0.000    0.000 _jit_internal.py:1120(is_scripting)
       10    0.000    0.000    0.000    0.000 strategy.py:360(lightning_module)
        5    0.000    0.000    0.000    0.000 optimizer.py:34(do_nothing_closure)
       10    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
        5    0.000    0.000    0.000    0.000 __init__.py:127(annotate)
        5    0.000    0.000    0.000    0.000 {function _ParameterMeta.__instancecheck__ at 0x74c4e361f9a0}


dukGuo — Aug 17 '24 14:08