fish-speech
llama training speed
Training the T2S model is very slow, about 0.09 it/s. I'm training on 8x RTX A6000 GPUs with a batch size of 16 — is this speed normal?
I profiled a run with the Lightning profiler; backward and the optimizer step take the longest by far.
Below are the backward and step results from the advanced profiler:
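For context, the profiler was attached roughly like this (a sketch only — the dirpath/filename and trainer arguments are placeholders, not the repo's actual training config):

```python
# Sketch: enabling Lightning's AdvancedProfiler (cProfile-based).
# All paths and Trainer arguments below are illustrative placeholders.
from lightning.pytorch import Trainer
from lightning.pytorch.profilers import AdvancedProfiler

profiler = AdvancedProfiler(dirpath="profiler_logs", filename="t2s_perf")
trainer = Trainer(devices=8, strategy="ddp", profiler=profiler)
# trainer.fit(model)  # model: the TextToSemantic LightningModule
```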
Profile stats for: [Strategy]DDPStrategy.backward rank: 0
190 function calls (185 primitive calls) in 43.795 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
5 0.000 0.000 43.795 8.759 strategy.py:191(backward)
5 0.000 0.000 43.794 8.759 precision.py:45(pre_backward)
5 0.000 0.000 43.794 8.759 call.py:193(_call_callback_hooks)
5 0.000 0.000 43.794 8.759 contextlib.py:130(__enter__)
5 0.000 0.000 43.794 8.759 {built-in method builtins.next}
5 0.000 0.000 43.794 8.759 profiler.py:55(profile)
5 0.000 0.000 43.794 8.759 advanced.py:73(start)
5 43.794 8.759 43.794 8.759 {method 'enable' of '_lsprof.Profiler' objects}
5 0.000 0.000 0.000 0.000 module.py:1711(__setattr__)
5 0.000 0.000 0.000 0.000 ddp.py:310(pre_backward)
20/15 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
5 0.000 0.000 0.000 0.000 parameter.py:8(__instancecheck__)
5 0.000 0.000 0.000 0.000 __init__.py:1455(debug)
5 0.000 0.000 0.000 0.000 contextlib.py:279(helper)
5 0.000 0.000 0.000 0.000 module.py:213(trainer)
5 0.000 0.000 0.000 0.000 strategy.py:93(precision_plugin)
5 0.000 0.000 0.000 0.000 __init__.py:1724(isEnabledFor)
5 0.000 0.000 0.000 0.000 contextlib.py:102(__init__)
10 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
5 0.000 0.000 0.000 0.000 trainer.py:1173(lightning_module)
5 0.000 0.000 0.000 0.000 strategy.py:351(model)
25 0.000 0.000 0.000 0.000 strategy.py:360(lightning_module)
5 0.000 0.000 0.000 0.000 callback.py:32(state_key)
15 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
5 0.000 0.000 0.000 0.000 module.py:293(automatic_optimization)
5 0.000 0.000 0.000 0.000 trainer.py:1120(strategy)
5 0.000 0.000 0.000 0.000 {function _ParameterMeta.__instancecheck__ at 0x74c4e361f9a0}
5 0.000 0.000 0.000 0.000 {built-in method builtins.callable}
Profile stats for: [LightningModule]TextToSemantic.optimizer_step rank: 0
460 function calls (455 primitive calls) in 59.212 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
5 0.000 0.000 59.212 11.842 module.py:1275(optimizer_step)
5 0.000 0.000 59.212 11.842 optimizer.py:84(step)
5 0.000 0.000 59.212 11.842 ddp.py:253(optimizer_step)
5 0.000 0.000 59.212 11.842 strategy.py:219(optimizer_step)
5 0.000 0.000 59.212 11.842 precision.py:112(optimizer_step)
5 0.000 0.000 59.212 11.842 lr_scheduler.py:70(wrapper)
5 0.000 0.000 59.212 11.842 optimizer.py:374(wrapper)
5 0.000 0.000 59.212 11.842 optimizer.py:58(_use_grad)
5 0.000 0.000 59.212 11.842 adamw.py:152(step)
5 0.000 0.000 59.211 11.842 precision.py:95(_wrap_closure)
5 0.000 0.000 59.211 11.842 automatic.py:142(__call__)
5 0.000 0.000 59.211 11.842 _contextlib.py:112(decorate_context)
5 0.000 0.000 59.211 11.842 automatic.py:126(closure)
5 0.000 0.000 59.211 11.842 automatic.py:305(_training_step)
5 0.000 0.000 59.211 11.842 call.py:302(_call_strategy_hook)
5 0.000 0.000 59.211 11.842 contextlib.py:130(__enter__)
5 0.000 0.000 59.211 11.842 {built-in method builtins.next}
5 0.000 0.000 59.211 11.842 profiler.py:55(profile)
5 0.000 0.000 59.211 11.842 advanced.py:73(start)
5 59.211 11.842 59.211 11.842 {method 'enable' of '_lsprof.Profiler' objects}
5 0.000 0.000 0.000 0.000 optimizer.py:327(_cuda_graph_capture_health_check)
5 0.000 0.000 0.000 0.000 profiler.py:604(__enter__)
5 0.000 0.000 0.000 0.000 _ops.py:846(__call__)
5 0.000 0.000 0.000 0.000 {built-in method torch._ops.profiler._record_function_enter_new}
5 0.000 0.000 0.000 0.000 __init__.py:105(is_available)
5 0.000 0.000 0.000 0.000 __init__.py:101(_nvml_based_avail)
5 0.000 0.000 0.000 0.000 os.py:772(getenv)
5 0.000 0.000 0.000 0.000 _collections_abc.py:821(get)
5 0.000 0.000 0.000 0.000 os.py:675(__getitem__)
5 0.000 0.000 0.000 0.000 graphs.py:23(is_current_stream_capturing)
5 0.000 0.000 0.000 0.000 module.py:1711(__setattr__)
5 0.000 0.000 0.000 0.000 _utils.py:851(is_compiling)
5 0.000 0.000 0.000 0.000 {built-in method torch._C._cuda_isCurrentStreamCapturing}
5 0.000 0.000 0.000 0.000 profiler.py:593(__init__)
5 0.000 0.000 0.000 0.000 grad_mode.py:183(__init__)
10 0.000 0.000 0.000 0.000 grad_mode.py:134(__enter__)
5 0.000 0.000 0.000 0.000 __init__.py:96(_is_compiled)
25/20 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
10 0.000 0.000 0.000 0.000 _contextlib.py:149(__new__)
5 0.000 0.000 0.000 0.000 __init__.py:33(is_built)
15 0.000 0.000 0.000 0.000 {built-in method torch._C._set_grad_enabled}
5 0.000 0.000 0.000 0.000 os.py:755(encode)
5 0.000 0.000 0.000 0.000 __init__.py:157(is_compiling)
5 0.000 0.000 0.000 0.000 contextlib.py:279(helper)
5 0.000 0.000 0.000 0.000 os.py:759(decode)
5 0.000 0.000 0.000 0.000 parameter.py:8(__instancecheck__)
20 0.000 0.000 0.000 0.000 {built-in method torch.is_grad_enabled}
5 0.000 0.000 0.000 0.000 _contextlib.py:141(clone)
5 0.000 0.000 0.000 0.000 contextlib.py:102(__init__)
5 0.000 0.000 0.000 0.000 typing.py:306(inner)
5 0.000 0.000 0.000 0.000 trainer.py:1173(lightning_module)
15 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
5 0.000 0.000 0.000 0.000 __init__.py:1455(debug)
5 0.000 0.000 0.000 0.000 decorators.py:148(graph_break)
15 0.000 0.000 0.000 0.000 trainer.py:1120(strategy)
5 0.000 0.000 0.000 0.000 strategy.py:93(precision_plugin)
15 0.000 0.000 0.000 0.000 {method 'values' of 'collections.OrderedDict' objects}
5 0.000 0.000 0.000 0.000 {method 'encode' of 'str' objects}
10 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x744500}
5 0.000 0.000 0.000 0.000 {method 'decode' of 'bytes' objects}
5 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
5 0.000 0.000 0.000 0.000 __init__.py:1724(isEnabledFor)
15 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
5 0.000 0.000 0.000 0.000 typing.py:1737(cast)
5 0.000 0.000 0.000 0.000 _jit_internal.py:1120(is_scripting)
10 0.000 0.000 0.000 0.000 strategy.py:360(lightning_module)
5 0.000 0.000 0.000 0.000 optimizer.py:34(do_nothing_closure)
10 0.000 0.000 0.000 0.000 {built-in method builtins.callable}
5 0.000 0.000 0.000 0.000 __init__.py:127(annotate)
5 0.000 0.000 0.000 0.000 {function _ParameterMeta.__instancecheck__ at 0x74c4e361f9a0}
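For anyone reading these tables: Lightning's AdvancedProfiler wraps Python's built-in cProfile, which is where the `ncalls`/`tottime`/`cumtime` columns and the `_lsprof.Profiler` entries come from. A minimal stdlib sketch that produces the same kind of table (the `time.sleep` just stands in for real work):

```python
import cProfile
import io
import pstats
import time

def slow_step():
    # Placeholder for an expensive operation (e.g. backward or optimizer step).
    time.sleep(0.01)

pr = cProfile.Profile()
pr.enable()
for _ in range(5):   # 5 calls, mirroring ncalls=5 in the dumps above
    slow_step()
pr.disable()

# Sort by cumulative time, like the Lightning output, and print the top rows.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
print(s.getvalue())
```

`tottime` is time spent inside a function itself; `cumtime` includes everything it calls, which is why wrapper frames near the top of each dump show near-zero `tottime` but the full `cumtime`.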