wenet
training error
I installed WeNet following the documentation. When training the Conformer model, CUDA memory usage keeps growing as more batches are processed, for example from 18000 MB to 29000 MB after an hour of training. Shortly afterwards the run exits with an out-of-memory error. Has anyone else had this problem?
You could try reducing the `batch_size` parameter in the configuration file (conf/train_
Solutions you can try:

1. reduce `batch_size` [link](https://github.com/wenet-e2e/wenet/blob/main/examples/aishell/s0/conf/train_conformer.yaml#L65)
2. reduce `max_length` [link](https://github.com/wenet-e2e/wenet/blob/main/examples/aishell/s0/conf/train_conformer.yaml#L39)
3. use `dynamic` batch_type [link](https://github.com/wenet-e2e/wenet/blob/main/examples/aishell/s0/conf/train_conformer.yaml#L64)
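
To see why these three knobs affect peak memory, here is a rough sketch in plain Python. It is not WeNet code: the utterance lengths, the helper names (`padded_cost`, `static_batches`, `dynamic_batches`, `filter_by_length`) and the `max_frames` cap are all made up for illustration. It assumes that every utterance in a batch is padded to the longest one, so the work per batch scales with `len(batch) * max(batch)`.

```python
# Hypothetical utterance lengths in frames, sorted; not from a real dataset.
lengths = [120, 300, 450, 800, 1500, 2400, 5200, 9800]


def padded_cost(batch):
    """Frames actually processed when every utterance is padded to the longest."""
    return len(batch) * max(batch)


def static_batches(lengths, batch_size):
    """Static batching: a fixed number of utterances per batch."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]


def dynamic_batches(lengths, max_frames):
    """Dynamic batching sketch: grow a batch until padding would exceed max_frames.

    A single utterance longer than max_frames still forms a batch on its own.
    """
    batches, current = [], []
    for n in lengths:
        candidate = current + [n]
        if current and padded_cost(candidate) > max_frames:
            batches.append(current)
            current = [n]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def filter_by_length(lengths, max_length):
    """Drop utterances longer than max_length frames."""
    return [n for n in lengths if n <= max_length]


def peak_cost(batches):
    """Largest padded batch, a crude stand-in for peak memory."""
    return max(padded_cost(b) for b in batches)


print(peak_cost(static_batches(lengths, 4)))                           # static, 4 per batch
print(peak_cost(static_batches(lengths, 2)))                           # smaller batch size
print(peak_cost(static_batches(filter_by_length(lengths, 5000), 4)))   # shorter max length
print(peak_cost(dynamic_batches(lengths, 10000)))                      # capped padded frames
```

In this toy setup the peak is driven by the batch that contains the longest utterance, which is also why memory can stay flat for a long time and then jump when such a batch finally shows up.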
Thanks. My settings are now: `batch_size = 16`, `max_length = 20480`, `batch_type = 'dynamic'`.
The problem still remains.
Try setting `batch_size=8` or `max_length=10240`.
@xingchensong Hello, if I train with 8 kHz data, `resample_rate` should be set to 8000, right? And should `max_length` also be smaller to avoid OOM?
yes
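
For a feel of what a given `max_length` means in practice, here is a small conversion sketch. It assumes `max_length` is counted in feature frames with a 10 ms frame shift; that is an assumption, so check the feature extraction settings in your own config. The function names are made up for illustration.

```python
def frames_to_seconds(num_frames, frame_shift_ms=10):
    """Duration covered by num_frames, assuming a fixed frame shift in milliseconds."""
    return num_frames * frame_shift_ms / 1000


def frames_to_samples(num_frames, sample_rate, frame_shift_ms=10):
    """Approximate raw audio samples behind num_frames at the given sample rate."""
    return num_frames * frame_shift_ms * sample_rate // 1000


# max_length values mentioned in this thread
for max_length in (20480, 10240):
    print(max_length, "frames ~", frames_to_seconds(max_length), "s")

# Same duration, different sample rates: the frame count is unchanged,
# only the number of raw samples differs.
print(frames_to_samples(10240, 16000), frames_to_samples(10240, 8000))
```

Under that assumption the frame count for an utterance of a given duration is the same at 8 kHz and 16 kHz, so the sample rate by itself does not change what `max_length` filters out.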