wenet
Training in shard mode breaks after every epoch
Describe the bug
While training my model in shard data mode with the gloo backend, I get an error after every epoch and the job stops, like this:
/opt/conda/envs/wenet/lib/python3.8/multiprocessing/process.py:108: ResourceWarning: unclosed file <_io.BufferedReader name='/data/wenet_aishell2_shards/train/shards_000005738.tar'>
self._target(*self._args, **self._kwargs)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
File "wenet/bin/train.py", line 262, in
To Reproduce
Steps to reproduce the behavior:
- Train a model in shard mode by setting --data_type "shard".
- Wait for at least one epoch to complete; you may then see the error.
Expected behavior
Training in shard mode should complete successfully every time.
Desktop (please complete the following information):
- OS: [LINUX x86_64]
- Version [commit f972951275261ed14f3ba10f1b70716970f758ec (HEAD -> main)]
Are you on the latest code? The shard file should be closed at https://github.com/wenet-e2e/wenet/blob/main/wenet/dataset/processor.py#L109.
Fixed, closing this issue.
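For readers who hit the same ResourceWarning, here is a minimal sketch of the general pattern: close the underlying shard file once its tar archive has been consumed. This is not the WeNet implementation. The function name `iter_tar_members` is hypothetical, and the sketch assumes shards are plain local tar files read sequentially.

```python
import tarfile


def iter_tar_members(path):
    """Yield (name, bytes) for each regular file in a tar shard.

    Hypothetical helper for illustration only; not part of WeNet.
    """
    stream = open(path, "rb")
    try:
        # "r|*" reads the archive as a sequential stream with
        # transparent compression detection.
        with tarfile.open(fileobj=stream, mode="r|*") as tar:
            for member in tar:
                if not member.isfile():
                    continue  # skip directories, links, etc.
                data = tar.extractfile(member).read()
                yield member.name, data
    finally:
        # Runs when iteration finishes, an exception propagates,
        # or the generator is closed early, so the file never leaks.
        stream.close()
```

Putting the `close()` call in a `finally` block means the file handle is released even if a worker process stops consuming the generator partway through a shard.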