Binbin Zhang
Please refer to https://github.com/wenet-e2e/wenet/issues/188
Just keep on training and then test the final model.
PyTorch uses synchronous training for DDP, which means every GPU must run the same number of steps.
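As a small illustration (not from this thread): PyTorch's `DistributedSampler` is one mechanism that keeps per-GPU step counts equal. It pads the dataset so every rank draws the same number of samples, which is why DDP ranks stay in lockstep:

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

# DDP all-reduces gradients on every step, so each rank must execute the
# same number of steps.  DistributedSampler enforces this by padding the
# dataset: every rank gets ceil(N / world_size) samples.
dataset = TensorDataset(torch.arange(10))  # 10 samples, world_size = 4

lengths = [
    len(DistributedSampler(dataset, num_replicas=4, rank=r, shuffle=False))
    for r in range(4)
]
print(lengths)  # -> [3, 3, 3, 3]: every rank sees the same sample count
```

If one rank ran fewer steps than the others, the remaining ranks would block forever waiting for its gradient all-reduce, which is why step counts must match.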
What chunk size did you use in your testing?
What did you change in the YAML config?
Did you try the static batch? Does it converge?
@xingchensong @lzhin please follow up on this issue.
Please first try building with Docker.
We do not have an arm64 Ubuntu machine, so I have no idea about it. Which runtime did you use, ONNX or LibTorch?
https://github.com/wenet-e2e/wenet/blob/main/runtime/core/bin/decoder_main.cc#L33 controls how many threads `decoder_main` uses for decoding. It seems you used more than one thread, which is why CPU utilization is over 100%.