Rajeev
> @rajeevbaalwan I assume that the user of this library is more like an individual who wants to execute the ESPnet model on a low-resource constraint, such as Raspi. If...
> Could you please try setting `drop_last=true` in the dataset config and see if that helps? Still getting the same issue. It gets stuck after the first epoch of training + validation.
> In that case, could you please add the following flags and run the program again? There should be more information on the actual cause of hanging: > > export...
@gau-nernst thanks for your response. Is there any other way to do this, such as modifying the code so it can handle paths from multiple directories? At the end a path is...
> ### Describe the bug > I am trying to train Wav2Vec2 with multi-GPUs (8 A100s). However running the line below leads to a warning and the training freezes after...
> Could you please try the following cmd: > > ``` > torchrun --standalone --nproc_per_node=2 train_sb_wav2vec2.py hparams/wav2vec2_base.yaml --data_folder=data_path_to_ls --output_folder=wav2vec_base_ddp --find_unused_parameters > ``` > > ? If I use the above command...
I have a single server with 8 GPUs. When I run the training on a single GPU it works fine, but when it is executed on multiple GPUs it gets stuck. I have installed...
This is a standalone local bare metal server. I can directly ssh into the server.
Nothing is set: both `echo $LOCAL_RANK` and `echo $RANK` print an empty line. @Adel-Moumen do I need to set these variables myself, or are they set automatically by torchrun?
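For context on the question above: as far as I understand, torchrun exports `RANK`, `LOCAL_RANK` and `WORLD_SIZE` into the environment of each worker process it spawns, so seeing them empty in a plain SSH shell is expected. A minimal sketch for checking what a worker actually sees, assuming it is called from inside the training script (`read_dist_env` is a hypothetical helper name, not part of any library):

```python
import os


def read_dist_env(env=None):
    """Return the distributed-launch variables as ints, or None when unset.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env

    def _get(name):
        value = env.get(name)
        # Treat a missing or empty variable as "not set".
        return int(value) if value not in (None, "") else None

    return {name: _get(name) for name in ("RANK", "LOCAL_RANK", "WORLD_SIZE")}


if __name__ == "__main__":
    # Under torchrun, each worker should print its own rank values;
    # in a plain shell all three entries are expected to be None.
    print(read_dist_env())
```

Printing this at the top of the training script under torchrun, and again when running it directly with `python`, should show whether the launcher is populating the variables.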
@Adel-Moumen Any update on this?