Retrieval-based-Voice-Conversion-WebUI

Warning: Grad strides do not match bucket view strides DDP. (Can impact performance)

Open blaisewf opened this issue 6 months ago • 6 comments

Warning when attempting to start training:

...\lib\site-packages\torch\autograd\__init__.py:251: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [64, 1, 4], strides() = [4, 1, 1]
bucket_view.sizes() = [64, 1, 4], strides() = [4, 4, 1] (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\reducer.cpp:334.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

Maybe these can help: https://github.com/pytorch/pytorch/issues/47163 and https://github.com/noahzn/Lite-Mono/issues/43
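The message quoted above says this is not an error, so one stopgap while the cause is investigated is to filter out just this warning. The sketch below uses only Python's standard `warnings` module and assumes the message reaches Python as a `UserWarning`, as the log suggests. The names `DDP_STRIDE_WARNING` and `silence_ddp_stride_warning` are made up for illustration. It only hides the text and does nothing about any performance cost.

```python
import warnings

# Prefix of the message quoted in the log above. filterwarnings treats it as a
# regular expression matched against the start of the warning text.
DDP_STRIDE_WARNING = "Grad strides do not match bucket view strides"


def silence_ddp_stride_warning() -> None:
    """Hide only this UserWarning; every other warning is still reported."""
    warnings.filterwarnings(
        "ignore",
        message=DDP_STRIDE_WARNING,
        category=UserWarning,
    )
```

Calling `silence_ddp_stride_warning()` once in each training process, before the first backward pass, should be enough under these assumptions.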

blaisewf, Jan 03 '24 19:01

I encountered the same problem when I used DDP with batch size = 2.

UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [180, 6, 1, 1], strides() = [6, 1, 6, 6]
bucket_view.sizes() = [180, 6, 1, 1], strides() = [6, 1, 1, 1] (Triggered internally at /opt/conda/conda-bld/pytorch_1695392020195/work/torch/csrc/distributed/c10d/reducer.cpp:320.)
Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
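In both logs quoted so far, the two stride lists differ only in dimensions whose size is 1. The index along such a dimension is always 0, so its stride never contributes to an element's address. The pure-Python sketch below checks this for the reported shapes. It does not use PyTorch, the helper names (`contiguous_strides`, `element_offsets`, `same_layout`) are illustrative only, and it makes no claim about why DDP still warns in this case.

```python
from itertools import product


def contiguous_strides(sizes):
    """Row-major (C-contiguous) strides, in elements, for a given shape."""
    strides = []
    step = 1
    for size in reversed(sizes):
        strides.append(step)
        step *= max(size, 1)
    return list(reversed(strides))


def element_offsets(sizes, strides):
    """Memory offset of every element, visited in index order."""
    return [
        sum(i * s for i, s in zip(index, strides))
        for index in product(*(range(n) for n in sizes))
    ]


def same_layout(sizes, strides_a, strides_b):
    """True if both stride lists place every element at the same offset."""
    return element_offsets(sizes, strides_a) == element_offsets(sizes, strides_b)


# (sizes, grad strides, bucket_view strides) copied from the two logs above
REPORTS = [
    ([64, 1, 4], [4, 1, 1], [4, 4, 1]),
    ([180, 6, 1, 1], [6, 1, 6, 6], [6, 1, 1, 1]),
]

for sizes, grad, bucket in REPORTS:
    print(
        sizes,
        "bucket is contiguous:", bucket == contiguous_strides(sizes),
        "| same layout:", same_layout(sizes, grad, bucket),
    )
```

For these two reports, the bucket view uses the plain contiguous strides and the gradient addresses exactly the same elements in the same order, so the mismatch is in the stride metadata rather than in where the data sits.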

BitCalSaul, Jan 13 '24 04:01

@fumiama Sorry, but do you know anything about this warning? It seems to show up in a lot of VITS implementations.

blaisewf, Apr 25 '24 18:04

Well, sorry but I also have no idea about it because I have never met this error 😂.

fumiama, Apr 28 '24 13:04

> Well, sorry but I also have no idea about it because I have never met this error 😂.

It happens to everyone at the beginning of training. Are you sure it never happened to you?

blaisewf, Apr 29 '24 12:04

Well, maybe I have missed some logs😂. I will re-check it later.

fumiama, Apr 29 '24 13:04

> Well, maybe I have missed some logs😂. I will re-check it later.

Maybe this is useful: https://chat.openai.com/share/bf7a4bd0-3799-47aa-98bb-666ec4785a4b

blaisewf, May 16 '24 22:05