
Cannot train

Open youwyu opened this issue 3 months ago • 0 comments

I have tried many solutions found online and suggested by LLMs, but none worked. I have 8x L40S GPUs and 800 GB of RAM. Training runs with #node=6, but when I run with #node=7 or #node=8 the program dies with the following error:

[rank0]:     self._check_scale_growth_tracker("unscale_")
[rank0]:   File "/data/user/youwyu/conda-env/vggt-env/lib/python3.12/site-packages/torch/amp/grad_scaler.py", line 162, in _check_scale_growth_tracker
[rank0]:     assert self._scale is not None, (
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AssertionError: Attempted unscale_ but _scale is None.  This may indicate your script did not use scaler.scale(loss or outputs) earlier in the iteration.
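
For context, here is a minimal sketch of how this assertion can fire. It assumes the scaler creates its scale lazily on the first `scale()` call, which is what the error message suggests. `ToyGradScaler` and `run_step` are hypothetical stand-ins written for illustration. They are not the real `torch.amp.GradScaler` and not code from this repository.

```python
class ToyGradScaler:
    """Toy stand-in (NOT torch.amp.GradScaler) that mimics lazy scale creation."""

    def __init__(self, init_scale=65536.0):
        self._init_scale = init_scale
        self._scale = None  # assumption: the scale only exists after scale() is called

    def scale(self, loss):
        # The first call creates the scale, later calls reuse it.
        if self._scale is None:
            self._scale = self._init_scale
        return loss * self._scale

    def unscale_(self, grads):
        # Same kind of check as the one in the traceback above.
        assert self._scale is not None, "Attempted unscale_ but _scale is None."
        return [g / self._scale for g in grads]


def run_step(scaler, loss):
    """One hypothetical step; loss=None models a code path that never calls scale()."""
    if loss is not None:
        scaled = scaler.scale(loss)
        grads = [scaled]  # pretend backward() produced this gradient
    else:
        grads = [0.0]     # scale() was skipped on this path
    try:
        scaler.unscale_(grads)
        return "ok"
    except AssertionError as err:
        return f"failed: {err}"


print(run_step(ToyGradScaler(), 0.5))   # scale() ran before unscale_
print(run_step(ToyGradScaler(), None))  # scale() never ran on this scaler
```

If this is the mechanism here, the thing to check is whether rank 0 can reach `unscale_` on its first iteration without having gone through `scaler.scale(loss)`, for example because a batch or loss is skipped when the data is split across more processes.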

youwyu • Sep 09 '25 05:09