Jiatong (Julius) Han

216 comments by Jiatong (Julius) Han

Try `python3 main.py --logdir /tmp -t --postfix test -b configs/train_colossalai_cifar10.yaml --placement_policy cuda`. The problem was likely caused by a previous version defaulting to `auto` placement, which often introduced tensor device...

How did you set `--nproc_per_node=gpu`? I cannot see where `gpu` is defined, and it is supposed to be a number that does not exceed 2. Other than that, I...

Please stick to an even `nproc_per_node` for now (or set it to `1`). The reason is that the temporal dimension of the DiT attention block is `16`, which is not divisible...
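A minimal sketch of a pre-launch check, assuming the temporal dimension of `16` is split evenly across the processes started with `nproc_per_node`. The helpers `valid_nproc_values` and `check_nproc` and the `max_gpus` parameter are hypothetical and not part of any library.

```python
# Hypothetical sanity check: assumes the temporal dimension is sharded evenly
# across all processes launched on the node.
TEMPORAL_DIM = 16  # temporal size of the DiT attention block, per the comment above


def valid_nproc_values(temporal_dim: int = TEMPORAL_DIM, max_gpus: int = 8) -> list[int]:
    """Return every nproc_per_node in 1..max_gpus that divides temporal_dim evenly."""
    return [n for n in range(1, max_gpus + 1) if temporal_dim % n == 0]


def check_nproc(nproc: int, temporal_dim: int = TEMPORAL_DIM) -> None:
    """Raise ValueError if nproc cannot split temporal_dim into equal shards."""
    if nproc < 1 or temporal_dim % nproc != 0:
        raise ValueError(
            f"nproc_per_node={nproc} does not evenly divide temporal dimension {temporal_dim}"
        )


print(valid_nproc_values())  # [1, 2, 4, 8]
check_nproc(2)  # passes silently
```

Under this even-split assumption, being even is not quite sufficient on its own: `nproc_per_node=6` would still fail because 16 is not divisible by 6.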

It should only be `None` after `optimizer.zero_grad()`; `booster.backward` was calling the wrapped optimizer's `backward(loss)`. Would you mind printing the value of `loss` to see whether it is `NaN`?
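A small sketch for printing the loss and catching a `NaN` before the backward call. The helper name `inspect_loss` is hypothetical; it relies only on `float()`, which accepts both a Python float and a single-element tensor.

```python
import math


def inspect_loss(loss) -> float:
    """Print the loss value and raise if it is NaN or infinite.

    Hypothetical helper: `loss` may be a Python float or a single-element
    tensor, since float() accepts both.
    """
    value = float(loss)
    print(f"loss = {value}")
    if math.isnan(value) or math.isinf(value):
        raise FloatingPointError(f"non-finite loss: {value}")
    return value
```

Calling it right before the backward step in the training loop would show whether a `NaN` loss is what leaves the gradients empty.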

Thanks for sharing your solution. For cross-reference, this issue is similar to issue #258.

Can you `pip install --upgrade flash-attn --no-build-isolation`?
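After upgrading, a quick way to confirm which `flash-attn` build is installed is to query the package metadata with the standard library. This is a sketch; the helper `installed_version` is hypothetical, and it looks up the distribution under the name `flash_attn` (the underscore form used in the wheel's metadata directory).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("flash_attn"))
```

If this prints `None`, the upgrade did not land in the Python environment that runs the training script.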

I am going to close this issue since it appears to have been resolved by the original poster.

Yes, you can use Docker to build a training or inference environment. On Windows, you may want to use WSL to work with Docker.