Chris Taylor
Same issue here
I think this is why the README says: `Pin commit to 72b2b641aadc44a7ded6b243915f90df3b3be385 for FSDP compatibility, until to_empty() method is fixed.`
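For anyone else hitting this, a minimal sketch of pinning to that commit. The commit hash comes from the README quote above; the exact repo URL/layout is not given in this thread, so the `git checkout` form assumes you already have a local clone:

```shell
COMMIT=72b2b641aadc44a7ded6b243915f90df3b3be385
# Inside a clone of the project (repo path is an assumption):
#   git checkout "$COMMIT"
# Or, if the project installs from GitHub via pip (URL placeholder, hypothetical):
#   pip install "git+https://github.com/<org>/<repo>@$COMMIT"
echo "$COMMIT" | grep -Eq '^[0-9a-f]{40}$' && echo "valid commit hash"
```

The `echo`/`grep` line just sanity-checks that the pinned value is a full 40-character SHA, so a truncated copy-paste fails loudly before you check anything out.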
You might have the cables in the wrong places. Use the male/female jumper wires to swap them
Example: https://github.com/catid/dora/blob/9b2055d0b8dd73890e6fbca585a0e52a6a87dde3/dora.py#L66
Seem to have worked around this by running:

```bash
(cleanrl) ➜ cleanrl git:(master) pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
zsh: command not found: pip3
(cleanrl) ➜ cleanrl git:(master)...
```
Seeing the same issue here on an RTX 4090:

```
ERROR: CUDA RT call "cudaFuncSetAttribute(&monarch_conv_cuda_32_32_32_kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, 135168)" in line 969 of file /tmp/pip-req-build-j90uf05x/csrc/flashfftconv/monarch_cuda/monarch_cuda_interface_fwd_bf16.cu failed with invalid argument (1).
CUDA Runtime Error at:...
```
Running the unit test shared above, I see the first error here:

```
Conv layer: FlashFFTConv(), seq_len = 8192, dtype = torch.float16, use_32_butterfly = True
Input size: torch.Size([4, 128, 4096])
Loss:...
```