Vedant Roy
> Thank you for the issue. Yes, this is possible; I have it in progress. Interesting. Last time I used Triton, I wasn't sure if they exposed an API for caching...
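For reference, a minimal sketch of the caching I was thinking of: `@triton.autotune` memoizes the best config per distinct `key` value, and compiled kernels also get cached on disk (`TRITON_CACHE_DIR` controls where). The vector-add kernel here is just a placeholder, not the kernel under discussion.

```python
import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 256}),
        triton.Config({"BLOCK_SIZE": 1024}),
    ],
    key=["n_elements"],  # best config is cached per distinct key value
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n)
    return out
```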
@casper-hansen Can you share your benchmark code? I'm working on an optimized version of the AWQ GEMM kernel.
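In the meantime, here's the rough shape of the timing harness I've been using -- CUDA-event timing around the call under test. The dense fp16 GEMM below is only a stand-in for the actual AWQ GEMM call, and the shapes are made up.

```python
import torch

def bench(fn, warmup=10, iters=100):
    # Simple CUDA-event timer; assumes fn launches work on the current stream.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

# Hypothetical stand-in workload; swap in the quantized GEMM call being benchmarked.
a = torch.randn(1, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
print(f"{bench(lambda: a @ b):.3f} ms/iter")
```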
Update -- a quick correction: I downloaded ~65M clips, but the amount of storage used is actually ~175TB. I'm guessing this is because I always downloaded best quality. From...
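Rough per-clip arithmetic behind that estimate (decimal TB assumed):

```python
clips = 65e6                              # ~65M clips
storage_tb = 175                          # ~175 TB total
mb_per_clip = storage_tb * 1e6 / clips    # 1 TB = 1e6 MB in decimal units
print(f"{mb_per_clip:.1f} MB per clip on average")  # ~2.7 MB
```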
While we're waiting on this error to get fixed -- is there a version we can downgrade to?
No obligation, but ofc it would be appreciated :) For some reason, I assumed the activations were equivalent to what was being stored for the backward pass. But it looks like that's...
Yeah, that's what I was asking. I guess torchviz (https://github.com/szagoruyko/pytorchviz) does this -- but it would be nice if the saved tensors from the backward pass could be mapped to op /...
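For what it's worth, a minimal sketch of how the saved tensors can at least be enumerated with `torch.autograd.graph.saved_tensors_hooks` -- it records metadata about every tensor stashed for backward, though it doesn't map them to the op that saved them. The toy model is purely illustrative.

```python
import torch

saved = []

def pack_hook(t):
    # Record metadata about every tensor stashed for the backward pass.
    saved.append((tuple(t.shape), t.dtype, t.element_size() * t.nelement()))
    return t

def unpack_hook(t):
    return t

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8)
)
x = torch.randn(4, 16)

with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    out = model(x).sum()
out.backward()

for shape, dtype, nbytes in saved:
    print(shape, dtype, nbytes)
```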
Is there some chance that I need to use a specific stride? I know my shapes are correct, but it's definitely possible my stride is wrong.
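For concreteness, this is the kind of check I mean -- inspecting `.stride()` / `.is_contiguous()` and forcing a dense layout with `.contiguous()`. The shapes here are hypothetical, not my actual ones.

```python
import torch

# Hypothetical [batch, seq, heads, head_dim] input.
q = torch.randn(2, 128, 16, 64, device="cuda", dtype=torch.bfloat16)
q_t = q.transpose(1, 2)          # a view with non-contiguous strides

print(q_t.shape, q_t.stride(), q_t.is_contiguous())
q_fixed = q_t.contiguous()       # materializes a densely-strided copy
print(q_fixed.stride(), q_fixed.is_contiguous())
```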
CUDA version:
```
my-compute-node:~/training/replay$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
```
CuDNN...
Ok, further updates. It looks like it's failing on the backward pass only. And... if I use only 2 layers in my model instead of 4, it doesn't fail....
@cyanguwa -- I'll try to make a minimal reproduction soon. For now, a few more details:
- Only happens w/ FSDP enabled on multiple ranks
- Does not happen if...
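Roughly, the skeleton of the reproduction I have in mind (model, shapes, and layer count are placeholders, not the real training setup):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # Assumes launch via torchrun so RANK / WORLD_SIZE / LOCAL_RANK are set.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; the failure seems to depend on layer count,
    # so that is exposed as a knob here.
    num_layers = 4
    model = torch.nn.Sequential(
        *[torch.nn.Linear(1024, 1024) for _ in range(num_layers)]
    ).cuda()
    model = FSDP(model)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()  # the reported failure is in the backward pass

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```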