Vedant Roy

96 comments by Vedant Roy

> Thank you for the issue. Yes, this is possible, I have it in progress.

Interesting. Last time I used Triton, I wasn't sure if they exposed an API for caching...

@casper-hansen Can you share your benchmark code? I'm working on an optimized version of the AWQ GEMM kernel.
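The benchmark code being asked for isn't shown here. What follows is only a generic, hypothetical timing harness (the names `benchmark` and `sync` are illustrative, not from any project). It runs warm-up iterations first so that one-time costs such as compilation or autotuning are excluded, then reports the median of many timed runs. Kernel launches on a GPU are asynchronous, so for a CUDA kernel you would pass a synchronizing callable such as `torch.cuda.synchronize`. CUDA events (`torch.cuda.Event(enable_timing=True)`) are a common alternative.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 10, iters: int = 100,
              sync: Callable[[], None] = lambda: None) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    `sync` should block until all queued device work has finished
    (e.g. torch.cuda.synchronize for GPU kernels). The default is a
    no-op, which is only correct for synchronous CPU code.
    """
    # Warm-up runs absorb one-time costs (JIT compilation, caches, autotuning).
    for _ in range(warmup):
        fn()
    sync()

    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for the work itself, not just its launch
        times_ms.append((time.perf_counter() - start) * 1e3)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(times_ms)


if __name__ == "__main__":
    print(f"{benchmark(lambda: sum(range(10_000))):.4f} ms")
```

The median is used instead of the mean because a single preempted or cold run can skew an average considerably.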

Update -- quick correction: I downloaded ~65M clips, but the storage used is actually ~175 TB. I'm guessing this is because I always downloaded the best quality. From...
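A quick sanity check on the two approximate figures, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes). They imply an average of roughly 2.7 MB per clip. Whether that fits "best quality" depends on clip duration and resolution, and the comment states neither.

```python
# Back-of-the-envelope average clip size.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TOTAL_BYTES = 175 * 10**12   # ~175 TB of storage used
NUM_CLIPS = 65_000_000       # ~65M clips downloaded

avg_mb = TOTAL_BYTES / NUM_CLIPS / 10**6
print(f"average clip size: {avg_mb:.2f} MB")  # → average clip size: 2.69 MB
```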

While we're waiting on this error to get fixed -- is there a version we can downgrade to?

No obligations, but of course it would be appreciated :) For some reason, I assumed the activations were equivalent to what was being stored for the backward pass. But it looks like that's...
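Here is a minimal sketch, assuming PyTorch, of one way to see what autograd actually stores. `torch.autograd.graph.saved_tensors_hooks` lets you intercept every tensor saved for the backward pass. Those tensors are not just layer activations: they can also include inputs and weights. For example, a matmul typically keeps both of its operands. The helper `measure_saved_bytes` is hypothetical. Because it sums every saved tensor, it may over-count memory that is already held elsewhere, such as parameters or tensors that share storage.

```python
import torch
from torch.autograd.graph import saved_tensors_hooks


def measure_saved_bytes(fn, *inputs):
    """Run fn(*inputs); return (output, total bytes of tensors saved for backward)."""
    saved = []

    def pack(t):
        # Called once for every tensor autograd saves for the backward pass.
        saved.append(t)
        return t

    def unpack(t):
        return t

    with saved_tensors_hooks(pack, unpack):
        out = fn(*inputs)

    # May double-count tensors that share storage (e.g. views, parameters).
    total = sum(t.numel() * t.element_size() for t in saved)
    return out, total


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    w = torch.randn(8, 16, requires_grad=True)
    _, nbytes = measure_saved_bytes(lambda a, b: torch.relu(a @ b), x, w)
    print(f"saved for backward: {nbytes} bytes")
```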

Yeah, that's what I was asking. I guess torchviz (https://github.com/szagoruyko/pytorchviz) does this, but it would be nice if the saved tensors from the backward pass could be mapped to op /...
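Without torchviz, a rough version of that mapping is possible by walking the autograd graph. The PyTorch autograd notes describe how each `grad_fn` node exposes its saved tensors as attributes whose names start with `_saved_`. The sketch below, which assumes PyTorch, collects `(op name, attribute, shape)` for each of them. `saved_tensors_by_op` is a hypothetical helper. It only works before `backward()` runs, because saved tensors are freed afterwards unless `retain_graph=True` is used.

```python
import torch


def saved_tensors_by_op(output):
    """Walk the autograd graph from `output`; return (op name, attr, shape) tuples."""
    rows = []
    seen = set()
    stack = [output.grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        for attr in dir(fn):
            if not attr.startswith("_saved_"):
                continue
            try:
                val = getattr(fn, attr)
            except RuntimeError:
                continue  # saved tensor already freed (e.g. after backward)
            # Some _saved_ attributes are sizes or scalars; keep only tensors.
            if torch.is_tensor(val):
                rows.append((fn.name(), attr, tuple(val.shape)))
        # next_functions holds (parent node, output index) pairs.
        for next_fn, _ in fn.next_functions:
            stack.append(next_fn)
    return rows


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    w = torch.randn(8, 16, requires_grad=True)
    for row in saved_tensors_by_op(torch.relu(x @ w).sum()):
        print(row)
```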

Is there some chance that I need to use a specific stride? I know my shapes are correct, but it's definitely possible my stride is wrong.
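One way to check the stride question is to compare against the dense row-major (C-contiguous) layout. In that layout, each dimension's stride is the product of the sizes of all dimensions after it. PyTorch reports strides in elements via `tensor.stride()`, and `tensor.is_contiguous()` and `tensor.contiguous()` check and fix the layout. The pure-Python helpers below use hypothetical names and simply make the rule explicit. Many custom kernels assume this layout, but whether a given kernel does is an assumption you would need to confirm.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for `shape`."""
    strides = []
    step = 1
    # Walk dimensions from innermost to outermost, accumulating the product.
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_row_major(shape, strides):
    """True if `strides` match row-major layout; size-1 dims are ignored."""
    expected = contiguous_strides(shape)
    return all(s == e for size, s, e in zip(shape, strides, expected) if size != 1)


if __name__ == "__main__":
    print(contiguous_strides((2, 3, 4)))   # → (12, 4, 1)
    print(is_row_major((3, 2), (1, 3)))    # → False (a transposed view)
```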

CUDA version:

```
my-compute-node:~/training/replay$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
```

cuDNN...
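For automated bug reports, the toolkit version can be pulled out of `nvcc --version` output like the output shown above. This is a small sketch, assuming the output keeps the `release X.Y, VX.Y.Z` format seen in this snippet. `parse_nvcc_version` is a hypothetical helper, not part of any CUDA tool.

```python
import re


def parse_nvcc_version(output: str):
    """Return (release, full_version) from `nvcc --version` text, or None."""
    # Assumes the "release X.Y, VX.Y.Z" format seen in nvcc 12.2 output.
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", output)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 12.2, V12.2.140"
    print(parse_nvcc_version(sample))  # → ('12.2', '12.2.140')
```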

OK, further updates. It looks like it's failing only on the backward pass. And if I use only 2 layers in my model instead of 4, it doesn't fail...

@cyanguwa -- I'll try to make a minimal reproduction soon. For now, a few more details:

- Only happens with FSDP enabled on multiple ranks
- Does not happen if...