Vedant Roy
> Thank you for the issue. Yes, this is possible; I have it in progress. Interesting. Last time I used Triton, I wasn't sure if they exposed an API for caching...
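For reference, a minimal sketch of the caching I was thinking of: `@triton.autotune` memoizes the best config per distinct `key` value, and compiled kernels also get cached on disk (`TRITON_CACHE_DIR` controls where). The vector-add kernel here is just a placeholder, not the kernel under discussion.

```python
import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 256}),
        triton.Config({"BLOCK_SIZE": 1024}),
    ],
    key=["n_elements"],  # best config is cached per distinct key value
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n)
    return out
```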
@casper-hansen Can you share your benchmark code? I'm working on an optimized version of the AWQ GEMM kernel.
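In the meantime, here's the rough shape of the timing harness I've been using -- CUDA-event timing around the call under test. The dense fp16 GEMM below is only a stand-in for the actual AWQ GEMM call, and the shapes are made up.

```python
import torch

def bench(fn, warmup=10, iters=100):
    # Simple CUDA-event timer; assumes fn launches work on the current stream.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

# Hypothetical stand-in workload; swap in the quantized GEMM call being benchmarked.
a = torch.randn(1, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
print(f"{bench(lambda: a @ b):.3f} ms/iter")
```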
Update -- a quick correction: I downloaded ~65M clips, but the amount of storage used is actually ~175TB. I'm guessing this is because I always downloaded best quality. From...
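Rough per-clip arithmetic behind that estimate (decimal TB assumed):

```python
clips = 65e6                              # ~65M clips
storage_tb = 175                          # ~175 TB total
mb_per_clip = storage_tb * 1e6 / clips    # 1 TB = 1e6 MB in decimal units
print(f"{mb_per_clip:.1f} MB per clip on average")  # ~2.7 MB
```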
While we're waiting on this error to get fixed -- is there a version we can downgrade to?
No obligation, but ofc it would be appreciated :) For some reason, I assumed the activations were equivalent to what was being stored for the backward pass. But it looks like that's...
Yeah, that's what I was asking. I guess torchviz (https://github.com/szagoruyko/pytorchviz) does this -- but it would be nice if the saved tensors from the backward pass could be mapped to op /...
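For what it's worth, a minimal sketch of how the saved tensors can at least be enumerated with `torch.autograd.graph.saved_tensors_hooks` -- it records metadata about every tensor stashed for backward, though it doesn't map them to the op that saved them. The toy model is purely illustrative.

```python
import torch

saved = []

def pack_hook(t):
    # Record metadata about every tensor stashed for the backward pass.
    saved.append((tuple(t.shape), t.dtype, t.element_size() * t.nelement()))
    return t

def unpack_hook(t):
    return t

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8)
)
x = torch.randn(4, 16)

with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    out = model(x).sum()
out.backward()

for shape, dtype, nbytes in saved:
    print(shape, dtype, nbytes)
```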
Is there some chance that I need to use a specific stride? I know my shapes are correct, but it's definitely possible my stride is wrong.
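For concreteness, this is the kind of check I mean -- inspecting `.stride()` / `.is_contiguous()` and forcing a dense layout with `.contiguous()`. The shapes here are hypothetical, not my actual ones.

```python
import torch

# Hypothetical [batch, seq, heads, head_dim] input.
q = torch.randn(2, 128, 16, 64, device="cuda", dtype=torch.bfloat16)
q_t = q.transpose(1, 2)          # a view with non-contiguous strides

print(q_t.shape, q_t.stride(), q_t.is_contiguous())
q_fixed = q_t.contiguous()       # materializes a densely-strided copy
print(q_fixed.stride(), q_fixed.is_contiguous())
```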
CUDA version:
```
my-compute-node:~/training/replay$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
```
CuDNN...
Ok, further updates. It looks like it's failing on the backward pass only. And... if I use only 2 layers in my model instead of 4, it doesn't fail....
@cyanguwa -- I'll try to make a minimal reproduction soon. For now, a few more details:
- Only happens w/ FSDP enabled on multiple ranks
- Does not happen if...
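Roughly, the skeleton of the reproduction I have in mind (model, shapes, and layer count are placeholders, not the real training setup):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # Assumes launch via torchrun so RANK / WORLD_SIZE / LOCAL_RANK are set.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; the failure seems to depend on layer count,
    # so that is exposed as a knob here.
    num_layers = 4
    model = torch.nn.Sequential(
        *[torch.nn.Linear(1024, 1024) for _ in range(num_layers)]
    ).cuda()
    model = FSDP(model)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()  # the reported failure is in the backward pass

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```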