Tim Dettmers

Results: 106 comments by Tim Dettmers

Thank you for your kind words! I have not read the paper closely, but I skimmed it, and the main difference between "To Prune or not to Prune" and our...

This is a very interesting idea. I think from a computational perspective this would be a very promising direction — if successful, it would definitely help us to develop and...

One general problem with memory savings is that even though your weights are sparse, you still get dense outputs if you have relatively dense inputs. This will probably change in...
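A minimal NumPy sketch (illustrative only, not code from the thread) of why sparse weights alone do not give you sparse memory at the output: each output element is a dot product against a dense input, so it is almost never exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight matrix with ~90% of entries zeroed out (sparse in value).
W = rng.standard_normal((256, 128))
W[rng.random(W.shape) < 0.9] = 0.0
x = rng.standard_normal((128, 32))  # relatively dense inputs

out = W @ x  # every output element mixes many dense input values
print(f"weight density: {np.count_nonzero(W) / W.size:.2f}")    # ~0.10
print(f"output density: {np.count_nonzero(out) / out.size:.2f}")  # ~1.00
```

Unless the activations themselves are sparsified (e.g. by ReLU thresholds or top-k selection), the memory for the outputs stays dense.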

If you use the GEMM formulation for convolution, you could reuse very similar code: just add an im2col and col2im...
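A rough sketch of the GEMM formulation (names and shapes are my own, not from the library): im2col unfolds each receptive field into a column, so the convolution itself collapses into a single matrix multiply; col2im would scatter gradients back in the backward pass.

```python
import numpy as np

def im2col(x, k):
    """Unfold a (C, H, W) input into a (C*k*k, out_h*out_w) matrix (stride 1, no padding)."""
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each column is one flattened receptive field.
            cols[:, i * out_w + j] = x[:, i:i + k, j:j + k].ravel()
    return cols

def conv2d_gemm(x, weight):
    """Convolution as one GEMM: (F, C*k*k) @ (C*k*k, out_h*out_w)."""
    F, C, k, _ = weight.shape
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    cols = im2col(x, k)
    return (weight.reshape(F, -1) @ cols).reshape(F, out_h, out_w)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))        # C=3, H=W=8
w = rng.standard_normal((4, 3, 3, 3))     # F=4 filters, 3x3 kernels
out = conv2d_gemm(x, w)
print(out.shape)  # (4, 6, 6)
```

Because the heavy lifting is now a plain matrix multiply, a sparse GEMM kernel can be dropped in for the dense one with the im2col/col2im wrappers unchanged.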

cuSPARSE performance can be quite weak for certain matrix sizes and sparsity levels, and it can easily be slower than dense GEMM. Do you have some numbers for this, that is, speed/time taken...
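As an illustration of the kind of sweep I would want numbers from (SciPy on CPU here, purely as a sketch; the same size/sparsity sweep applies to cuSPARSE on GPU), with warm-up runs and a median over repetitions:

```python
import time
import numpy as np
from scipy import sparse

def bench(fn, warmup=2, reps=5):
    """Median wall-clock time of fn() over several repetitions, after warm-up."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

rng = np.random.default_rng(0)
n = 512
x = rng.standard_normal((n, n))

for density in (0.01, 0.1, 0.3):
    W_dense = rng.standard_normal((n, n))
    W_dense[rng.random((n, n)) >= density] = 0.0   # keep ~density fraction
    W_sparse = sparse.csr_matrix(W_dense)
    t_dense = bench(lambda: W_dense @ x)
    t_sparse = bench(lambda: W_sparse @ x)
    print(f"density={density:.2f}  dense={t_dense * 1e3:.2f} ms  "
          f"sparse={t_sparse * 1e3:.2f} ms")
```

The crossover point where sparse beats dense depends heavily on density, matrix shape, and the backend, which is exactly why raw numbers matter.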

The example that you posted has a couple of problems. For benchmarking, you need to synchronize CUDA streams to get precise timings, since kernel executions are asynchronous. You also need to...
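A hedged sketch of the timing pattern this refers to (my own helper, assuming PyTorch; sizes are arbitrary): kernel launches return before the kernel finishes, so you either record CUDA events or synchronize before reading the clock. The snippet falls back to a plain CPU timer when no GPU is present.

```python
import time
import torch

def timed(fn, warmup=3, reps=10):
    """Average milliseconds per call of fn(), with warm-up and proper sync."""
    for _ in range(warmup):
        fn()  # warm-up: JIT/caching effects should not be measured
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain pending async kernels first
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(reps):
            fn()
        end.record()
        torch.cuda.synchronize()  # wait until 'end' has actually been reached
        return start.elapsed_time(end) / reps
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) * 1e3 / reps

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
print(f"{timed(lambda: x @ x):.3f} ms per matmul")
```

Without the synchronization, the timer only measures the kernel launch overhead, which makes any kernel look implausibly fast.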

I believe this is an issue with printing the wrong epoch number. From the code, it seems it should work right, but the logged epoch number [starts again from 1](https://github.com/TimDettmers/sparse_learning/blob/master/mnist_cifar/main.py#L264).

I never tested it for LSTMs. The library itself should work for LSTMs out of the box. However, I am not sure if there might be problems with inducing too...

I actually tried this in experiments and it did not help. We would be happy to bring this back; the implementation is not that complicated. Right now we are a bit...

This is an awesome project! Thank you for this. @Ying1123 I am interested in using SGLang for multi-LoRA deployments for a project. The alternative is currently vLLM, but I like...