Tim Dettmers comments

Results: 106 comments of Tim Dettmers

The final sparsity is smaller than the preset.

Thanks for posting the solution to the problem! I currently do not quite understand what was going on. Is the code that you provided a general improvement to do the...

Does it work on the Windows platform?

Currently, Windows is not supported. I do not have access to Windows 10 and I am unable to help with this. It would be great to get help on this...

Does it work on the Windows platform?

Thanks @jorahn! What DeepSpeed is doing seems to be the canonical way to handle this. @TitusCornelius is currently working on a conda build and it might be useful to add...

Using dynamic growth & pruning?

Yes, theoretically you can call the pruning function at each mini-batch iteration. If you look at the code, it is currently only called at the end of each epoch. You...
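The reply is truncated, but the scheduling point it makes can be sketched generically. This is a minimal sketch, assuming a plain training loop; `train`, `prune_and_regrow`, and `prune_every_n_batches` are hypothetical names chosen for illustration, not the repository's API. It contrasts the default once-per-epoch call with a call every N mini-batches.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical hook: stands in for whatever prune-and-regrow step the library exposes.
PruneHook = Callable[[int, int], None]


def train(
    num_epochs: int,
    batches_per_epoch: int,
    prune_and_regrow: PruneHook,
    prune_every_n_batches: Optional[int] = None,
) -> None:
    """Run a dummy training loop and fire the pruning hook on a schedule.

    prune_every_n_batches=None reproduces the per-epoch behaviour; an integer N
    fires the hook every N mini-batches instead.
    """
    for epoch in range(num_epochs):
        for step in range(batches_per_epoch):
            # forward pass, backward pass and optimizer step would go here
            if prune_every_n_batches is not None and (step + 1) % prune_every_n_batches == 0:
                prune_and_regrow(epoch, step)
        if prune_every_n_batches is None:
            # default schedule: prune and regrow once, after the last mini-batch of the epoch
            prune_and_regrow(epoch, batches_per_epoch - 1)


if __name__ == "__main__":
    calls: List[Tuple[int, int]] = []
    train(2, 4, lambda e, s: calls.append((e, s)))
    print(calls)  # [(0, 3), (1, 3)]
    calls.clear()
    train(2, 4, lambda e, s: calls.append((e, s)), prune_every_n_batches=1)
    print(len(calls))  # 8
```

Passing `prune_every_n_batches=1` gives the per-mini-batch variant described in the reply as theoretically possible; leaving it unset keeps the once-per-epoch call.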

Using dynamic growth & pruning?

Thanks for your comment. The method determines the redistribution of weights. There is the problem of what you do if weights are redistributed to layers that are already full (and...
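One way to handle the "layer is already full" case can be sketched as follows. This is a minimal sketch under stated assumptions, not the method's actual implementation: each layer is assumed to have a redistribution `score` (for example a mean momentum magnitude) and a `capacity` equal to its number of currently free (pruned) weight slots. Regrowth is split in proportion to the scores, each layer is capped at its free capacity, and any surplus is re-split among the layers that still have room. The helper `redistribute` is hypothetical.

```python
from typing import Dict, Set


def redistribute(
    total_to_regrow: int,
    scores: Dict[str, float],
    capacity: Dict[str, int],
) -> Dict[str, int]:
    """Hypothetical helper: split regrowth across layers, capped at each layer's free slots.

    Weights that cannot be placed because every layer is full stay unallocated;
    the caller can detect this via sum(result.values()) < total_to_regrow.
    """
    allocation = {name: 0 for name in scores}
    remaining = total_to_regrow
    # only layers with free slots and a positive score can receive weights
    active: Set[str] = {n for n in scores if capacity[n] > 0 and scores[n] > 0}

    while remaining > 0 and active:
        score_sum = sum(scores[n] for n in active)
        placed = 0
        for name in sorted(active):
            share = int(remaining * scores[name] / score_sum)  # proportional share, floored
            # cap at the layer's free slots and never exceed what is left this round
            take = min(share, capacity[name] - allocation[name], remaining - placed)
            allocation[name] += take
            placed += take
        if placed == 0:
            # flooring gave every layer zero: place single weights, highest score first
            for name in sorted(active, key=lambda n: (-scores[n], n)):
                if placed == remaining:
                    break
                allocation[name] += 1
                placed += 1
        remaining -= placed
        # layers that reached capacity drop out; their surplus is re-split next round
        active = {n for n in active if allocation[n] < capacity[n]}

    return allocation
```

For example, `redistribute(10, {"a": 3.0, "b": 1.0}, {"a": 4, "b": 100})` returns `{"a": 4, "b": 6}`: layer `a` would get 7 by score but only has 4 free slots, so the surplus flows to `b`. Re-splitting over the remaining layers keeps the total number of regrown weights fixed whenever the overall free capacity allows it.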

Using dynamic growth & pruning?

Great catch! Would you mind submitting a pull request for this? I feel like you are able to quickly pinpoint and fix this issue.

Performance loss is a little big

You need to train both networks a bit longer (250 epochs) to get better performance, but the gap between dense and sparse performance will remain if you use the default...

Discrepancies in speed_benchmark.py results on A100

Thank you for reporting this! What CUDA version are you using? Triton does improve over time and older versions of CUDA might not have the most optimized matmul kernels for...

Feature: Add SYCL runtime support

This is great, thank you so much for your contribution! Can you walk me through how a user would use/compile this for their Intel device? A step-by-step procedure would be...

Fine-tuning GPT-J-6B in colab: 8-bit weights with low-rank adaptors

Thank you so much for creating this, Denis! > 2. With the little I know of BNB, `Adam8bit` usually requires a `StableEmbedding` - which is the same as `nn.Embedding` but...