Tim Dettmers

106 comments by Tim Dettmers

Thanks for posting the solution to the problem! I do not quite understand yet what was going on. Is the code that you provided a general improvement to do the...

Currently, Windows is not supported. I do not have access to Windows 10 and I am unable to help with this. It would be great to get help on this...

Thanks @jorahn! What DeepSpeed is doing seems to be the canonical way to handle this. @TitusCornelius is currently working on a conda build and it might be useful to add...

Yes, theoretically you can call the pruning function at each mini-batch iteration. If you look at the code, it is currently only called after the end of each epoch. You...

Thanks for your comment. The method determines the redistribution of weights. There is the problem of what to do if weights are redistributed to layers that are already full (and...

Great catch! Would you mind submitting a pull request for this? I feel like you are able to quickly pinpoint and fix this issue.

You need to train both networks a bit longer (250 epochs) to get better performance, but the gap between dense and sparse performance will remain if you use the default...

Thank you for reporting this! What CUDA version are you using? Triton does improve over time and older versions of CUDA might not have the most optimized matmul kernels for...

This is great, thank you so much for your contribution! Can you run me through how a user would use/compile this for their Intel device? A step-by-step procedure would be...

Thank you so much for creating this, Denis!
> 2. With the little I know of BNB, `Adam8bit` usually requires a `StableEmbedding` - which is the same as `nn.Embedding` but...