bitsandbytes: Add sm_90a to enable use of accelerated wgmma and setmaxnreg instructions

Status: Open. ConsceIeratus opened this issue 1 year ago • 5 comments

sm_90a adds support for accelerated wgmma and setmaxnreg instructions.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-module-directives-target

Dec 13 '23 02:12 ConsceIeratus
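For reference, here is a minimal sketch of what opting into sm_90a could look like at build time. It assumes nvcc from CUDA 12.0 or later, which is the release that introduced the arch-specific `sm_90a` target. The helper `gencode_flags` is hypothetical and is not part of the bitsandbytes build. It emits one `-gencode arch=compute_XX,code=sm_XX` pair per requested architecture, so enabling sm_90a means one extra list entry and one extra cubin in the fat binary.

```python
def gencode_flags(archs):
    """Build nvcc -gencode flags, one cubin per entry.

    `archs` holds strings such as "80", "90" or "90a" (a hypothetical
    input format chosen for this sketch).
    """
    flags = []
    for arch in archs:
        # Each pair compiles SASS for exactly one target: compute_XX -> sm_XX.
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    return flags


if __name__ == "__main__":
    # sm_90 alone vs. sm_90 plus the arch-specific sm_90a variant.
    print(" ".join(gencode_flags(["80", "90"])))
    print(" ".join(gencode_flags(["80", "90", "90a"])))
```

NVIDIA documents the `a` variants as not forward compatible: `sm_90a` code runs only on compute capability 9.0 devices. A build that also wants to run on future GPUs would therefore keep a non-`a` target or `compute_90` PTX alongside it.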

Thank you, it is good to know that wgmma is now supported. I think Hopper supports both sm_90 and sm_90a. Since we do not use wgmma or setmaxnreg for now, we do not need sm_90a. I would prefer not to add it at the moment, to keep the binary a bit smaller. I am currently having trouble with binary size, since every file must be smaller than 100 MB to be uploaded to PyPI.

Jan 02 '24 06:01 TimDettmers
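To make the size constraint concrete, here is a minimal pre-upload check. It makes two assumptions. First, it treats the default per-file limit as 100 MiB (100 × 1024 × 1024 bytes). Second, it assumes built wheels land in a `dist/` directory. A project that has been granted a quota increase would pass its own `limit`. The functions `wheel_sizes` and `oversized` are hypothetical helpers and are not part of any PyPI tooling.

```python
import sys
from pathlib import Path

# Assumed default per-file upload limit (100 MiB); quotas can be raised per project.
DEFAULT_LIMIT_BYTES = 100 * 1024 * 1024


def wheel_sizes(dist_dir):
    """Map each .whl file name in dist_dir to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(dist_dir).glob("*.whl")}


def oversized(sizes, limit=DEFAULT_LIMIT_BYTES):
    """Return sorted (name, size) pairs whose size is strictly above `limit`."""
    return sorted((name, size) for name, size in sizes.items() if size > limit)


if __name__ == "__main__":
    dist = sys.argv[1] if len(sys.argv) > 1 else "dist"
    for name, size in oversized(wheel_sizes(dist)):
        print(f"{name}: {size / 2**20:.1f} MiB exceeds the upload limit")
```

Separating the pure size comparison from the filesystem scan keeps the limit logic easy to test without building real wheels.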

Thanks for raising this. We'll keep it in mind and implement it once we have sorted out the current cross-platform, build, and distribution issues.

Jan 23 '24 21:01 Titus-von-Koeller

For PyPI, you can request a quota increase for the project via https://github.com/pypi/support :)

Jan 30 '24 07:01 akx

Nice! @akx, would you mind sharing that in https://github.com/TimDettmers/bitsandbytes/discussions/990 as well? 🙏

Jan 30 '24 07:01 younesbelkada

> For PyPI, you can request a quota increase for the project via https://github.com/pypi/support :)

Nice, that would unblock us on this before we've figured out the cross-platform compilation and distribution stuff.

I'll look into that today. Valuable input, thanks!

Jan 30 '24 11:01 Titus-von-Koeller
