bitsandbytes
Add sm_90a to enable use of accelerated wgmma and setmaxnreg instructions
sm_90a adds support for accelerated wgmma and setmaxnreg instructions.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-module-directives-target
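For anyone wondering what the request amounts to in build terms, here is a minimal sketch, not the project's actual build script. It assumes the build passes one `-gencode` flag per target to nvcc; the helper name `gencode_flags` and the architecture list are hypothetical and only for illustration.

```python
def gencode_flags(archs):
    """Build one nvcc -gencode flag per compute capability string, e.g. "80" or "90a"."""
    flags = []
    for arch in archs:
        # Each entry emits native code (SASS) for exactly that target.
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    return flags


# Hypothetical target list: adding "90a" next to "90" adds one more code object
# to the binary, which is where the extra size comes from.
flags = gencode_flags(["80", "90", "90a"])
print(" ".join(flags))
```

Note that, per the PTX documentation linked above, targets with the `a` suffix do not guarantee forward compatibility, so `sm_90a` would be an addition to `sm_90` rather than a replacement for it.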
Thank you, it is good to know that wgmma is now available. I think Hopper supports both sm_90 and sm_90a. Since we do not use wgmma or setmaxnreg for now, we do not need sm_90a. I would prefer not to add it at the moment, to keep the binary a bit smaller. I am currently having trouble with binary size, since all binaries must be smaller than 100MB for PyPI uploads.
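To make the size constraint concrete, here is a rough sketch of a pre-upload check. The helper `oversized_files` is hypothetical and not part of bitsandbytes or any PyPI tooling, and treating 100MB as 100 * 1024 * 1024 bytes is an assumption.

```python
import os

# Assumption: the 100MB limit mentioned above, read as 100 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(paths, limit=SIZE_LIMIT_BYTES):
    """Return (path, size) pairs for files that are not smaller than the limit."""
    result = []
    for path in paths:
        size = os.path.getsize(path)
        if size >= limit:
            result.append((path, size))
    return result
```

Running something like this over the built wheels before uploading would show whether an extra target such as sm_90a pushes any file over the limit.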
Thanks for raising this. We'll keep this in mind and implement it once we have figured out the current cross-platform + build + distribution topics.
For PyPI, you can request a quota increase for the project via https://github.com/pypi/support :)
Nice - @akx would you mind sharing that in https://github.com/TimDettmers/bitsandbytes/discussions/990 as well 🙏
Nice, that would unblock us on this before we've figured out the cross-platform compilation and distribution stuff.
I'll look into that today. Valuable input, thanks!