[AMD][ROCm] Improve support for AMD
This patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library when building for AMD/ROCm. The percentage of passing unit tests improved (tested on RDNA hardware, gfx110x and gfx12x):

Before: collected 5298 items / 15 skipped; 2773 failed, 862 passed, 1665 skipped, 13 errors
After: collected 5851 items / 11 skipped; 4187 failed, 1373 passed, 292 skipped, 10 errors
Regarding testing of fp_quantizer (DS_BUILD_FP_QUANTIZER) via tests/unit/ops/fp_quantizer/test_fp_quant.py: this test depends on QPyTorch, which must be patched before running on AMD; please apply https://github.com/Tiiiger/QPyTorch/pull/71
@hwchen2017 kindly asking for a review now that your comments have been addressed.
@k-artem - is this ready for final review? @hwchen2017 - any remaining review requests?
hi @hwchen2017 @loadams Apologies for the delay on this PR. I've updated the code according to the last set of comments; please review the changes. I've enabled bf16 library-wide; however, I've disabled it for transformer_inference, because we would need to explicitly declare the bf16 type inside pt_binding.cpp, which is currently not possible due to a limitation I mentioned previously. For some extensions, such as fp_quantizer, I was able to resolve the issue using a forward declaration (a sketch follows below). Thank you in advance for your review. I'd also appreciate any ideas about acceptable ways to enable bf16 for transformer_inference on the library side.
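To illustrate the forward-declaration workaround mentioned above, here is a minimal sketch. It assumes the ROCm bf16 type __hip_bfloat16 and uses a hypothetical launcher launch_quantize; neither the names nor the signatures are taken from the actual pt_binding.cpp in this PR.

```cpp
// Hypothetical sketch (not the PR's actual code): the host-compiled binding
// refers to the device bf16 type only through an opaque pointer, so a forward
// declaration suffices and the device bf16 header is never included here.

// pt_binding.cpp (host-only translation unit)
#include <torch/extension.h>

// Forward declaration instead of #include <hip/hip_bf16.h>, which the host
// compiler used for the binding file may not accept.
struct __hip_bfloat16;

// Declared here; defined and explicitly instantiated for __hip_bfloat16 in
// the .cu/.hip file, where the full type definition is visible.
template <typename T>
void launch_quantize(T* out, const float* in, int n);

void quantize_bf16(torch::Tensor& out, const torch::Tensor& in)
{
    // The binding never dereferences the bf16 type; it only forwards a raw
    // pointer, so the incomplete type is sufficient at this point.
    launch_quantize(reinterpret_cast<__hip_bfloat16*>(out.data_ptr()),
                    in.data_ptr<float>(),
                    static_cast<int>(in.numel()));
}
```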
@hwchen2017 a kind reminder about the review.
@loadams could you please help continue the review?
@tjruwase @loadams Can you please help move this PR forward? I believe we have addressed all review comments. This PR significantly improves DeepSpeed functionality on AMD hardware.
Also, we discussed this a while ago, but I don't think we moved forward on it: how do we remove the DeepSpeed dependency on this inactive repo?

> Regarding testing of fp_quantizer (DS_BUILD_FP_QUANTIZER) via tests/unit/ops/fp_quantizer/test_fp_quant.py: this test depends on QPyTorch, which must be patched before running on AMD; please apply https://github.com/Tiiiger/QPyTorch/pull/71
Related issue: https://github.com/deepspeedai/DeepSpeed/issues/7216
> @tjruwase @loadams Can you please help move this PR forward? I believe we have addressed all review comments. This PR significantly improves DeepSpeed functionality on AMD hardware.
@jithunnair-amd, yes I will focus on this PR.
> Also, we discussed this a while ago, but I don't think we moved forward on it: how do we remove the DeepSpeed dependency on this inactive repo?
Apologies for this question hanging for so long. Since so much has changed over the past months, I think it might be worth having a chat on this.
> Also, we discussed this a while ago, but I don't think we moved forward on it: how do we remove the DeepSpeed dependency on this inactive repo?
>
> Apologies for this question hanging for so long. Since so much has changed over the past months, I think it might be worth having a chat on this.
Sure, would you like to discuss here or on a different platform, e.g. email? The gist of it is that we aren't aware of any alternatives to QPyTorch, so creating a DeepSpeed fork is the next best option for making updates to it. Currently, this lib is only used in unit tests (test_quantized_linear_module.py and test_fp_quant.py).
> ...we aren't aware of any alternatives to QPyTorch, so creating a DeepSpeed fork is the next best option for making updates to it. Currently, this lib is only used in unit tests (test_quantized_linear_module.py and test_fp_quant.py).
Got it. Unfortunately, we lack the bandwidth to maintain a QPyTorch fork. Moreover, our roadmap is to streamline the library by deprecating features, subject to bandwidth and community interest. Are you interested in maintaining such a fork?
@k-artem can you please address the formatting issues?
hi @sfc-gh-truwase, I checked it; it actually looks like a CI issue. yapf fails before formatting anything because a cached grammar pickle in the pre-commit environment cannot be loaded:
yapf.....................................................................Failed
- hook id: yapf
- exit code: 1
Traceback (most recent call last):
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/bin/yapf", line 3, in <module>
    from yapf import run_main
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/lib/python3.10/site-packages/yapf/__init__.py", line 41, in <module>
    from yapf.yapflib import yapf_api
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/lib/python3.10/site-packages/yapf/yapflib/yapf_api.py", line 39, in <module>
    from yapf.pyparser import pyparser
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/lib/python3.10/site-packages/yapf/pyparser/pyparser.py", line 44, in <module>
    from yapf.yapflib import format_token
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/lib/python3.10/site-packages/yapf/yapflib/format_token.py", line 23, in <module>
    from yapf.pytree import pytree_utils
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/lib/python3.10/site-packages/yapf/pytree/pytree_utils.py", line 30, in <module>
    from yapf_third_party._ylib2to3 import pygram
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/lib/python3.10/site-packages/yapf_third_party/_ylib2to3/pygram.py", line 39, in <module>
    pattern_grammar = driver.load_grammar(_PATTERN_GRAMMAR_FILE)
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/lib/python3.10/site-packages/yapf_third_party/_ylib2to3/pgen2/driver.py", line 252, in load_grammar
    g.load(gp)
  File "/home/runner/.cache/pre-commit/repoi51ipx2f/py_env-python3.10/lib/python3.10/site-packages/yapf_third_party/_ylib2to3/pgen2/grammar.py", line 95, in load
    d = pickle.load(f)
EOFError: Ran out of input