lmdeploy: Support DeepSeek-V3.2

Requirements: https://github.com/Dao-AILab/fast-hadamard-transform and the latest FlashMLA

Note: My bitonic top-k kernel fails on triton<=3.2.0. I will try to fix it, but upgrading our requirements would be a better option.
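Until the kernel is fixed, a caller could guard on the installed Triton version. A minimal sketch, assuming the only constraint is the `triton<=3.2.0` bound stated above; the helper names `parse_version` and `bitonic_topk_supported` are hypothetical and not part of lmdeploy:

```python
import re

# Highest Triton release on which the kernel is reported to fail (from the note above).
MAX_BROKEN_TRITON = (3, 2, 0)


def parse_version(text: str) -> tuple:
    """Extract the leading numeric release, e.g. '3.4.0+git123' -> (3, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def bitonic_topk_supported(triton_version: str) -> bool:
    """Return True only for versions strictly newer than the broken bound."""
    return parse_version(triton_version) > MAX_BROKEN_TRITON


# Hypothetical usage where Triton is installed:
#   import triton
#   if not bitonic_topk_supported(triton.__version__):
#       ...  # fall back to another top-k path or raise a clear error
```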

Oct 08 '25 07:10 grimoire

I followed the fast-hadamard-transform installation guide, but the build failed. Could you kindly share how you installed it?

(lmdeploy-py312) [lvhan@pj-h800-013 fast-hadamard-transform]$ pip install -v .
Using pip 25.2 from /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/pip (python 3.12)
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Processing /nvme1/lvhan/fast-hadamard-transform
  Running command python setup.py egg_info
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()


  torch.__version__  = 2.8.0+cu128


  running egg_info
  creating /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info
  writing /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  adding license file 'LICENSE'
  adding license file 'AUTHORS'
  writing manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (2.8.0+cu128)
Requirement already satisfied: packaging in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (25.0)
Requirement already satisfied: ninja in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (1.13.0)
Requirement already satisfied: filelock in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.20.0)
Requirement already satisfied: typing-extensions>=4.10.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (4.15.0)
Requirement already satisfied: setuptools in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (80.9.0)
Requirement already satisfied: sympy>=1.13.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (1.14.0)
Requirement already satisfied: networkx in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.5)
Requirement already satisfied: jinja2 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.1.6)
Requirement already satisfied: fsspec in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (2025.10.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.93)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (11.3.3.83)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (10.3.9.90)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (11.7.3.90)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.5.8.93)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (2.27.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.93)
Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (1.13.1.3)
Requirement already satisfied: triton==3.4.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.4.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from sympy>=1.13.3->torch->fast_hadamard_transform==1.0.4.post1) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from jinja2->torch->fast_hadamard_transform==1.0.4.post1) (3.0.3)
Building wheels for collected packages: fast_hadamard_transform
  DEPRECATION: Building 'fast_hadamard_transform' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'fast_hadamard_transform'. Discussion can be found at https://github.com/pypa/pip/issues/6334
  Running command python setup.py bdist_wheel
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]


  torch.__version__  = 2.8.0+cu128


  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()
  running bdist_wheel
  Guessing wheel URL:  https://github.com/Dao-AILab/fast-hadamard-transform/releases/download/v1.0.4.post1/fast_hadamard_transform-1.0.4.post1+cu122torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
  error: <urlopen error [Errno 104] Connection reset by peer>
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/bin/python3.12 -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize, traceback
  
  try:
      import setuptools
  except ImportError:
      print(
          "ERROR: Can not execute `setup.py` since setuptools failed to import in "
          "the build environment with exception:",
          file=sys.stderr,
      )
      traceback.print_exc()
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/nvme1/lvhan/fast-hadamard-transform/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-4wzg0rd8
  cwd: /nvme1/lvhan/fast-hadamard-transform/
  Building wheel for fast_hadamard_transform (setup.py) ... error
  ERROR: Failed building wheel for fast_hadamard_transform
  Running setup.py clean for fast_hadamard_transform
  Running command python setup.py clean
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]


  torch.__version__  = 2.8.0+cu128


  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()
  running clean
  'build/lib.linux-x86_64-cpython-312' does not exist -- can't clean it
  'build/bdist.linux-x86_64' does not exist -- can't clean it
  'build/scripts-3.12' does not exist -- can't clean it
Failed to build fast_hadamard_transform
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> fast_hadamard_transform

Nov 01 '25 14:11 lvhan028

error: <urlopen error [Errno 104] Connection reset by peer>

Try building the wheel on a machine with network access.

Nov 02 '25 06:11 grimoire

 Guessing wheel URL:  https://github.com/Dao-AILab/fast-hadamard-transform/releases/download/v1.0.4.post1/fast_hadamard_transform-1.0.4.post1+cu122torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
  error: <urlopen error [Errno 104] Connection reset by peer>
  error: subprocess-exited-with-error

Errno 104 occurred when setup.py tried to download the guessed wheel, which doesn't exist among the fast-hadamard-transform releases.

Nov 02 '25 06:11 lvhan028

https://github.com/Dao-AILab/fast-hadamard-transform/blob/f134af63deb2df17e1171a9ec1ea4a7d8604d5ca/setup.py#L40

These flags might help.
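A minimal sketch of how such a flag could be applied, assuming the linked line reads an environment variable named `FAST_HADAMARD_TRANSFORM_FORCE_BUILD` and compares it to `"TRUE"` to skip the prebuilt-wheel download and compile locally. The variable name is an assumption on my side, so please confirm it against setup.py before relying on it; `forced_build_command` is a hypothetical helper:

```python
import os
import sys


def forced_build_command(flag: str = "FAST_HADAMARD_TRANSFORM_FORCE_BUILD"):
    """Return (cmd, env) for a local source build.

    `flag` is the ASSUMED name of the setup.py switch that disables the
    "Guessing wheel URL" download step; verify it in setup.py.
    """
    env = dict(os.environ)   # copy, so the current process environment is untouched
    env[flag] = "TRUE"
    # Same command as in the log above, run with the current interpreter's pip.
    cmd = [sys.executable, "-m", "pip", "install", "-v", "."]
    return cmd, env


cmd, env = forced_build_command()

# To run it from inside the fast-hadamard-transform checkout:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

A local build compiles the CUDA extension, so it presumably needs a CUDA toolkit (nvcc) compatible with the installed torch build.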

Nov 02 '25 07:11 grimoire

With TP8, the bf16 NCCL all_reduce might lose precision.
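To illustrate the concern, here is a small pure-Python simulation of accumulating eight per-rank partial sums in bf16. It is only a sketch: the values are made up, and NCCL's actual reduction order and internal accumulation are not modeled here.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even on the dropped 16 bits
    bits &= 0xFFFF0000                   # keep sign, exponent, and 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reduce_bf16(partials):
    """Sum sequentially, rounding the accumulator to bf16 after every add."""
    acc = to_bf16(partials[0])
    for p in partials[1:]:
        acc = to_bf16(acc + to_bf16(p))
    return acc


# Hypothetical partial sums held by the 8 tensor-parallel ranks.
partials = [1.0] + [0.001] * 7

exact = sum(partials)          # double-precision reference
lossy = reduce_bf16(partials)  # bf16 accumulation

# Each 0.001 is below half the bf16 spacing at 1.0 (2**-8), so every add
# rounds back to 1.0 and the small contributions are lost entirely.
print(f"reference={exact:.6f} bf16={lossy:.6f} abs_err={abs(exact - lossy):.6f}")
```

In this toy case the bf16 result stays at 1.0 while the reference is about 1.007, which is the kind of drift that reducing in a wider dtype would avoid.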

Nov 04 '25 11:11 grimoire