Support DeepSeek V3.2
Requirements: fast-hadamard-transform (https://github.com/Dao-AILab/fast-hadamard-transform) and the latest FlashMLA.
Note: My bitonic top-k kernel fails on triton<=3.2.0. I will try to fix it, but upgrading our requirements would be a better option.
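Until the requirements are bumped, a version guard could keep the kernel off older Triton releases. This is a minimal sketch, not code from this PR: the helper names `version_tuple` and `bitonic_topk_supported` are hypothetical, and the threshold is taken only from the note above.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse the leading 'X.Y[.Z]' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def bitonic_topk_supported(triton_version: str) -> bool:
    """Hypothetical guard: the note reports failures on triton<=3.2.0."""
    return version_tuple(triton_version) > (3, 2, 0)


print(bitonic_topk_supported("3.4.0"))
print(bitonic_topk_supported("3.2.0"))
```

In practice the argument would be `triton.__version__`, and the caller could fall back to another top-k path when the guard returns False.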
I followed the fast-hadamard-transform installation guide, but the installation failed. Could you kindly share your installation method?
(lmdeploy-py312) [lvhan@pj-h800-013 fast-hadamard-transform]$ pip install -v .
Using pip 25.2 from /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/pip (python 3.12)
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Processing /nvme1/lvhan/fast-hadamard-transform
Running command python setup.py egg_info
/nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
/nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
!!
********************************************************************************
Please consider removing the following classifiers in favor of a SPDX license expression:
License :: OSI Approved :: BSD License
See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
********************************************************************************
!!
self._finalize_license_expression()
torch.__version__ = 2.8.0+cu128
running egg_info
creating /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info
writing /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/dependency_links.txt
writing requirements to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/requires.txt
writing top-level names to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (2.8.0+cu128)
Requirement already satisfied: packaging in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (25.0)
Requirement already satisfied: ninja in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (1.13.0)
Requirement already satisfied: filelock in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.20.0)
Requirement already satisfied: typing-extensions>=4.10.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (4.15.0)
Requirement already satisfied: setuptools in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (80.9.0)
Requirement already satisfied: sympy>=1.13.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (1.14.0)
Requirement already satisfied: networkx in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.5)
Requirement already satisfied: jinja2 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.1.6)
Requirement already satisfied: fsspec in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (2025.10.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.93)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (11.3.3.83)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (10.3.9.90)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (11.7.3.90)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.5.8.93)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (2.27.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.93)
Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (1.13.1.3)
Requirement already satisfied: triton==3.4.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.4.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from sympy>=1.13.3->torch->fast_hadamard_transform==1.0.4.post1) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from jinja2->torch->fast_hadamard_transform==1.0.4.post1) (3.0.3)
Building wheels for collected packages: fast_hadamard_transform
DEPRECATION: Building 'fast_hadamard_transform' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'fast_hadamard_transform'. Discussion can be found at https://github.com/pypa/pip/issues/6334
Running command python setup.py bdist_wheel
/nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
torch.__version__ = 2.8.0+cu128
/nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
!!
********************************************************************************
Please consider removing the following classifiers in favor of a SPDX license expression:
License :: OSI Approved :: BSD License
See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
********************************************************************************
!!
self._finalize_license_expression()
running bdist_wheel
Guessing wheel URL: https://github.com/Dao-AILab/fast-hadamard-transform/releases/download/v1.0.4.post1/fast_hadamard_transform-1.0.4.post1+cu122torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
error: <urlopen error [Errno 104] Connection reset by peer>
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/bin/python3.12 -u -c '
exec(compile('"'"''"'"''"'"'
# This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
#
# - It imports setuptools before invoking setup.py, to enable projects that directly
# import from `distutils.core` to work with newer packaging standards.
# - It provides a clear error message when setuptools is not installed.
# - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
# setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
# manifest_maker: standard file '"'"'-c'"'"' not found".
# - It generates a shim setup.py, for handling setup.cfg-only projects.
import os, sys, tokenize, traceback
try:
import setuptools
except ImportError:
print(
"ERROR: Can not execute `setup.py` since setuptools failed to import in "
"the build environment with exception:",
file=sys.stderr,
)
traceback.print_exc()
sys.exit(1)
__file__ = %r
sys.argv[0] = __file__
if os.path.exists(__file__):
filename = __file__
with tokenize.open(__file__) as f:
setup_py_code = f.read()
else:
filename = "<auto-generated setuptools caller>"
setup_py_code = "from setuptools import setup; setup()"
exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/nvme1/lvhan/fast-hadamard-transform/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-4wzg0rd8
cwd: /nvme1/lvhan/fast-hadamard-transform/
Building wheel for fast_hadamard_transform (setup.py) ... error
ERROR: Failed building wheel for fast_hadamard_transform
Running setup.py clean for fast_hadamard_transform
Running command python setup.py clean
/nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
torch.__version__ = 2.8.0+cu128
/nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
!!
********************************************************************************
Please consider removing the following classifiers in favor of a SPDX license expression:
License :: OSI Approved :: BSD License
See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
********************************************************************************
!!
self._finalize_license_expression()
running clean
'build/lib.linux-x86_64-cpython-312' does not exist -- can't clean it
'build/bdist.linux-x86_64' does not exist -- can't clean it
'build/scripts-3.12' does not exist -- can't clean it
Failed to build fast_hadamard_transform
error: failed-wheel-build-for-install
× Failed to build installable wheels for some pyproject.toml based projects
╰─> fast_hadamard_transform
error: <urlopen error [Errno 104] Connection reset by peer>
Try building the wheel on a machine with network access.
Guessing wheel URL: https://github.com/Dao-AILab/fast-hadamard-transform/releases/download/v1.0.4.post1/fast_hadamard_transform-1.0.4.post1+cu122torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
error: <urlopen error [Errno 104] Connection reset by peer>
error: subprocess-exited-with-error
Errno 104 occurred when setup.py tried to download the guessed wheel, which does not exist among the fast-hadamard-transform releases.
https://github.com/Dao-AILab/fast-hadamard-transform/blob/f134af63deb2df17e1171a9ec1ea4a7d8604d5ca/setup.py#L40
These flags might help.
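To make the failure concrete, here is a rough sketch of how the guessed URL in the log appears to be assembled from the CUDA, torch, ABI, and Python tags, plus a helper that prepares an environment for a local build. The URL shape is copied from the log only. The environment variable name `FAST_HADAMARD_TRANSFORM_FORCE_BUILD` is an assumption about what the linked flags are called, so please confirm it in that setup.py before relying on it.

```python
import os

BASE_URL = "https://github.com/Dao-AILab/fast-hadamard-transform/releases/download"


def guessed_wheel_url(version, cuda_tag, torch_tag, cxx11_abi, py_tag,
                      platform="linux_x86_64"):
    """Recompose the wheel URL in the shape printed by the build log."""
    abi = "TRUE" if cxx11_abi else "FALSE"
    wheel = (
        f"fast_hadamard_transform-{version}+{cuda_tag}torch{torch_tag}"
        f"cxx11abi{abi}-{py_tag}-{py_tag}-{platform}.whl"
    )
    return f"{BASE_URL}/v{version}/{wheel}"


def force_build_env(base_env=None):
    """Return a copy of the environment that asks setup.py to compile locally.

    ASSUMPTION: the flag name below is not confirmed by this thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["FAST_HADAMARD_TRANSFORM_FORCE_BUILD"] = "TRUE"
    return env


# Values taken from the log above: cu122, torch 2.8, CXX11 ABI, CPython 3.12.
url = guessed_wheel_url("1.0.4.post1", "cu122", "2.8", True, "cp312")
print(url)
```

The returned environment could then be passed as `env=` to `subprocess.run([sys.executable, "-m", "pip", "install", "-v", "."])` from the repository checkout, so the build skips the download attempt if the flag behaves as assumed.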
With TP8, the bf16 NCCL all_reduce might lose precision.
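A toy illustration of the concern, using only the standard library: bf16 keeps 8 significant bits, so small per-rank contributions can vanish when added to a larger running sum. The sketch assumes each reduction step rounds its result back to bf16 and that the 8 ranks are summed sequentially. It is a simplified model for intuition, not a description of NCCL internals.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even bias
    bits &= 0xFFFF0000                   # keep only the upper 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_fp32(x: float) -> float:
    """Round a value to float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum(partials, round_fn):
    """Sum per-rank partials, rounding after every step with round_fn."""
    total = round_fn(0.0)
    for value in partials:
        total = round_fn(total + round_fn(value))
    return total


# Eight ranks: one large partial result and seven small ones.
partials = [1.0] + [0.001] * 7
bf16_sum = reduce_sum(partials, to_bf16)
fp32_sum = reduce_sum(partials, to_fp32)
print(bf16_sum, fp32_sum)
```

In this model the bf16 sum never moves away from 1.0, because the spacing between bf16 values near 1.0 is 2^-7, while the fp32 sum stays close to 1.007.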