SAMAudioJudgeModel fails to import - requires perception_models, which has a numpy conflict
Description
SAMAudioJudgeModel cannot be imported because it depends on core.audio_visual_encoder from facebookresearch/perception_models, which is not declared as a dependency.
When attempting to install both packages together, pip fails due to incompatible numpy requirements.
Error
from sam_audio.model import SAMAudioJudgeModel
# ImportError: No module named 'core.audio_visual_encoder'
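To see which part of the import chain is missing without triggering the failing import itself, one option is to ask Python where it would load each module from. This is a minimal sketch using only the standard library; `locate` is a hypothetical helper name, not part of sam-audio or perception_models, and the module names are the ones from the traceback above.

```python
import importlib.util


def locate(module_name):
    """Return where a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # find_spec imports parent packages first, so a missing parent
        # (e.g. no top-level `core`) surfaces here as ModuleNotFoundError.
        return None
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "<namespace package>"


# Module names taken from the failing import in sam_audio/model/judge.py.
for name in ("core", "core.audio_visual_encoder", "core.audio_visual_encoder.transformer"):
    print(f"{name}: {locate(name) or 'NOT FOUND'}")
```

Because `core` is a very generic package name, the printed path also helps spot the case where some unrelated `core` package is shadowing the one from perception_models.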
Root Cause
In sam_audio/model/judge.py:
from core.audio_visual_encoder.transformer import BaseModelOutputWithPooling
from core.audio_visual_encoder.transformer import Transformer as PEAVTransformer
This imports from perception_models, but:
- sam-audio requires numpy<2.0
- perception_models requires numpy==2.1.2
ERROR: Cannot install numpy<2.0 and perception-models==1.0.0 because
these package versions have conflicting dependencies.
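The reason pip gives up is that the two constraints have no version in common. The sketch below checks a few sample numpy versions against both constraints using plain tuple comparison. The candidate list is illustrative only, and real resolvers apply full PEP 440 rules rather than this simplification.

```python
def parse(version):
    """Turn '1.26.4' into (1, 26, 4) for simple ordering comparisons."""
    return tuple(int(part) for part in version.split("."))


def satisfies_sam_audio(version):
    # sam-audio: numpy<2.0
    return parse(version) < (2, 0)


def satisfies_perception_models(version):
    # perception_models (main branch): numpy==2.1.2
    return parse(version) == (2, 1, 2)


candidates = ["1.26.4", "2.0.0", "2.1.2", "2.2.0"]  # illustrative sample
compatible = [
    v for v in candidates
    if satisfies_sam_audio(v) and satisfies_perception_models(v)
]
print("versions acceptable to both packages:", compatible)
```

No candidate passes both checks, which matches the resolver error: an exact pin at 2.1.2 can never fall below 2.0.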
Environment
- Python: 3.11
- sam-audio: 0.1.0 (from git)
- perception_models: 1.0.0 (from git)
Suggested Fix
Either:
- Update sam-audio to support numpy>=2.0
- Or update perception_models to support numpy<2.0
- Or vendor the required core.audio_visual_encoder classes directly into sam-audio
Related Issues
- #35 "Cannot find core module"
- #63 "TypeError: SAMAudioJudgeConfig"
This is declared as a dependency here. Running pip install . inside the root directory of sam-audio should install perception_models
Update: Found a working solution thanks to @ZFTurbo's suggestion in #35!
Using the unpin-deps branch of perception_models resolves the numpy version conflict:
perception-models@git+https://github.com/facebookresearch/perception_models@unpin-deps
This allows SAMAudioJudgeModel to load successfully:
{"status":"healthy","model_loaded":true,"fallback_mode":false}
The issue on the main branch remains (perception_models pins numpy==2.1.2 while sam-audio requires numpy<2.0), but the unpin-deps branch is a working workaround.
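After switching branches, it may be worth confirming what actually ended up installed. The following is a minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, and the distribution names are assumptions based on the names used in this issue, so they may differ from what `pip list` shows.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions taken from this thread.
for dist in ("numpy", "sam-audio", "perception-models"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```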
1. Create fresh environment
conda create -n sam-audio python=3.11 -y
conda activate sam-audio
2. Install PyTorch for CUDA 12.9 (RTX 5090) - pick the CUDA build that matches your GPU
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
3. Force numpy 1.26.4 (compatible with laion-clap)
pip install numpy==1.26.4 --force-reinstall
4. Install core dependencies
conda install -c conda-forge gradio matplotlib ipython ffmpeg -y
pip install huggingface-hub soundfile pydub einops torchdiffeq
5. Fix transformer version conflicts
pip install "huggingface-hub==0.35.1" --force-reinstall
pip install transformers regex safetensors tokenizers scipy
6. Install laion-clap (works with numpy 1.26.4)
pip install git+https://github.com/lematt1991/CLAP.git
7. Clone and patch perception-models
git clone https://github.com/facebookresearch/perception_models.git
cd perception_models
Create patched setup.py with UTF-8 encoding
python -c "
with open('setup.py', 'w', encoding='utf-8') as f:
    f.write('''# -*- coding: utf-8 -*-
from setuptools import setup, find_packages

setup(
    name='perception_models',
    version='1.0.0',
    packages=find_packages(),
    install_requires=[
        'torch>=1.10.0',
        'torchvision>=0.11.0',
        'transformers>=4.18.0',
        'numpy>=1.26.4',  # PATCHED: was ==2.1.2
        'einops>=0.4.0',
        'timm>=0.6.0',
    ],
)
''')
"
Install patched version
pip install -e .
cd ..
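Overwriting setup.py wholesale also discards any other metadata the file may carry. A narrower alternative is to rewrite only the numpy pin. The sketch below does this on a string; `relax_numpy_pin` is a hypothetical helper, and it assumes the pin appears literally as numpy==2.1.2 (as reported above) in whichever file declares it.

```python
import re


def relax_numpy_pin(text, new_spec=">=1.26.4"):
    """Replace an exact `numpy==X.Y.Z` pin with a looser specifier."""
    return re.sub(r"\bnumpy==[0-9][0-9A-Za-z.]*", f"numpy{new_spec}", text)


# Illustrative input; the real file contents may look different.
sample = "install_requires=['torch>=1.10.0', 'numpy==2.1.2', 'einops>=0.4.0']"
print(relax_numpy_pin(sample))
```

To apply it to a real file, read the text with `pathlib.Path(...).read_text(encoding="utf-8")`, pass it through the function, and write the result back with `write_text`. Lines that do not contain an exact numpy pin are left unchanged.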
8. Install remaining dependencies
pip install git+https://github.com/facebookresearch/dacvae.git
pip install git+https://github.com/facebookresearch/ImageBind.git
pip install audiobox_aesthetics torchcodec
9. Clone and install SAM-Audio
git clone https://github.com/facebookresearch/sam-audio.git
cd sam-audio
pip install -e .
cd ..
10. Test installation
python -c "
import torch
print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')
import numpy
print(f'numpy: {numpy.__version__}')
# perception_models provides the top-level package core, which is what sam-audio imports
import core.audio_visual_encoder
print('perception_models: OK')
import laion_clap
print('laion_clap: OK')
from sam_audio import SAMAudio, SAMAudioProcessor
print('SAM-Audio: SUCCESS!')
"
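The step 10 check stops at the first import that fails, which hides any later problems. A variant that tries every import and reports all results in one pass can be sketched as follows. `check_imports` is a hypothetical helper, and the module names are taken from this thread, so adjust them to your environment.

```python
import importlib


def check_imports(module_names):
    """Try each import; map name -> None on success, or an error message."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # broad on purpose: report, do not abort
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# Module names are assumptions based on the imports mentioned in this issue.
report = check_imports(["numpy", "core.audio_visual_encoder", "laion_clap", "sam_audio"])
for name, error in report.items():
    print(f"{name}: {'OK' if error is None else error}")
```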