SAMAudioJudgeModel fails to import - requires perception_models which has numpy conflict

Open bubroz opened this issue 2 months ago • 3 comments

Description

SAMAudioJudgeModel cannot be imported because it depends on core.audio_visual_encoder from facebookresearch/perception_models, which is not declared as a dependency.

When attempting to install both packages together, pip fails due to incompatible numpy requirements.

Error

from sam_audio.model import SAMAudioJudgeModel
# ImportError: No module named 'core.audio_visual_encoder'
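To confirm that the failure comes from the missing core package rather than from sam-audio itself, a quick check along these lines may help. It is only a minimal sketch: the helper name module_available is made up for illustration, and it relies solely on the standard library's importlib.util.find_spec.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (here, `core`) is itself missing.
        return False


if __name__ == "__main__":
    name = "core.audio_visual_encoder"
    print(f"{name}: {'found' if module_available(name) else 'missing'}")
```

If this prints "missing" in the environment where sam-audio is installed, perception_models was most likely never installed there.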

Root Cause

In sam_audio/model/judge.py:

from core.audio_visual_encoder.transformer import BaseModelOutputWithPooling
from core.audio_visual_encoder.transformer import Transformer as PEAVTransformer

This imports from perception_models, but:

  • sam-audio requires numpy<2.0
  • perception_models requires numpy==2.1.2

ERROR: Cannot install numpy<2.0 and perception-models==1.0.0 because
these package versions have conflicting dependencies.
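The conflict can be illustrated with a few lines of standard-library Python. This is a rough sketch, not how pip's resolver works: the helper names are hypothetical, the parser only handles plain dotted release numbers, and the candidate list is just a sample chosen for illustration.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a plain release string such as '2.1.2' into (2, 1, 2)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_sam_audio(version: str) -> bool:
    # sam-audio's constraint as reported above: numpy<2.0
    return parse(version) < parse("2.0")


def satisfies_perception_models(version: str) -> bool:
    # perception_models' pin as reported above: numpy==2.1.2
    return parse(version) == parse("2.1.2")


# Sample candidates for illustration only.
candidates = ["1.26.4", "2.0.0", "2.1.2"]
compatible = [
    v for v in candidates
    if satisfies_sam_audio(v) and satisfies_perception_models(v)
]
print(compatible)  # prints []
```

The only version the exact pin accepts (2.1.2) is rejected by the <2.0 bound, so no numpy release can satisfy both packages at once.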

Environment

  • Python: 3.11
  • sam-audio: 0.1.0 (from git)
  • perception_models: 1.0.0 (from git)

Suggested Fix

Either:

  1. Update sam-audio to support numpy>=2.0
  2. Or update perception_models to support numpy<2.0
  3. Or vendor the required core.audio_visual_encoder classes directly into sam-audio

Related Issues

  • #35 "Cannot find core module"
  • #63 "TypeError: SAMAudioJudgeConfig"

bubroz avatar Jan 02 '26 20:01 bubroz

This is declared as a dependency here. Running pip install . inside the root directory of sam-audio should install perception_models.

lematt1991 avatar Jan 02 '26 20:01 lematt1991

Update: Found a working solution thanks to @ZFTurbo's suggestion in #35!

Using the unpin-deps branch of perception_models resolves the numpy version conflict:

perception-models@git+https://github.com/facebookresearch/perception_models@unpin-deps

This allows SAMAudioJudgeModel to load successfully:

{"status":"healthy","model_loaded":true,"fallback_mode":false}

The issue on the main branch remains (perception_models pins numpy==2.1.2 while sam-audio requires numpy<2.0), but the unpin-deps branch provides a workaround.

bubroz avatar Jan 02 '26 21:01 bubroz

1. Create fresh environment

conda create -n sam-audio python=3.11 -y
conda activate sam-audio

2. Install PyTorch for CUDA 12.9 (RTX 5090); pick the CUDA build that matches your GPU

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129

3. Force numpy 1.26.4 (compatible with laion-clap)

pip install numpy==1.26.4 --force-reinstall

4. Install core dependencies

conda install -c conda-forge gradio matplotlib ipython ffmpeg -y
pip install huggingface-hub soundfile pydub einops torchdiffeq

5. Fix transformers version conflicts

pip install "huggingface-hub==0.35.1" --force-reinstall
pip install transformers regex safetensors tokenizers scipy

6. Install laion-clap (works with numpy 1.26.4)

pip install git+https://github.com/lematt1991/CLAP.git

7. Clone and patch perception-models

git clone https://github.com/facebookresearch/perception_models.git
cd perception_models

Create patched setup.py with UTF-8 encoding

python -c "
with open('setup.py', 'w', encoding='utf-8') as f:
    f.write('''# -*- coding: utf-8 -*-
from setuptools import setup, find_packages

setup(
    name='perception_models',
    version='1.0.0',
    packages=find_packages(),
    install_requires=[
        'torch>=1.10.0',
        'torchvision>=0.11.0',
        'transformers>=4.18.0',
        'numpy>=1.26.4',  # PATCHED: was ==2.1.2
        'einops>=0.4.0',
        'timm>=0.6.0',
    ],
)''')"

Install patched version

pip install -e .
cd ..

8. Install remaining dependencies

pip install git+https://github.com/facebookresearch/dacvae.git
pip install git+https://github.com/facebookresearch/ImageBind.git
pip install audiobox_aesthetics torchcodec

9. Clone and install SAM-Audio

git clone https://github.com/facebookresearch/sam-audio.git
cd sam-audio
pip install -e .
cd ..

10. Test installation

python -c "
import torch
print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')
import numpy
print(f'numpy: {numpy.__version__}')
import perception_models
print('perception_models: OK')
import laion_clap
print('laion_clap: OK')
from sam_audio import SAMAudio, SAMAudioProcessor
print('SAM-Audio: SUCCESS!')
"
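As a complement to the import test, the installed versions can be listed without importing the heavy packages at all. The following is only a sketch using the standard library's importlib.metadata; the function names are hypothetical, and the distribution names passed in are assumptions based on the package names used in this thread, so adjust them if pip lists them differently.

```python
from importlib import metadata
from typing import Dict, List, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Distribution names below are assumptions taken from this thread.
    names = ["numpy", "torch", "perception_models", "sam_audio"]
    for name, version in report(names).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A numpy line showing 1.26.4 alongside a perception_models version would suggest the patched install took effect.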

jjmlovesgit avatar Jan 06 '26 13:01 jjmlovesgit