
Add Apple Silicon (M2) support with MPS optimizations

Open · jmanhype opened this issue 7 months ago · 0 comments

๐ŸŽ Apple Silicon Support Implementation Details

Thank you for reviewing this PR! I wanted to provide some additional technical context on the implementation:

๐Ÿ” Technical Implementation

The core changes focus on three key areas:

  1. Device Detection & Initialization

    import platform, torch

    if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available() and platform.processor() == 'arm':
        device = "mps"
        torch_dtype = torch.float32  # Force full precision on Apple Silicon
    else:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    
  2. Memory Optimization

    • Removed CUDA-specific memory management code
    • Added appropriate tensor handling for the MPS backend
    • Implemented fallbacks for FLUX models with high memory requirements
  3. Dependency Management

    • Removed CUDA-specific dependencies such as cupy-cuda12x
    • Added platform-agnostic dependencies for better compatibility
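The device detection from step 1 can be factored into a small helper so the fallback order (MPS → CUDA → CPU) is explicit and testable. `select_device` is an illustrative name, not part of DiffSynth-Studio; it takes platform facts as arguments so the decision logic runs anywhere:

```python
# Sketch of the PR's device-selection order; select_device is a hypothetical
# helper, and the dtype strings mirror the choices described above.

def select_device(mps_available: bool, processor: str, cuda_available: bool):
    """Return (device, dtype) strings following the PR's priority order."""
    if mps_available and processor == "arm":
        # Apple Silicon: force full precision, since fp16 coverage on MPS is partial
        return "mps", "float32"
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"
```

In practice it would be called as `select_device(torch.backends.mps.is_available(), platform.processor(), torch.cuda.is_available())`.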

💡 Tips for M2 Users

For optimal performance on Apple Silicon:

  • Start with smaller models (SD 1.5) before trying larger ones like FLUX
  • When using FLUX models, enable VRAM management:
    pipe.enable_vram_management(num_persistent_param_in_dit=7*10**9)
    
  • Consider smaller output resolutions (512x512) for complex generations
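To reason about whether a persistent-parameter budget fits in unified memory, a back-of-the-envelope helper is enough. `persistent_param_bytes` is illustrative, not part of the library; it just multiplies parameter count by bytes per parameter:

```python
def persistent_param_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Approximate resident footprint of keeping num_params in memory (fp32 default)."""
    return num_params * bytes_per_param

# 7*10**9 persistent fp32 parameters is roughly 28 GB; at fp16 (2 bytes per
# parameter) the same count is roughly 14 GB, closer to a 16 GB machine's budget.
budget = persistent_param_bytes(7 * 10**9)
```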

🧪 Testing Methodology

Testing was conducted on a MacBook Pro (M2 Pro, 16 GB RAM) with the following results:

| Model  | Resolution | Memory Usage | Generation Time |
|--------|------------|--------------|-----------------|
| SD 1.5 | 512x512    | ~8 GB        | ~5 sec          |
| SD-XL  | 768x768    | ~12 GB       | ~15 sec         |
| FLUX   | 512x512    | ~14 GB       | ~20 sec         |
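Generation times like those above can be collected with a minimal harness along these lines; `time_call` is an illustrative helper, and the workload it wraps would be whatever pipeline call you are benchmarking:

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; in practice this would be e.g. lambda: pipe(prompt=...)
_, elapsed = time_call(sum, range(1_000_000))
```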

🔮 Future Improvements

  • Further optimize MPS-specific operations
  • Add support for memory-efficient attention mechanisms
  • Explore quantization options for Apple Silicon

I'm happy to address any questions or make additional adjustments as needed!

jmanhype · Mar 10 '25 19:03