DiffSynth-Studio
DiffSynth-Studio copied to clipboard
Add Apple Silicon (M2) support with MPS optimizations
๐ Apple Silicon Support Implementation Details
Thank you for reviewing this PR! I wanted to provide some additional technical context on the implementation:
๐ Technical Implementation
The core changes focus on three key areas:
-
Device Detection & Initialization
if hasattr(torch, 'mps') and torch.backends.mps.is_available() and platform.processor() == 'arm': device = "mps" torch_dtype = torch.float32 # Force full precision on Apple Silicon -
Memory Optimization
- Removed memory management code that was CUDA-specific
- Added appropriate tensor handling for MPS backend
- Implemented fallbacks for FLUX models with high memory requirements
-
Dependency Management
- Removed CUDA-specific dependencies like
cupy-cuda12x - Added platform-agnostic dependencies for better compatibility
- Removed CUDA-specific dependencies like
๐ก Tips for M2 Users
For optimal performance on Apple Silicon:
- Start with smaller models (SD 1.5) before trying larger ones like FLUX
- When using FLUX models, enable VRAM management:
pipe.enable_vram_management(num_persistent_param_in_dit=7*10**9) - Consider smaller output resolutions (512x512) for complex generations
๐งช Testing Methodology
Testing was conducted on a MacBook Pro with M2 Pro chip (16GB RAM) with the following results:
| Model | Resolution | Memory Usage | Generation Time |
|---|---|---|---|
| SD 1.5 | 512x512 | ~8GB | ~5 sec |
| SD-XL | 768x768 | ~12GB | ~15 sec |
| FLUX | 512x512 | ~14GB | ~20 sec |
๐ฎ Future Improvements
- Further optimize MPS-specific operations
- Add support for memory-efficient attention mechanisms
- Explore quantization options for Apple Silicon
I'm happy to address any questions or make additional adjustments as needed!