[Question/Help] Loading LTX-Video GGUF models (Main & T5) on Mac M4 Max with MPS and n_gpu_layers control in ComfyUI
Hello LTX-Video Team / Community,
I'm trying to run LTX-Video GGUF models on my Mac M4 Max (36GB unified memory) within ComfyUI, specifically:
- Main video model: `ltxv-13b-0.9.7-dev-Q8_0.gguf`
- Text encoder: `t5-v1_1-xxl-encoder-Q8_0.gguf`
I have successfully installed llama-cpp-python with Metal support compiled in my ComfyUI environment (PyTorch 2.4.0, Python 3.11). My goal is to achieve optimal performance by offloading as many layers as possible to the MPS GPU.
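For reference, this is roughly how a Metal-enabled build of llama-cpp-python is compiled; the `GGML_METAL` flag name is an assumption based on recent llama.cpp builds (older versions used `LLAMA_METAL`):

```shell
# Reinstall llama-cpp-python with the Metal backend compiled in.
# --no-cache-dir forces a fresh source build instead of a cached CPU-only wheel.
CMAKE_ARGS="-DGGML_METAL=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
```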
However, I'm struggling to find the correct ComfyUI custom nodes (GGUF loaders) that satisfy all these conditions:
- Utilize the `llama-cpp-python` backend.
- Provide an explicit `n_gpu_layers` (or `gpu_layers`) parameter to control GPU offloading.
- Output a `MODEL` pipe for the LTX-Video GGUF main model.
- Output a `CLIP` pipe for the T5 GGUF text encoder.
The generic GGUF loaders I've tried so far either lack an `n_gpu_layers` option or, when they expose it (like `Load LLM Model Advanced` from daniel-lewis-ab/ComfyUI-Llama), output a generic `LLM` type that isn't directly compatible with the `MODEL` and `CLIP` inputs expected by the LTX-Video samplers and ComfyUI's `CLIPTextEncode` node. My current performance is very slow (around 8 minutes per step), which suggests the GGUF processing is CPU-bound.
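For context, this is a minimal sketch of the raw llama-cpp-python call that such a loader node would presumably wrap. The parameter values are illustrative assumptions, not a working LTX-Video setup; the file path assumes the T5 encoder named above sits in the working directory:

```python
# Hypothetical sketch of the llama-cpp-python call a GGUF loader node would wrap.
# All values here are illustrative; n_gpu_layers=-1 offloads every layer to the GPU.
llama_kwargs = {
    "model_path": "t5-v1_1-xxl-encoder-Q8_0.gguf",
    "n_gpu_layers": -1,   # -1 = offload all layers (Metal, when compiled in)
    "n_ctx": 512,         # T5 encoder prompts are short
    "embedding": True,    # only encoder embeddings are needed, not generation
}

try:
    from llama_cpp import Llama  # requires llama-cpp-python with Metal support
    encoder = Llama(**llama_kwargs)
except Exception:  # llama_cpp missing or model file not present
    encoder = None
```

Even when this loads, the output is a raw `Llama` object, not the `CLIP` pipe ComfyUI nodes expect, which is exactly the adapter gap described above.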
Could you please recommend the specific ComfyUI custom node pack(s) and GGUF loader nodes (and their settings, especially for n_gpu_layers or equivalent GPU offload control) that are intended or known to work well for loading these LTX-Video GGUF files on a Mac with MPS support?
Are there LTX-Video specific GGUF loaders within the ComfyUI-LTXVideo custom node pack itself that I should be using for this purpose? If so, how can I control the GPU layer offload?
Any example workflows or guidance would be greatly appreciated.
My ComfyUI startup log shows MPS is active, and llama-cpp-python is installed. The main bottleneck seems to be the ComfyUI node interfacing with llama-cpp-python for these specific GGUF model types and exposing the offload controls.
Thank you for your help!
System Details:
- OS: macOS (Darwin 24.4.0, per the startup log)
- Chip: Apple M4 Max
- Memory: 36GB Unified Memory
- ComfyUI: 0.3.33 (from the startup log)
- Python: 3.11.11 (Miniconda)
- PyTorch: 2.4.0 (with MPS)
- llama-cpp-python: 0.3.9 (compiled with Metal support)
Hi! For non-official implementations, I would suggest joining the Discord server and asking there: https://discord.gg/Mn8BRgUKKy