ComfyUI-LTXVideo

MAC - MPS Support

Open: Valadaress opened this issue 1 week ago • 4 comments

The README specifies CUDA as a prerequisite. Are there any plans to add MPS support for Apple Silicon?

When I download the node in ComfyUI on macOS, it is still incompatible.


Device: Mac Studio M3 Ultra, 256 GB unified memory

Valadaress commented Jan 06 '26 18:01

Now some nodes are working, but others still have problems.


Valadaress commented Jan 08 '26 15:01

It looks like https://github.com/Lightricks/ComfyUI-LTXVideo/issues/338 is hitting an issue with the Gemma loader too.

zboyles commented Jan 08 '26 16:01

I don't have time to test this, but you can try updating gemma_encoder.py starting at line 433 with:

        # fork_rng requires explicit device list for CUDA; MPS uses CPU RNG state
        device_type = self.model.device.type
        devices = [self.model.device] if device_type == "cuda" else []
        with torch.inference_mode(), torch.random.fork_rng(devices=devices):
            torch.manual_seed(seed)
            if device_type == "mps":
                torch.mps.manual_seed(seed)
            outputs = self.model.generate(
                **model_inputs,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=0.7,
            )
            generated_ids = outputs[0][len(model_inputs.input_ids[0]) :]
            enhanced_prompt = self.processor.tokenizer.decode(
                generated_ids, skip_special_tokens=True
            )

        return enhanced_prompt
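
If you want to verify MPS seeding outside ComfyUI first, here is a tiny standalone sketch (my own, not from the repo; it only uses standard PyTorch calls). It should print the same tensor on every run, and if it does, the seeded generate() call above should be deterministic on MPS as well:

import torch

# Standalone sanity check: seed the CPU and MPS generators, then draw from MPS.
torch.manual_seed(42)
if torch.backends.mps.is_available():
    torch.mps.manual_seed(42)
    print(torch.randn(3, device="mps"))
else:
    print("MPS backend not available on this machine")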

zboyles commented Jan 08 '26 16:01

And if the bfloat16 dtype is giving an error, you can search for clip_dtype = torch.bfloat16 around line 598 and change it to clip_dtype = torch.float16 (a backend-gated variant is sketched after the function below). Then update the ltxv_gemma_clip function at line 495 with:


def ltxv_gemma_clip(encoder_path, ltxv_path, processor=None, dtype=None):
    class _LTXVGemmaTextEncoderModel(LTXVGemmaTextEncoderModel):
        def __init__(self, device="cpu", dtype=dtype, model_options={}):
            # Only fall back to bfloat16 when no dtype override was passed in.
            if dtype is None:
                dtype = torch.bfloat16

            gemma_model = Gemma3ForConditionalGeneration.from_pretrained(
                encoder_path,
                local_files_only=True,
                torch_dtype=dtype,
            )

            feature_extractor_linear = load_proj_matrix_from_ltxv(
                ltxv_path,
                "text_embedding_projection.",
            )
            if feature_extractor_linear is None:
                feature_extractor_linear = load_proj_matrix_from_checkpoint(
                    encoder_path / "proj_linear.safetensors"
                )

            embeddings_connector = load_video_embeddings_connector(
                ltxv_path, dtype=dtype
            )
            audio_embeddings_connector = load_audio_embeddings_connector(
                ltxv_path, dtype=dtype
            )
            super().__init__(
                model=gemma_model,
                feature_extractor_linear=feature_extractor_linear,
                embeddings_connector=embeddings_connector,
                audio_embeddings_connector=audio_embeddings_connector,
                processor=processor,
                dtype=dtype,
                device=device,
            )

    return _LTXVGemmaTextEncoderModel
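
For the clip_dtype change mentioned above, a backend-gated variant is another option. This is just an untested sketch (clip_dtype and the line number come from my comment above; torch.backends.mps.is_available() is standard PyTorch):

import torch

# Untested sketch for the clip_dtype assignment around line 598 of gemma_encoder.py:
# some PyTorch builds lack bfloat16 support on MPS, so prefer float16 there.
clip_dtype = (
    torch.float16
    if torch.backends.mps.is_available()
    else torch.bfloat16
)

Alternatively, since ltxv_gemma_clip already takes a dtype argument, passing dtype=torch.float16 from the caller skips the inner bfloat16 fallback entirely.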

Good luck!

zboyles commented Jan 08 '26 17:01

Looking forward to a fix, thanks!

josephlugo commented Jan 10 '26 20:01