
Out of memory using 32 GB of VRAM on a 5090

MrEdwards007 opened this issue 7 months ago • 6 comments

Good day,

I have been attempting to generate video on my RTX 5090 with 32 GB of VRAM, directly using Python. In every instance I get the same result: out of memory.

I have 0.923 GiB in use out of 31.843 GiB, so let's round and say 1 GB is in use out of 32 GB of VRAM.

Environment:
  • Linux Ubuntu 24.04
  • RTX 5090 with 32 GB of VRAM
  • Python 3.10.16

I borrowed the prompt below from another poster just to test this out:

python inference.py --prompt "A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene." --height 480 --width 704 --num_frames 161 --seed 30 --output_path="lala.mp4" --pipeline_config ltxv-13b-0.9.7-dev.yaml

Running generation with arguments: Namespace(output_path='lala.mp4', seed=30, num_images_per_prompt=1, image_cond_noise_scale=0.15, height=480, width=704, num_frames=161, frame_rate=30, device=None, pipeline_config='configs/ltxv-13b-0.9.7-dev.yaml', prompt="A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene.", negative_prompt='worst quality, inconsistent motion, blurry, jittery, distorted', offload_to_cpu=False, input_media_path=None, strength=1.0, conditioning_media_paths=None, conditioning_strengths=None, conditioning_start_frames=None)
Padded dimensions: 480x704x161
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00, 155.50it/s]
Traceback (most recent call last):
  File "/home/wedwards/Documents/Programs/LTX-Video/inference.py", line 776, in <module>
    main()
  File "/home/wedwards/Documents/Programs/LTX-Video/inference.py", line 298, in main
    infer(**vars(args))
  File "/home/wedwards/Documents/Programs/LTX-Video/inference.py", line 535, in infer
    pipeline = create_ltx_video_pipeline(
  File "/home/wedwards/Documents/Programs/LTX-Video/inference.py", line 343, in create_ltx_video_pipeline
    text_encoder = text_encoder.to(device)
  File "/home/wedwards/anaconda3/envs/ltxstudio/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3698, in to
    return super().to(*args, **kwargs)
  File "/home/wedwards/anaconda3/envs/ltxstudio/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
  File "/home/wedwards/anaconda3/envs/ltxstudio/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/home/wedwards/anaconda3/envs/ltxstudio/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/home/wedwards/anaconda3/envs/ltxstudio/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/home/wedwards/anaconda3/envs/ltxstudio/lib/python3.10/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
  File "/home/wedwards/anaconda3/envs/ltxstudio/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1341, in convert
    return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 31.36 GiB of which 65.25 MiB is free. Including non-PyTorch memory, this process has 30.78 GiB memory in use. Of the allocated memory 30.24 GiB is allocated by PyTorch, and 56.34 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
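
For reference, a minimal retry sketch. It rests on two assumptions that are only hinted at above: that the PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True hint from the error message applies to this setup, and that the offload_to_cpu=False entry in the Namespace output corresponds to an --offload_to_cpu flag in inference.py. Untested:

import os
import shlex
import subprocess

# Assumption: allocator hint taken from the OOM message; may reduce fragmentation.
env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")

# Assumption: --offload_to_cpu exists as a CLI flag (the Namespace above shows offload_to_cpu=False).
cmd = (
    'python inference.py --prompt "..." '  # same prompt as above, elided here
    "--height 480 --width 704 --num_frames 161 --seed 30 "
    "--output_path lala.mp4 --pipeline_config configs/ltxv-13b-0.9.7-dev.yaml "
    "--offload_to_cpu"
)
subprocess.run(shlex.split(cmd), env=env, check=True)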

Below is my other attempt, with the same result: out of memory. I kept lowering the requirements (resolution, number of seconds, frames per second) until I finally concluded this just wasn't working.

import subprocess
import shlex
from datetime import datetime
import os

import torch
torch.cuda.empty_cache()
torch.cuda.ipc_collect()  # optional, frees unused shared memory

if torch.cuda.is_available():
    gpu_idx = 0  # or whichever GPU index you want
    stats = torch.cuda.memory_stats(gpu_idx)
    reserved = torch.cuda.memory_reserved(gpu_idx)
    allocated = torch.cuda.memory_allocated(gpu_idx)
    free = reserved - allocated

    print(f"Total Reserved Memory: {reserved / 1e9:.2f} GB")
    print(f"Currently Allocated:   {allocated / 1e9:.2f} GB")
    print(f"Available in Cache:    {free / 1e9:.2f} GB")
else:
    print("CUDA not available.")

# import pynvml

# pynvml.nvmlInit()
# handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

# info = pynvml.nvmlDeviceGetMemoryInfo(handle)
# print(f"Total GPU memory:     {info.total / 1e9:.2f} GB")
# print(f"Used GPU memory:      {info.used / 1e9:.2f} GB")
# print(f"Free GPU memory:      {info.free / 1e9:.2f} GB")


def run_inference_from_file(
    prompt_file: str,
    image_path: str,
    height: int,
    width: int,
    seconds: int,
    fps: int,
    seed: int,
    pipeline_config: str = "configs/ltxv-13b-0.9.7-dev.yaml",
    start_frame: int = 0,
    output_dir: str = "/OutputDirectory"  # a directory: the filename is generated below
):
    """
    Runs inference from prompt file and image, saving video to a timestamped, seed-labeled output file.

    Parameters:
    - prompt_file (str): Path to the text file containing the prompt.
    - image_path (str): Path to the image file used for conditioning.
    - height (int): Height of output video frames.
    - width (int): Width of output video frames.
    - seconds (int): Duration in seconds of the video to be generated.
    - fps (int): Frames per second.
    - seed (int): Random seed.
    - pipeline_config (str): Path to the pipeline configuration file.
    - start_frame (int): Frame index to start conditioning from.
    - output_dir (str): Directory to save the output video file.
    """

    # Read prompt
    try:
        with open(prompt_file, 'r') as f:
            prompt = f.read().strip()
    except FileNotFoundError:
        print(f"Prompt file not found: {prompt_file}")
        return
    except Exception as e:
        print(f"Error reading prompt file: {e}")
        return

    # Calculate number of frames
    num_frames = seconds * fps
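    # Note (observation, not verified): the inference log reports "Padded dimensions",
    # so the script may adjust these values; the frame counts in the examples above are
    # of the form 8k + 1 (e.g. 25, 161), which seconds * fps alone may not satisfy.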

    # Create timestamped and seed-specific output filename with seconds
    timestamp = datetime.now().strftime("%d%b%Y_%H%M%S")
    output_filename = f"video_{timestamp}_seed_{seed}.mp4"
    output_path = os.path.join(output_dir, output_filename)

    # Construct command
    command = (
        f"python inference.py "
        f"--prompt {shlex.quote(prompt)} "
        f"--conditioning_media_paths {shlex.quote(image_path)} "
        f"--conditioning_start_frames {start_frame} "
        f"--height {height} "
        f"--width {width} "
        f"--num_frames {num_frames} "
        f"--seed {seed} "
        f"--pipeline_config {shlex.quote(pipeline_config)} "
        f"--output_path {shlex.quote(output_path)}"
    )

    print(f"Running command:\n{command}\n")
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)

    if result.returncode == 0:
        print(f"Inference completed successfully.\nOutput saved to: {output_path}")
        print(result.stdout)
    else:
        print("Error occurred during inference:")
        print(result.stderr)


# Example usage
if __name__ == "__main__":
    run_inference_from_file(
        prompt_file="/pathToPromptFile",
        image_path="/pathToImage.jpeg",
        height=512,
        width=512,
        seconds=3,
        fps=16,
        seed=42
    )

MrEdwards007 · May 11 '25 06:05

I'm experiencing similar issues on a Mac Mini M4 Pro with just 24 GB of RAM.

RuntimeError: MPS backend out of memory (MPS allocated: 27.10 GB, other allocations: 464.00 KB, max allowed: 27.20 GB). Tried to allocate 108.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

Here's the whole command and output. I'm using Python 3.10.17 in this venv:

(.venv) fran@M4Pro LTX-Video % python inference.py --prompt "Don Quixote with windmills" --height 480  --width 720 --num_frames 25 --seed 666 --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
Running generation with arguments: Namespace(output_path=None, seed=666, num_images_per_prompt=1, image_cond_noise_scale=0.15, height=480, width=720, num_frames=25, frame_rate=30, device=None, pipeline_config='configs/ltxv-13b-0.9.7-distilled.yaml', prompt='Don Quixote with windmills', negative_prompt='worst quality, inconsistent motion, blurry, jittery, distorted', offload_to_cpu=False, input_media_path=None, conditioning_media_paths=None, conditioning_strengths=None, conditioning_start_frames=None)
ltxv-13b-0.9.7-distilled.safetensors: 100%|████████████████████████████████████████████████| 28.6G/28.6G [09:01<00:00, 52.8MB/s]
ltxv-spatial-upscaler-0.9.7.safetensors: 100%|███████████████████████████████████████████████| 505M/505M [00:08<00:00, 59.4MB/s]
Padded dimensions: 480x736x25
config.json: 100%|█████████████████████████████████████████████████████████████████████████████| 788/788 [00:00<00:00, 2.21MB/s]
model.safetensors.index.json: 100%|████████████████████████████████████████████████████████| 19.9k/19.9k [00:00<00:00, 9.31MB/s]
model-00002-of-00002.safetensors: 100%|████████████████████████████████████████████████████| 9.06G/9.06G [05:15<00:00, 28.7MB/s]
model-00001-of-00002.safetensors: 100%|████████████████████████████████████████████████████| 9.99G/9.99G [05:39<00:00, 29.5MB/s]
Fetching 2 files: 100%|██████████████████████████████████████████████████████████████████████████| 2/2 [05:39<00:00, 169.80s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 42.58it/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████| 20.5k/20.5k [00:00<00:00, 10.2MB/s]
spiece.model: 100%|██████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 8.78MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████████████████| 2.63k/2.63k [00:00<00:00, 10.2MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 10.8MB/s]
Traceback (most recent call last):
  File "/Volumes/SSD2TB/tmp/LTX-Video/inference.py", line 774, in <module>
    main()
  File "/Volumes/SSD2TB/tmp/LTX-Video/inference.py", line 298, in main
    infer(**vars(args))
  File "/Volumes/SSD2TB/tmp/LTX-Video/inference.py", line 534, in infer
    pipeline = create_ltx_video_pipeline(
  File "/Volumes/SSD2TB/tmp/LTX-Video/inference.py", line 342, in create_ltx_video_pipeline
    vae = vae.to(device)
  File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 1353, in to
    return super().to(*args, **kwargs)
  File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
  File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
  File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1341, in convert
    return t.to(
RuntimeError: MPS backend out of memory (MPS allocated: 27.10 GB, other allocations: 464.00 KB, max allowed: 27.20 GB). Tried to allocate 108.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

I eventually managed to bypass that by following that suggestion and running this command:

(.venv) fran@M4Pro LTX-Video % export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 && python inference.py --prompt "Don Quixote with windmills" --height 480 --width 720 --num_frames 25 --seed 666 --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml

But then I got gibberish in the output instead of a video.

[Image: screenshot of the garbled output]

fmmarzoa · May 15 '25 18:05

Try the WanGP implementation of LTX Video 13B; it has been highly optimized and its VRAM requirements have been reduced by a factor of 4. It runs with as little as 8 GB of VRAM. https://github.com/deepbeepmeep/Wan2GP

deepbeepmeep · May 17 '25 09:05

I tried the WanGP implementation and was successful. It has a good interface, making it easy to use, and the output was very nice. One thing I understood from watching several commentaries is that video generation was supposed to be fast. That may be relative, but on my 5090, waiting 42 minutes for a 5-second 720p video felt pretty long. I dropped the requirements down to 480p and the wait was 19 minutes. The quality was good, but I had higher expectations for the generation time.

I'll install ComfyUI to see how the main branch of LTX Studio performs with that workflow. If there are settings that would let me run this with Python alone, I would love to know them, as I thought 30-31 GB of available VRAM would be sufficient.

MrEdwards007 · May 19 '25 13:05

There is certainly something wrong with your setup: with my RTX 5090 it takes less than 5 minutes to generate a 5-second 720p (153-frame) video using memory profile 4 (which is not the fastest).

If everything is working properly, an RTX 5090 should draw at least 500 W during the whole generation. You can use a tool like GPU-Z to verify that.

Please check the following :

  • Sage 2 attention is installed and selected (instructions on how to install Sage are on the homepage).
  • The sliding window size (in the advanced features) is greater than 153 frames; otherwise multiple generations will be needed.
  • Boost mode is turned on in the Configuration tab.
  • The default int8 quantization is used.

You can get some extra speed by switching, in the Configuration tab, from the default memory profile 4 to profile 3: the whole model will then be preloaded into VRAM.

deepbeepmeep · May 19 '25 15:05

You were right. I went back and reinstalled everything, just to make sure. A 480p video of 5 seconds now takes 7.5-8 minutes, which is a vast improvement. I haven't figured out the memory profiles yet, so I'm using the defaults, but again, it is SO much faster.

MrEdwards007 · May 24 '25 14:05

How can they achieve realtime? Is it really just more VRAM? https://arxiv.org/pdf/2508.05115

johndpope · Oct 27 '25 02:10