Out of memory using 32 GB of VRAM on an RTX 5090
Good day,
I have been attempting to generate video on my RTX 5090 with 32 GB of VRAM, directly from Python. Every attempt ends the same way: out of memory.
I have 0.923 GiB in use out of 31.843 GiB, so let's round that and say 1 GB is in use out of 32 GB of VRAM.
Environment: Ubuntu 24.04, RTX 5090 with 32 GB of VRAM, Python 3.10.16.
I borrowed the prompt from another poster below just to test this out:
python inference.py --prompt "A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene." --height 480 --width 704 --num_frames 161 --seed 30 --output_path="lala.mp4" --pipeline_config ltxv-13b-0.9.7-dev.yaml
Running generation with arguments: Namespace(output_path='lala.mp4', seed=30, num_images_per_prompt=1, image_cond_noise_scale=0.15, height=480, width=704, num_frames=161, frame_rate=30, device=None, pipeline_config='configs/ltxv-13b-0.9.7-dev.yaml', prompt="A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene.", negative_prompt='worst quality, inconsistent motion, blurry, jittery, distorted', offload_to_cpu=False, input_media_path=None, strength=1.0, conditioning_media_paths=None, conditioning_strengths=None, conditioning_start_frames=None)
Padded dimensions: 480x704x161
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 155.50it/s]
Traceback (most recent call last):
File "/home/wedwards/Documents/Programs/LTX-Video/inference.py", line 776, in
Below is the other attempt, with the same result: out of memory. I kept lowering my requirements (resolution, number of seconds, frames per second) until I finally concluded this just wasn't working.
import subprocess
import shlex
from datetime import datetime
import os
import torch
torch.cuda.empty_cache()
torch.cuda.ipc_collect() # optional, frees unused shared memory
if torch.cuda.is_available():
    gpu_idx = 0  # or whichever GPU index you want
    stats = torch.cuda.memory_stats(gpu_idx)
    reserved = torch.cuda.memory_reserved(gpu_idx)
    allocated = torch.cuda.memory_allocated(gpu_idx)
    free = reserved - allocated
    print(f"Total Reserved Memory: {reserved / 1e9:.2f} GB")
    print(f"Currently Allocated: {allocated / 1e9:.2f} GB")
    print(f"Available in Cache: {free / 1e9:.2f} GB")
else:
    print("CUDA not available.")

# import pynvml
# pynvml.nvmlInit()
# handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0
# info = pynvml.nvmlDeviceGetMemoryInfo(handle)
# print(f"Total GPU memory: {info.total / 1e9:.2f} GB")
# print(f"Used GPU memory: {info.used / 1e9:.2f} GB")
# print(f"Free GPU memory: {info.free / 1e9:.2f} GB")


def run_inference_from_file(
    prompt_file: str,
    image_path: str,
    height: int,
    width: int,
    seconds: int,
    fps: int,
    seed: int,
    pipeline_config: str = "configs/ltxv-13b-0.9.7-dev.yaml",
    start_frame: int = 0,
    output_dir: str = "/OutputDirectory",
):
    """
    Runs inference from a prompt file and an image, saving the video to a
    timestamped, seed-labeled output file.

    Parameters:
    - prompt_file (str): Path to the text file containing the prompt.
    - image_path (str): Path to the image file used for conditioning.
    - height (int): Height of output video frames.
    - width (int): Width of output video frames.
    - seconds (int): Duration in seconds of the video to be generated.
    - fps (int): Frames per second.
    - seed (int): Random seed.
    - pipeline_config (str): Path to the pipeline configuration file.
    - start_frame (int): Frame index to start conditioning from.
    - output_dir (str): Directory to save the output video file.
    """
    # Read prompt
    try:
        with open(prompt_file, 'r') as f:
            prompt = f.read().strip()
    except FileNotFoundError:
        print(f"Prompt file not found: {prompt_file}")
        return
    except Exception as e:
        print(f"Error reading prompt file: {e}")
        return

    # Calculate number of frames
    num_frames = seconds * fps

    # Create timestamped and seed-specific output filename
    timestamp = datetime.now().strftime("%d%b%Y_%H%M%S")
    output_filename = f"video_{timestamp}_seed_{seed}.mp4"
    output_path = os.path.join(output_dir, output_filename)

    # Construct command
    command = (
        f"python inference.py "
        f"--prompt {shlex.quote(prompt)} "
        f"--conditioning_media_paths {shlex.quote(image_path)} "
        f"--conditioning_start_frames {start_frame} "
        f"--height {height} "
        f"--width {width} "
        f"--num_frames {num_frames} "
        f"--seed {seed} "
        f"--pipeline_config {shlex.quote(pipeline_config)} "
        f"--output_path {shlex.quote(output_path)}"
    )

    print(f"Running command:\n{command}\n")
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)

    if result.returncode == 0:
        print(f"Inference completed successfully.\nOutput saved to: {output_path}")
        print(result.stdout)
    else:
        print("Error occurred during inference:")
        print(result.stderr)


# Example usage
if __name__ == "__main__":
    run_inference_from_file(
        prompt_file="/pathToPromptFile",
        image_path="/pathToImage.jpeg",
        height=512,
        width=512,
        seconds=3,
        fps=16,
        seed=42,
    )
I'm experiencing similar issues on a Mac Mini M4 Pro with just 24 GB of RAM.
RuntimeError: MPS backend out of memory (MPS allocated: 27.10 GB, other allocations: 464.00 KB, max allowed: 27.20 GB). Tried to allocate 108.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
Here's the whole command and output. I'm using python 3.10.17 in this venv:
(.venv) fran@M4Pro LTX-Video % python inference.py --prompt "Don Quixote with windmills" --height 480 --width 720 --num_frames 25 --seed 666 --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
Running generation with arguments: Namespace(output_path=None, seed=666, num_images_per_prompt=1, image_cond_noise_scale=0.15, height=480, width=720, num_frames=25, frame_rate=30, device=None, pipeline_config='configs/ltxv-13b-0.9.7-distilled.yaml', prompt='Don Quixote with windmills', negative_prompt='worst quality, inconsistent motion, blurry, jittery, distorted', offload_to_cpu=False, input_media_path=None, conditioning_media_paths=None, conditioning_strengths=None, conditioning_start_frames=None)
ltxv-13b-0.9.7-distilled.safetensors: 100%|████████████████████████████████████████████████| 28.6G/28.6G [09:01<00:00, 52.8MB/s]
ltxv-spatial-upscaler-0.9.7.safetensors: 100%|███████████████████████████████████████████████| 505M/505M [00:08<00:00, 59.4MB/s]
Padded dimensions: 480x736x25
config.json: 100%|█████████████████████████████████████████████████████████████████████████████| 788/788 [00:00<00:00, 2.21MB/s]
model.safetensors.index.json: 100%|████████████████████████████████████████████████████████| 19.9k/19.9k [00:00<00:00, 9.31MB/s]
model-00002-of-00002.safetensors: 100%|████████████████████████████████████████████████████| 9.06G/9.06G [05:15<00:00, 28.7MB/s]
model-00001-of-00002.safetensors: 100%|████████████████████████████████████████████████████| 9.99G/9.99G [05:39<00:00, 29.5MB/s]
Fetching 2 files: 100%|██████████████████████████████████████████████████████████████████████████| 2/2 [05:39<00:00, 169.80s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 42.58it/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████| 20.5k/20.5k [00:00<00:00, 10.2MB/s]
spiece.model: 100%|██████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 8.78MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████████████████| 2.63k/2.63k [00:00<00:00, 10.2MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 10.8MB/s]
Traceback (most recent call last):
File "/Volumes/SSD2TB/tmp/LTX-Video/inference.py", line 774, in <module>
main()
File "/Volumes/SSD2TB/tmp/LTX-Video/inference.py", line 298, in main
infer(**vars(args))
File "/Volumes/SSD2TB/tmp/LTX-Video/inference.py", line 534, in infer
pipeline = create_ltx_video_pipeline(
File "/Volumes/SSD2TB/tmp/LTX-Video/inference.py", line 342, in create_ltx_video_pipeline
vae = vae.to(device)
File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 1353, in to
return super().to(*args, **kwargs)
File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
[Previous line repeated 4 more times]
File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
File "/Volumes/SSD2TB/tmp/LTX-Video/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
RuntimeError: MPS backend out of memory (MPS allocated: 27.10 GB, other allocations: 464.00 KB, max allowed: 27.20 GB). Tried to allocate 108.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
I eventually managed to get past that by following the suggestion and running this command:
(.venv) fran@M4Pro LTX-Video % export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 && python inference.py --prompt "Don Quixote with windmills" --height 480 --width 720 --num_frames 25 --seed 666 --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
But then I got gibberish in the output instead of a video.
Try the WanGP implementation of LTX Video 13B; it has been heavily optimized and its VRAM requirements have been cut by a factor of 4. It runs with as little as 8 GB of VRAM. https://github.com/deepbeepmeep/Wan2GP
I tried the WanGP implementation and was successful. It has a good interface, making it easy to use, and the output was very nice. One thing I took from watching several commentaries is that video generation was supposed to be fast. That may be relative, but on my 5090, waiting 42 minutes for a 5-second 720p video felt pretty long. Dropping to 480p brought the wait down to 19 minutes. The quality was good, but I had higher expectations for the generation time.
I'll install ComfyUI to see how the main branch of LTX Studio performs with that workflow. If there are settings that would let me run this from plain Python, I would love to know them, as I thought 30-31 GB of available VRAM would be sufficient.
There is certainly something wrong with your setup: with my RTX 5090 it takes less than 5 minutes to generate a 720p, 5-second (153-frame) video using memory profile 4 (which is not the fastest).
If everything is working properly, the RTX 5090 should draw at least 500 W for the whole generation. You can use a tool like GPU-Z to verify that.
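On Linux, where GPU-Z is not available, pynvml can report the same power reading from Python; a minimal sketch, assuming the 5090 is GPU index 0:

import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

# Poll board power once per second for ten seconds;
# nvmlDeviceGetPowerUsage reports milliwatts.
for _ in range(10):
    watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    print(f"GPU power draw: {watts:.0f} W")
    time.sleep(1)

pynvml.nvmlShutdown()

From a terminal, nvidia-smi --query-gpu=power.draw --format=csv -l 1 should show roughly the same number.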
Please check the following:
- Sage 2 attention is installed and selected (instructions on how to install Sage are on the homepage).
- The sliding window size (in the advanced features) is greater than 153 frames; otherwise multiple generations will be needed.
- Boost mode is turned on in the Configuration tab.
- Use the default int8 quantization.
You can get some extra speed by switching, in the Configuration tab, from the default memory profile 4 to profile 3: the whole model will be preloaded in VRAM.
You were right. I went back and reinstalled everything, just to make sure. A 480p video of 5 seconds now takes 7.5-8 minutes, which is a vast improvement. I haven't figured out the memory profiles yet, so I'm using the defaults, but again, it is SO much faster.
How can they achieve real time? Is it really just more VRAM? https://arxiv.org/pdf/2508.05115