Improve single-instance rendering performance
We’ve discovered that running the Remotion renderer on powerful VPSs (e.g., 48 cores, 350 GB RAM) with increased concurrency does not consistently improve performance and can sometimes make rendering slower. Our investigation points to two main bottlenecks:
Chrome Resource Allocation
Simply opening more Chrome tabs doesn’t linearly increase resource usage. Running multiple browser instances (rather than multiple tabs in a single instance) can yield better (but still limited) performance gains.
OffthreadVideo Frame Extraction
Videos are extracted frame by frame, which is difficult to parallelize. Compositions with heavy OffthreadVideo usage can see significant slowdowns at high concurrency.
A thorough discussion of these findings, along with logs and performance charts, is available in the existing issue.
Goal
The goal of this issue is to spark a conversation about solutions for improving single-instance rendering performance, for example:
- New concurrency strategies (e.g., multiple browser instances)
- Refactors for more efficient frame extraction (e.g., a multi-threaded/multi-process compositor, preprocessing, optimistic caching, opening the same video stream concurrently)
We use Remotion professionally and this would have a great impact for us. I'll open a $500 bounty for a working solution.
~~💎 $500 bounty • zigg.team~~
⏸️ Bounty Status: ON HOLD
This bounty has been placed on hold to ensure any solutions align with the project's long-term direction. It might be reactivated in the future if the issue is ready to accept external contributions.
/attempt #4664
| Algora profile | Completed bounties | Tech | Active attempts | Options |
|---|---|---|---|---|
| @abhishek818 | 3 remotion bounties + 20 bounties from 5 projects | Go, TypeScript, Rust & more | | Cancel attempt |
Thanks for filing and posting a bounty!
Although appreciated, I cancelled the bounty because there are tons of contributors who would attempt this without any context.
We don't wanna outsource the hard issues without also specifying how we expect them to be solved; otherwise random things happen, which leads to overhead.
The feature request is valid but it concerns two completely separate things.
Frame extraction:
The path forward for improving frame extraction performance, I think, is to further progress @remotion/media-parser + the WebCodecs interface so that extraction can happen in-browser, without the overhead of Rust + serializing to images + Node. It would also make it possible to stream the video rather than downloading it entirely, and to use hardware acceleration where available.
I'm working myself towards this goal, which would be a new video tag that would replace Video and OffthreadVideo in the long-term. Please understand though that this will still take some time.
Chrome resources:
It would be nice to support having multiple Chrome instances and be able to specify the concurrency like 2x2 (2 browser instances with 2 tabs each).
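Roughly, an untested sketch of approximating a 2x2 setup with today's APIs, using openBrowser and renderMedia's puppeteerInstance option (composition, bundleLocation and inputProps are assumed to be set up as usual, and the two parts still have to be concatenated afterwards):
import {openBrowser, renderMedia} from '@remotion/renderer';
// Untested sketch: 2 browser instances with 2 tabs (concurrency) each
const instances = 2;
const framesPerInstance = Math.ceil(composition.durationInFrames / instances);
await Promise.all(
  Array.from({length: instances}, async (_, i) => {
    const browser = await openBrowser('chrome');
    const startFrame = i * framesPerInstance;
    const endFrame = Math.min((i + 1) * framesPerInstance - 1, composition.durationInFrames - 1);
    await renderMedia({
      composition,
      serveUrl: bundleLocation,
      codec: 'h264',
      inputProps,
      puppeteerInstance: browser, // reuse this dedicated browser instance
      concurrency: 2, // 2 tabs inside this instance
      frameRange: [startFrame, endFrame],
      outputLocation: `out/part_${i}.mp4`,
    });
    // Remember to close/clean up the browser instances when done.
  })
);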
@JonnyBurger understandable. Do let me know if there is anything we can do to offer support for this work
Hello there – just wanted to share my experience in case it helps others trying to optimize Remotion performance on a single instance.
I was evaluating fairly heavy compositions: multiple video inputs (up to 100), animations, opacity, and blur effects. Out of the box, the performance wasn’t great, but after some tuning, I’m now getting solid results. Here's what made the biggest difference:
- OffthreadVideo wasn’t fast enough in my case – rendering with ffmpeg to an image sequence and using that in the composition was more performant. (Not sure how this compares with the latest Remotion version.)
- Since you're dealing with a lot of image files, disk I/O is important. Mounting and running from a RAM disk gave a significant speedup.
- Heavy effects like blur were faster when rendered to a canvas instead of using the Img component (rough sketch below this list).
- Calling renderMedia in chunks and stitching the results helped fully utilize the CPU cores.
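To illustrate the canvas point, a minimal, hypothetical sketch (frame.png and the blur radius are placeholders; in my real compositions the drawing logic is more involved):
import {useEffect, useRef, useState} from 'react';
import {AbsoluteFill, continueRender, delayRender, staticFile} from 'remotion';
// Blur an image via the canvas 2D filter instead of a CSS filter on <Img>
export const BlurredFrame = () => {
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const [handle] = useState(() => delayRender());
  useEffect(() => {
    const img = new Image();
    img.src = staticFile('frame.png'); // placeholder asset
    img.onload = () => {
      const ctx = canvasRef.current?.getContext('2d');
      if (ctx) {
        ctx.filter = 'blur(20px)';
        ctx.drawImage(img, 0, 0, 1920, 1080);
      }
      continueRender(handle);
    };
  }, [handle]);
  return (
    <AbsoluteFill>
      <canvas ref={canvasRef} width={1920} height={1080} />
    </AbsoluteFill>
  );
};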
Example: rendering a 1080p composition with 10 video inputs takes about 1:1 real-time (video length = render time), including EC2 c6i.8xlarge spin-up time from a custom AMI. That’s roughly on par with After Effects or Blender for me. Sure, horizontal scaling would be faster though...
@algoholger thanks for the insights!
Out of all of these, which one had the most drastic impact on performance you'd say?
@tzvc It really depends on your rendering pipeline and hardware setup. In my case it was not a single thing:
First thing I did was shift anything that could be easily handled by FFmpeg (like zooms, pans, simple grid animations) directly into FFmpeg.
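For illustration, a sketch of that kind of offloading: pre-render a slow zoom with FFmpeg's zoompan filter and use the result as a plain asset in the composition (file names are placeholders, and the filter expressions need tuning per effect):
import {execFileSync} from 'node:child_process';
// Pre-render a slow "Ken Burns" zoom into an intermediate clip before the Remotion render
execFileSync('ffmpeg', [
  '-y',
  '-i', 'input.mp4', // placeholder input
  '-vf',
  "zoompan=z='min(pzoom+0.0015,1.5)':x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)':d=1:s=1920x1080",
  '-c:v', 'libx264',
  '-an',
  'zoomed.mp4', // placeholder output, used as a normal video asset afterwards
]);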
Make sure disk I/O is not a bottleneck. Using a RAM disk gave me big speedups:
- ~4× faster than a slow EBS volume
- ~1.5× faster than an NVMe SSD
Quick and easy to set up — just make sure your Remotion temp folder is on the RAM disk:
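# e.g. create the RAM disk first: sudo mount -t tmpfs -o size=16G tmpfs /mnt/tmp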
export TMPDIR=/mnt/tmp
TMPDIR=/mnt/tmp bun render.mjs
One weird bottleneck: CPU usage. For example, a composition takes:
- 80s on a Mac M4 Pro: ~70% CPU utilization
- 150s on an EC2 instance with 48 cores: ~10% utilization :-(
(both with full concurrency)
As @JonnyBurger pointed out, try using multiple Chrome instances with the openBrowser API to avoid single-process bottlenecks.
I didn’t use openBrowser directly, but already splitting the render into chunks like this gave me much better performance:
import {renderMedia} from '@remotion/renderer';
// Assumes composition, bundleLocation, inputProps, durationInFrames and tempDir are already set up
const numChunks = 12;
const chunkSize = Math.ceil(durationInFrames / numChunks);
const renderPromises = [];
for (let i = 0; i < numChunks; i++) {
const startFrame = i * chunkSize;
const endFrame = Math.min((i + 1) * chunkSize - 1, durationInFrames - 1);
const chunkOutputPath = `${tempDir}/chunk_${i}.mp4`;
console.log(`Rendering chunk ${i+1}/${numChunks}: frames ${startFrame} to ${endFrame}`);
renderPromises.push(
renderMedia({
composition,
serveUrl: bundleLocation,
codec: 'h264',
chromiumOptions: {
enableMultiProcessOnLinux: true,
disableWebSecurity: true,
ignoreCertificateErrors: true,
},
outputLocation: chunkOutputPath,
inputProps,
concurrency: 3,
frameRange: [startFrame, endFrame],
})
);
}
await Promise.all(renderPromises);
This brought render time down from 150s → 30s and scales better with CPU count.
Small caveat: Using numChunks >= 16 crashes Remotion with an error like "no ports found for static files." Still figuring that part out.
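One thing I still want to try: giving each chunk its own dedicated port via renderMedia's port option, in case the chunks are racing for free ports. Untested sketch of the loop body above with that change:
renderPromises.push(
  renderMedia({
    composition,
    serveUrl: bundleLocation,
    codec: 'h264',
    inputProps,
    concurrency: 3,
    frameRange: [startFrame, endFrame],
    outputLocation: chunkOutputPath,
    port: 3001 + i, // hypothetical: one fixed port per chunk
  })
);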
Also @tzvc, I saw you already tried the same CPU optimizations — either the new Remotion version handled it better, or you were hitting a disk I/O bottleneck.
Hi guys, I started using Remotion in "wrong" ways too, splitting and concatenating big, long mp4 files, and was getting poor performance.
In my case, as @algoholger mentioned, I/O is a huge bottleneck. Because each video frame is extracted and written one by one, it causes a LOT of write activity. I run things on our own bare-metal servers; going from SSD to NVMe-backed storage made a big improvement, but mounting /tmp on a RAM drive is even better. Not to mention it saves wear on the drives.
But it's still diminishing returns... @algoholger would you mind sharing some more details about how you stitch back together the chunks?
@eTiMaGo something like this, no warranty. The example is stripped down:
import { renderMedia, combineChunks } from '@remotion/renderer';
import path from 'node:path';
// ...
// Assuming composition, bundleLocation and (optionally) inputProps are set
const OUT_DIR = 'out/out.mp4';
const TMP_DIR = 'out/chunks';
const FPS = 30;
const DURATION_IN_FRAMES = 1234; // Duration of composition
const NUM_CHUNKS = 6; // Define number of chunks to split the video into
const CONCURRENCY_PER_CHUNK = 2;
const chunkSize = Math.ceil(DURATION_IN_FRAMES / NUM_CHUNKS);
// Render chunks
await Promise.all(
Array.from({length: NUM_CHUNKS}, (_, i) => {
const start = i * chunkSize;
const end = Math.min((i + 1) * chunkSize - 1, DURATION_IN_FRAMES - 1);
if (start >= DURATION_IN_FRAMES || start > end) {
return null;
}
return renderMedia({
serveUrl: bundleLocation,
composition,
codec: 'h264-ts',
audioCodec: 'aac',
forSeamlessAacConcatenation: true,
enforceAudioTrack: true,
compositionStart: 0,
frameRange: [start, end],
outputLocation: `${TMP_DIR}/v_${i}.ts`,
separateAudioTo: `${TMP_DIR}/a_${i}.aac`,
inputProps,
concurrency: CONCURRENCY_PER_CHUNK,
// onProgress: (optional)
// port: (optional)
});
})
);
// Create concatenation file for FFmpeg
const videoFiles = Array.from({length: NUM_CHUNKS}, (_, i) =>
path.resolve(`${TMP_DIR}/v_${i}.ts`)
);
const audioFiles = Array.from({length: NUM_CHUNKS}, (_, i) =>
path.resolve(`${TMP_DIR}/a_${i}.aac`)
);
await combineChunks({
outputLocation: path.resolve(OUT_DIR),
videoFiles,
audioFiles,
codec: 'h264',
audioCodec: 'aac',
fps: FPS,
framesPerChunk: chunkSize,
compositionDurationInFrames: DURATION_IN_FRAMES,
preferLossless: false,
});
Thanks! Been experimenting a bit already, much better results :)
I am looking for feedback for our new Video tag! https://www.remotion.dev/docs/media/video
Since the old one was bottlenecked and shared across instances, I hope this one will alleviate the problems you guys had before.
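If you want to give it a spin, a minimal usage sketch (import path as on the docs page above; clip.mp4 is a placeholder):
import {AbsoluteFill, staticFile} from 'remotion';
import {Video} from '@remotion/media'; // new tag, per the docs link above
// Previously this would have been <OffthreadVideo> from 'remotion'
export const MyComposition = () => {
  return (
    <AbsoluteFill>
      <Video src={staticFile('clip.mp4')} />
    </AbsoluteFill>
  );
};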
From what I’ve seen so far, that’s a seriously impressive Video tag. In the examples I’ve tested, I got up to a 40% performance increase when rendering sequential videos (one after another with some animations) compared to using OffthreadVideo, and up to a 25% improvement over rendering image sequences made with ffmpeg (including prep time).
I still want to check performance on EC2 servers without a GPU — I tested on a Mac and didn’t see any GPU spikes, so I’m guessing it’ll be fine. I’m also curious how it handles rendering multiple videos at once, with and without up/downscaling.
I did notice some weird behavior when using the new Video tag together with image sequences — it works fine with OffthreadVideo, but it messes up the frame order in image sequences. Could just be something with my composition/render setup though.
Other than that, I haven’t seen any bugs. Seems pretty close to production ready. Looking forward — thanks!
@algoholger Nice, thanks for testing it!
Can you open a separate issue for it and describe it in more detail? From this alone I can't tell how to fix it.
About the GPU, this might be expected: the software video decoders are also quite fast, and OffthreadVideo's bottlenecks were actually different from not being able to use the GPU.