[Feature] Headless rendering
Context: I'm currently looking into using Viser as a visualizer for Gaussian splats along with other 3D objects such as meshes and point clouds, and it is working quite well. In some applications, it would be useful to simulate such data and perform rendering without having to go through the WebGUI.
Request: A way to easily perform headless rendering without needing to open the WebGUI. I'm currently looking into whether simulating a connection through Python works just to access the "camera feed", but it would be nice if this were part of the API. (Accessing a camera handle requires a connected client, which I currently only achieve by opening the WebGUI.)
Example of application: Mount a camera to a trajectory, render the camera view along that trajectory, and export the result as a video or images. In most of these cases, it is not desirable to require opening a client (clicking on the host link) just to be able to render.
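For concreteness, a rough sketch of the workflow I have in mind, assuming a client is already connected (which today still means opening the WebGUI once). The port, trajectory, and output filenames below are just placeholders; the camera handle and get_render() are viser's existing per-client APIs:

```python
import time

import imageio.v3 as iio
import numpy as np
import viser

server = viser.ViserServer(port=8090)
# ... add splats / meshes / point clouds to server.scene here ...

# Wait until a client (browser tab) has connected.
while not server.get_clients():
    time.sleep(0.1)
client = next(iter(server.get_clients().values()))

# Move the client's camera along a circular trajectory and grab a render per pose.
for i, t in enumerate(np.linspace(0.0, 2.0 * np.pi, 120)):
    client.camera.position = np.array([3.0 * np.cos(t), 3.0 * np.sin(t), 1.5])
    client.camera.look_at = np.array([0.0, 0.0, 0.0])
    frame = client.get_render(height=512, width=512)
    iio.imwrite(f"frame_{i:03d}.png", frame)  # stitch the frames into a video afterwards
```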
Hi @Crezle, would love an update if you figure this out! playwright might be a good thing to look into if you haven't already.
I currently have a similar requirement. Given the overhead of having a browser open, even headless, my current workaround is to use gsplat's rasterization. But it is limited to rendering only the splat, so any meshes that were loaded into viser cannot be rendered, since gsplat lacks the idea of a scene.
Also, the camera positions and orientations differ from viser's, which produces a slightly different image.
It would be nice to bring gsplat natively to viser to enable a headless render mode.
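For reference, the workaround looks roughly like this (a sketch assuming gsplat >= 1.0 and a CUDA device; the splat parameters and camera values are placeholders for your own data):

```python
import torch
from gsplat import rasterization

device = "cuda"
N = 10_000
means = torch.rand(N, 3, device=device)          # [N, 3] Gaussian centers
quats = torch.rand(N, 4, device=device)          # [N, 4] rotations (wxyz)
scales = torch.rand(N, 3, device=device) * 0.02  # [N, 3] per-axis scales
opacities = torch.rand(N, device=device)         # [N]
colors = torch.rand(N, 3, device=device)         # [N, 3] RGB (or SH coefficients)

# gsplat expects an OpenCV-style world-to-camera matrix, while viser's camera uses
# the OpenGL convention, which is likely why the poses/images don't match exactly.
viewmats = torch.eye(4, device=device)[None]     # [1, 4, 4]
Ks = torch.tensor(
    [[500.0, 0.0, 256.0], [0.0, 500.0, 256.0], [0.0, 0.0, 1.0]], device=device
)[None]                                          # [1, 3, 3] intrinsics

render_colors, render_alphas, meta = rasterization(
    means, quats, scales, opacities, colors, viewmats, Ks, width=512, height=512
)
image = render_colors[0]                         # [H, W, 3], stays on the GPU
```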
@akhilsathuluri I am very curious about the quality of the renders generated by your implementation. I have tried other methods without using Viser and found that none of them are as faithful as Viser's output. If you could share your code, I would be very grateful.
Unfortunately I'm not working on this anymore at the moment, but here is a snippet I tested: https://github.com/akhilsathuluri/sim_a_splat/blob/headless/test_gsplat_render.py. You should be able to reproduce this with your own splat.
From what I remember, the quality was not great, but it was sufficient for my application.
Made the following code snippet that lets you run viser in headless mode, using Playwright to create a headless browser. This allows you to call get_render() on a ClientHandle to generate an image, preserving all of viser's rendering functionality (such as combining splats and mesh models). Either run the script as a separate Python instance, or integrate it into your own system.
```python
import time

from playwright.sync_api import sync_playwright

VISER_URL = "http://localhost:8090"  # change if your client is served at a different URL
RETRY_OPEN_SECONDS = 30  # how long to wait for the client to initialize
KEEP_ALIVE_SECONDS = 1_000_000  # basically keep running

LAUNCH_ARGS = [
    "--ignore-gpu-blocklist",
    "--use-gl=angle",
    "--enable-webgl",
]


def main(url=VISER_URL):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=LAUNCH_ARGS)
        context = browser.new_context(
            viewport={"width": 512, "height": 512},  # adjust to your camera size
            device_scale_factor=1,
        )
        page = context.new_page()

        print(f"[viser-headless] Opening {url} ...")
        page.goto(url, wait_until="networkidle")

        try:
            print("[viser-headless] Waiting for canvas element...")
            page.wait_for_selector("canvas", timeout=RETRY_OPEN_SECONDS * 1000)
            print("[viser-headless] Canvas found.")
        except Exception as e:
            print("[viser-headless] Timeout waiting for canvas:", e)
            # still continue; some clients may not put a canvas in the DOM immediately

        # Keep the page alive so it acts like a real client (it will run the
        # client-side JS that streams frames).
        print("[viser-headless] Headless client should now be connected to the Viser server.")
        print("[viser-headless] Press Ctrl+C to exit.")
        try:
            # You can optionally poll some state in the page and print it for debugging.
            while True:
                # Debug: check the number of canvases and the current URL.
                canvases = page.query_selector_all("canvas")
                print(f"[viser-headless] canvases={len(canvases)} url={page.url}")
                time.sleep(5)
        except KeyboardInterrupt:
            print("[viser-headless] Shutting down by user request.")
        finally:
            browser.close()


if __name__ == "__main__":
    main()
```
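For completeness, the server-side counterpart is roughly the following sketch (run it as a separate process or thread; the headless page shows up in get_clients() just like a normal browser tab, and the port and render size are placeholders):

```python
import time

import viser

server = viser.ViserServer(port=8090)  # same port the headless browser opens
# ... populate server.scene with splats / meshes ...

while not server.get_clients():        # wait for the headless client to attach
    time.sleep(0.1)
client = next(iter(server.get_clients().values()))
image = client.get_render(height=512, width=512)  # numpy uint8 image array
```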
There are, however, some performance issues with this approach compared to using gsplat directly as suggested above:
- First is the setup of Playwright. The launch arg "--use-gl=angle" seems to only run GPU rendering through a software path (OpenGL in this case), effectively bottlenecking performance versus running a normal browser; the slowdown is roughly 10x. It is, however, about 10x faster than "--use-gl=SwiftShader" or "--use-gl=llvmpipe", which use CPU rendering. It is still nice to have the option of rendering viser on the CPU, e.g. for CI or for cost reasons versus GPU cloud instances. I also tried "--use-gl=egl", which should allow hardware-accelerated GPU rendering, but that feature currently seems a bit broken in Playwright's Chromium. There might be a way to run it with Vulkan, or to use a different browser such as Firefox.
- Secondly, I'm not sure how efficient get_render() is. Without knowing viser's internals, it appears to be an API call relying on network communication (message passing) between the server and client. For large images or applications requiring close to real-time performance, it could become a bottleneck, especially if the data is not video encoded. Since this approach effectively allows the server and client to live in the same Python instance, I tried using Playwright's screenshot() to bypass viser's get_render() altogether (see the sketch after this list). This improved performance by about 30x over get_render() while still running with "--use-gl=angle". However, it only seemed to work for the first few frames before the browser halted. I'm not sure why, so any help / suggestions on what could be the issue would be appreciated. It looks like the halting begins after the scene finishes loading on the client.
- Thirdly, this method relies on sending image data to the CPU. The gsplat method should be more efficient if you are going to continue doing computational work on the image on the GPU: since gsplat lets you fetch the image data directly from the GPU buffer, tasks like machine learning become more efficient. There might be a way around this using VirtualGL instead of OpenGL on the browser side to split the framebuffer, but that is out of my league.
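The screenshot() experiment from the second bullet looked roughly like this (a sketch reusing the page object from the script above; capturing a WebGL canvas this way may need tweaks such as preserveDrawingBuffer):

```python
import io

import numpy as np
from PIL import Image

canvas = page.wait_for_selector("canvas")             # the WebGL canvas element
png_bytes = canvas.screenshot(type="png")              # element-level screenshot, returns bytes
frame = np.array(Image.open(io.BytesIO(png_bytes)))    # [H, W, 3 or 4] uint8 array
```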