xdg-desktop-portal
[Feature Request] Screencast: Allow some way to request windows by name or process
I'm working on a system that requires the ability for OBS to, without user intervention, capture the contents of a new window. At the moment xdg-desktop-portal cannot support this; it allows selecting a window with user intervention, or it allows selecting a window with an opaque recovery token. But the recovery token is generated only with user intervention, so there's essentially no solution to allow screencasting a window without going through the window picker.
This is something that you can do on Windows because OBS identifies windows by name, and it's sometimes really convenient because it dramatically reduces the amount of user intervention needed when you do things.
I'd like an option that can be passed in which is a desired window identifier, either identifying via the name of the window or via the name of the process. Using the game Hades as an example because I have it convenient, this would look something like, as an added parameter to org.freedesktop.portal.ScreenCast's SelectSources:
request_by_name Hades
or
request_by_process Z:\home\zorba\.local\share\Steam\steamapps\common\Hades\x64Vk\Hades.exe
If the request fails, it could either fall back on the picker, or just return failure. I'm not sure which of those is better.
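As a rough sketch of how the proposed parameters might sit next to the existing SelectSources options, here is a small Python helper that only builds the options dictionary (it does not talk to D-Bus). `request_by_name` and `request_by_process` are the hypothetical keys from this proposal and do not exist in the portal today; `fallback_to_picker` is a further made-up key that stands in for the "fall back on the picker or return failure" choice. `handle_token`, `types`, and `multiple` are existing options.

```python
from typing import Any, Dict, Optional

SOURCE_TYPE_WINDOW = 2  # existing ScreenCast source-type bit for windows


def build_select_sources_options(
    handle_token: str,
    request_by_name: Optional[str] = None,
    request_by_process: Optional[str] = None,
    fallback_to_picker: bool = True,
) -> Dict[str, Any]:
    """Build a SelectSources options dict, including the proposed keys."""
    if request_by_name is not None and request_by_process is not None:
        raise ValueError("pass request_by_name or request_by_process, not both")

    # Options that exist in the ScreenCast portal today.
    options: Dict[str, Any] = {
        "handle_token": handle_token,
        "types": SOURCE_TYPE_WINDOW,
        "multiple": False,
    }

    # Everything below is HYPOTHETICAL: these keys are only proposed here.
    if request_by_name is not None:
        options["request_by_name"] = request_by_name
    if request_by_process is not None:
        options["request_by_process"] = request_by_process
    if request_by_name is not None or request_by_process is not None:
        # True: show the picker if nothing matches. False: return failure.
        options["fallback_to_picker"] = fallback_to_picker
    return options


options = build_select_sources_options("obs1", request_by_name="Hades")
```

A real client would wrap these values in D-Bus variants before calling SelectSources; that part is left out to keep the sketch self-contained.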
Concerns:
There are definite potential security issues in this. I'm not proposing that you should be able to get any window just by guessing the window name or process name; instead, I'd personally expect a checkbox on the picker labeled something like "allow future capturing of windows with this name", which saves that flag permanently somewhere. This means you still need to manually intervene once, but after that, it can just happen transparently for you.
I'm kinda handwaving on "saves that flag permanently somewhere". I think if this were to be done completely right, this would also need a dialog somewhere so you could manage the authorized names/processes and revoke them. This may end up being complicated. A hacky initial option could forego the window checkbox and just allow a hand-authored config file somewhere; this works for my purposes and might work for ironing out interface problems.
There's potential ambiguity if there's more than one window that matches the pattern. In my case, I don't care! Just pick one! Maybe other people care.
In my case, I have control over the window, so I could send a D-Bus message that says "identify this window with this given tag", then we could have request_by_tag. I don't think this is a good solution, though, because most people who want this feature are not going to have code-level control over the window.
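To make the concerns above a bit more concrete, here is a minimal Python sketch of the matching logic a portal backend could apply. It is only an illustration: the `Window` type, `pick_window`, and the in-memory set of authorized names are all hypothetical, and nothing like this exists in xdg-desktop-portal today. It assumes the user has already authorized a name once (via the checkbox or a hand-authored config file), and it resolves ambiguity by simply picking the matching window with the lowest ID.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Window:
    """Hypothetical view of a toplevel window as a backend might see it."""
    window_id: int
    title: str
    process: str


def pick_window(
    windows: Iterable[Window],
    requested_name: str,
    authorized_names: Set[str],
) -> Optional[Window]:
    """Return a matching window, or None if the picker must be shown."""
    # A name the user never authorized is never matched silently.
    if requested_name not in authorized_names:
        return None
    matches = [w for w in windows if w.title == requested_name]
    if not matches:
        return None
    # "Just pick one": choose deterministically, lowest window ID first.
    return min(matches, key=lambda w: w.window_id)


open_windows = [
    Window(7, "Hades", "Hades.exe"),
    Window(3, "Hades", "Hades.exe"),
    Window(5, "Terminal", "foot"),
]
chosen = pick_window(open_windows, "Hades", authorized_names={"Hades"})
```

Revoking access in this model is just removing the name from `authorized_names`; a `request_by_tag` variant would compare against a tag instead of the title.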
I'd love to get this up to feature-parity with Windows; right now it errs on the side of security, which is a good direction to err in, but, man, sometimes convenience and automation are really nice!
This feels like a very big security issue, as you mentioned, as well as usability issue. You can already continue the capture of a previously selected window if the user allows it - what more do you need? You can also use things like ObsVkCapture for games, which does work with Flatpak.
Not to mention that requesting by name or process is iffy: window names are subject to change, and the process can differ depending on the environment you're running in. The process inside a Flatpak isn't necessarily the same as the process running on the host, and file paths can differ as well.
You can also probably figure something out with window handles, and the same mechanism that ObsVkCapture uses.
Would it help if you could pass a window title/app-id as a filter, then still require the user to click "Share"? It'd simplify the user interaction by potentially leaving a single window to choose from.
This feels like a very big security issue, as you mentioned, as well as usability issue.
Note that I'm not asking for this to be enabled by default, I just want another checkbox to loosen the security a bit further. This is one of those cases where security and usability clash a bit.
This feels like a very big security issue, as you mentioned, as well as usability issue. You can already continue the capture of a previously selected window if the user allows it - what more do you need?
The problem is that it's a new window spawned by a new process (with the same name, and with the same process name, but still a new PID.) This makes it impossible to "continue" the capture; it needs to be a new capture of a new window that has many of the same properties as the last window.
You can also use things like ObsVkCapture for games, which does work with Flatpak.
This might work; I'll check it out.
Not to mention that requesting by name or process is iffy: window names are subject to change, and the process can differ depending on the environment you're running in. The process inside a Flatpak isn't necessarily the same as the process running on the host, and file paths can differ as well.
In my case, I have control over the window name, and it's not running in a flatpak anyway. I agree this might be something that needs to be tackled for general purposes though.
(Although I will note that "capture based on window name" has been an identifier used in OBS for quite a while.)
Would it help if you could pass a window title/app-id as a filter, then still require the user to click "Share"?
Unfortunately not. Needs to be fully without interaction.
In response to the confused emoji:
The thing I'm working on is an automated test framework for a game. I need to be able to spawn new fresh instances of the game running test scripts while automatically recording footage. "Automatically" is the entire point here; needing a human to sit there clicking the "share" button every fifteen seconds is unacceptable.
Right now, I'm solving this by running under X11 with XComposite capturing. I tried switching to Wayland, but as near as I can tell all captures in Wayland must go through xdg-desktop-portal. There's no way to tell xdg-desktop-portal "no, seriously, let me capture this window without user intervention", so this makes Wayland completely unusable (regardless of whether a Flatpak is involved, for the record). I'd like to head this problem off sooner rather than later.
If I do need to switch to Wayland, my current solution is going to be a custom build of xdg-desktop-portal that implements "search by window name" and a matching custom build of OBS to pass chosen window names in, because I frankly don't care about the security implications in this context. But it'd be nice to come up with a solution that other people can use as well :)
Then your best bet will be OBS-VkCapture. Unless there's a more real-world use case for screen capturing via the ScreenCast API with specific window names/etc, it's not really useful. Test frameworks don't usually need to, nor do they normally go through, desktop APIs like this.
There is request #304, where the window name is needed (for display within app). If such a portal existed, then what is requested here would be an extension of it by asking to monitor a specific name or selecting a specific application. Am I right?
That sounds about right.
This issue requires careful consideration as it poses a significant challenge for serious streamers transitioning from Windows to Linux. There seems to be a misunderstanding regarding the complexity of streamer setups, where multiple sources need to be added as overlays to create engaging content. While selecting sources for a camera and a game is straightforward, the process becomes frustrating beyond that.
Modern streamers incorporate 3D avatars, multiple webcams capturing different angles of their keyboard, avatar, and face, along with various web browser plugins as additional sources. Currently, upon opening OBS, users are inundated with countless requests to select window sources, causing confusion about which window to choose. This cumbersome process creates significant barriers for streamers considering Linux as a viable platform.
Currently, upon opening OBS, users are inundated with countless requests to select window sources, causing confusion about which window to choose. This cumbersome process creates significant barriers for streamers considering Linux as a viable platform.
Restoration is already a thing, and is implemented in OBS (you need your portal backend to support ScreenCast v4):
restore_token (s): The token to restore a previous session.
If the stored session cannot be restored, this value is ignored and the user will be prompted normally. This may happen when, for example, the session contains a monitor or a window that is not available anymore, or when the stored permissions are withdrawn.
The restore token is invalidated after using it once. To restore the same session again, use the new restore token sent in response to starting this session.
Setting a restore_token is only allowed for screen cast sessions. Persistent remote desktop screen cast sessions can only be handled via the Remote Desktop interface.
This option was added in version 4 of this interface.
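Because the token is single-use, a client has to rotate it on every session. The following Python sketch shows that bookkeeping only; it does not perform any D-Bus calls. `select_sources_options` and `next_token` are hypothetical helper names, and `start_results` stands in for the results dictionary a client receives in response to Start. `restore_token` and `persist_mode` are the real option names; a `persist_mode` of 2 asks for the permission to persist until explicitly revoked.

```python
from typing import Any, Dict, Optional

SOURCE_TYPE_WINDOW = 2
PERSIST_UNTIL_REVOKED = 2  # persist_mode: keep until explicitly revoked


def select_sources_options(stored_token: Optional[str]) -> Dict[str, Any]:
    """Options for SelectSources; include the token only if we have one."""
    options: Dict[str, Any] = {
        "types": SOURCE_TYPE_WINDOW,
        "persist_mode": PERSIST_UNTIL_REVOKED,
    }
    if stored_token:
        options["restore_token"] = stored_token
    return options


def next_token(start_results: Dict[str, Any]) -> Optional[str]:
    """The old token is spent once used, so keep only what Start returned."""
    return start_results.get("restore_token")


# First run: no token yet, so the user is prompted normally.
first_options = select_sources_options(None)
token = next_token({"streams": [], "restore_token": "token-A"})

# Later run: pass the stored token, then replace it with the fresh one.
second_options = select_sources_options(token)
token = next_token({"streams": [], "restore_token": "token-B"})
```

If the backend cannot restore the session, the token is ignored and the prompt appears anyway, so a client should overwrite its stored token with whatever the new session returns.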
It doesn't seem to work across restarts of the captured application or of the compositor, though; I think that's already tracked in #1355.
@ruineka: Maybe open a discussion (https://github.com/flatpak/xdg-desktop-portal/discussions/new/choose) to document what streamers exactly need and expect?
I'm in the same situation as @ruineka here with regard to streaming. OBS currently pops four screencast selector windows every time I start it up, and since it doesn't identify which source each one is supposed to go to I have to close out of all of them and then manually re-assign every capture source. It's a real pain in the neck, and I don't think I have a particularly complicated stream layout. Session restore doesn't work since it's not practical for me to keep the windows I want to include in the stream open all the time (and in some cases I have to close and reopen them while the stream is running, which forces another screencast select since the window ID changes).
In my opinion, this is a real-world use case that really does warrant a flag or permissions bit or something for unattended screencasting. It's a security issue if we let any program do that, but at some point I don't need to know every time if OBS is going to capture my window contents and send them off to some random external server because that is the thing that OBS is designed to do. We don't have to blow the thing wide open, but I'd at least like to be able to say "/usr/bin/obs can watch /usr/bin/dolphin-emu without asking me first" because that is often my explicit purpose in opening OBS and I don't really care that much if some other program could hypothetically pretend to be OBS and capture from my dolphin-emu window.
(Also, the current model doesn't even prevent you from streaming the wrong content by mistake. If you're streaming a capture of a browser window (as I do for LiveSplit One), and something xdg-opens a link and opens a new tab in that window -- which can happen without user interaction -- the capture keeps on rolling...)
As this is the only active discussion I can find on this topic, I want to add my two cents on the problem this kind of system poses for streaming:
It isn't uncommon for more established stream setups to use multiple scenes with multiple captures inside of them. I would casually stream from time to time while on Windows, and my setup involved having a scene dedicated to every program/game I would interact with on stream, while using a plugin like Advanced Scene Switcher to detect which one had focus and automatically switch between them. This not only allowed me to move between scenes hands-free but also allowed me to easily set up any custom layouts I might need for specific programs/games (for example, different camera positions to avoid covering in-game content).
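The focus-based switching described above boils down to a lookup from the focused window's title to a scene. A minimal Python sketch, with hypothetical names throughout (`scene_for_focus` and the rule table are made up here and are not how any particular OBS plugin is implemented); how the focused title is obtained is outside the sketch:

```python
from typing import Dict, Optional


def scene_for_focus(
    focused_title: str,
    rules: Dict[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the scene for the first rule whose pattern is in the title."""
    title = focused_title.lower()
    # Dicts keep insertion order, so earlier rules take priority.
    for pattern, scene in rules.items():
        if pattern.lower() in title:
            return scene
    return default


rules = {
    "Hades": "Game Scene",
    "Code": "Programming Scene",
}
scene = scene_for_focus("main.py - Code Editor", rules, default="Idle Scene")
```

Each scene then only needs a capture of its own window, which is what keeps unrelated windows off the stream.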
The main benefit of this setup is that it helps avoid accidentally sharing unwanted information on stream, which is a risk with a full display capture or with manually selecting windows while live. However, because window capture currently requires manually selecting each source on boot, creating these kinds of setups with any more than one capture becomes tedious to the point of being unreasonable. Even if you can get the capture to remember all of your windows on boot (which I haven't been able to get working, but I know others who have), not having these programs open while launching OBS will break the capture in question, requiring you to open every single program/game you have a scene set up for every time you boot, which just isn't feasible.
This is something that I think needs to be looked into if Linux is to be considered a viable option for streaming. Currently, from my point of view, it's not, unless you're doing an extremely limited setup.
For those who find this thread and are looking for a potential workaround:
1 - Just use a full display capture and do a scuffed single-scene setup while trying to work around the risks associated with that.
2 - Limit the number of window captures to ideally 1-2 and set them manually to the required window on boot. If you need multiple scene setups, look into placing the captures inside a parent scene that you can bring into other scenes using the "Scene" source. However, this comes at the cost of needing to move to the parent scene, change the target window, and then move back to the new desired scene, making all transitions manual and very messy.
3 - Move over to X11 every time you need to stream and use Window Capture (Xcomposite), which works more like the Windows version and doesn't have any of the issues stated above, although it comes with its own problems. This also may no longer be an option for some, as major distros and desktop environments will soon be removing X11 entirely.
Any problems with session restoration should be resolved in the way session restoration is handled. If apps need to be relaunched, this should be managed in the session restoration; no need to give OBS the ability to launch apps for this reason.
For privacy in relation to specific windows, it is best not to give them in the first place (if you do not need them). I wonder if it is possible for Pipewire and Wayland to create a stream for the entire desktop without specific windows...
To change scene depending on which app has the focus, it may be reasonable to let OBS know which app has the focus if it can already see the apps.
For privacy in relation to specific windows, it is best not to give them in the first place (if you do not need them). I wonder if it is possible for Pipewire and Wayland to create a stream for the entire desktop without specific windows...
This is already a thing in niri: it has a block-out-from "screencast" window rule, and it also has a dynamic screencast target that lets you change the cast screen/window with a hotkey, which can also help avoid sharing unwanted things, especially the window picker (it doesn't really help with scenes, though). See https://github.com/YaLTeR/niri/wiki/Screencasting
Thanks, good to know. As it stands, I would say that the information about which windows not to record (or which windows alone are to be recorded) should be stored by the portal for restoration, alongside the option to record the full desktop. Correct?
3 - Move over to X11 every time you need to stream and use Window Capture (Xcomposite), which works more like the Windows version and doesn't have any of the issues stated above, although it comes with its own problems. This also may no longer be an option for some, as major distros and desktop environments will soon be removing X11 entirely.
This is actually the only reason why I'm using X11 on my PC. My stream setup is composed of a bunch of scenes, and in each one of them I'm doing a certain thing. So, for example, I have a "Programming Scene" where I have all of the overlays as a sub-scene plus the window capture (Xcomposite) of the code editor I'm using. This also helps me with changing between scenes, because I just had to adjust the Automatic Scene Switcher for the window names and voilà!
It works. Maybe not the best setup, but it gets the job done without leaking all of my credentials and having to worry about changing sources or scenes.
Using Wayland and, as a consequence, the portals, not only do I lose access to the Automatic Scene Switcher, but I also get the annoying "Choose a window" popups every time I open OBS, just because apps such as vTuber Kit and the rest of the programs for the different scenes are not open.
For me, the ideal world would be setting this stuff once in OBS with a checkbox labeled "Remember always" and then just not worrying about the popup appearing again, but maybe that could cause issues with people forgetting about it or just clicking it accidentally, so I'm not really going to comment what should be the way to go.
Sorry if it sounds like a rant. It took a long time to discover that the issue was portal-related and not specifically Wayland, and it annoyed me that it took a REALLY long time to find this GitHub issue (Thanks, Google >:( )