rem icon indicating copy to clipboard operation
rem copied to clipboard

window retrieval for application identification is using `isActive` which can be multiple windows, the incorrect field to use

Open roblg opened this issue 2 years ago • 7 comments

I've been poking at the screenshot capture with the goal of trying to be able to filter to only the foreground application, and I noticed that what's being captured even on main seems to not match my expectations of what should be being captured based on the assumed intent of the code.

In particular, https://github.com/jasonjmcghee/rem/blob/d34aa69b1e094f757efcf6c5d93d0014a96240da/rem/remApp.swift#L265

looks like it's attempting to find the active window, where "active" is probably supposed to be "the one on top"(?). But there are actually a lot of windows that have isActive = true when I'm running.

Adding the following debugging code to that method:

            shareableContent.windows.forEach() {
                if $0.isActive {
                    print("Active window: \($0.title)")
                }
            }

results in basically every open application on my machine, plus some Desktop-y and menubar-y looking things:

active window: Optional("Offscreen Wallpaper Window")
active window: Optional("Wallpaper-")
active window: Optional("Item-0")
active window: Optional("Recents")
active window: Optional("rem — remApp.swift")
active window: Optional("Item-0")
active window: Optional("Screenshot 2023-12-28 at 7.16.08 PM")
... snip ...

I spent some time looking at the various ScreenCaptureKit APIs, including SCContentFilter, and SCScreenshotManager, but haven't figured out how one would go about getting the actual highlighted/selected window. Just wanted to flag this in case someone driving by knew the answer. :)

roblg avatar Dec 29 '23 16:12 roblg

Thank you! That's incredibly useful information.

I'm clearly doing this 100% wrong! I will look into this as well.

Right now rem is only using it to get the "application name". It's not doing any filtering / only capturing from a specific thing.

jasonjmcghee avatar Dec 29 '23 16:12 jasonjmcghee

To speak to the discussion around ScreenCaptureKit:

I ended up using:

guard let image = CGDisplayCreateImage(display.displayID, rect: display.frame) else { return }

Instead of doing a capture stream.

So, is this the right move, instead of their more recent APIs?

Well, I felt this was much, much simpler, and actually captures everything including the status bar, desktop, etc. without having to do a bunch of fiddling.

Now what it doesn't provide is a way to hide specific windows, like a private one.

That's a great feature to add at some point - a user setting: "don't record private browser windows" or something, but for now, this felt like a good approach.

jasonjmcghee avatar Dec 29 '23 17:12 jasonjmcghee

Well, I felt this was much, much simpler, and actually captures everything including the status bar, desktop, etc. without having to do a bunch of fiddling.

Fair point.

I haven't gotten to digging into the OCR piece yet. Does it differentiate between different sections of layout on the screen? (Amazon Textract can do this, and give you bounding boxes). If not, I'm sort of torn on whether it wouldn't make more sense to only record the focused window (or at least figure out how to annotate text extracted from the focused window), so that it's more clear what activity the user is performing, rather than pulling text out of background windows, etc.

Also the ability to avoid capturing certain windows seems really important -- things like 1Password, Messages, and even my terminal in some cases I think would be desirable to exclude. But on the flip side, providing that functionality might spike adoption and result in a lot more people doing drive-by feature requests, rather than plinking in the code :)

roblg avatar Dec 29 '23 18:12 roblg

Ha. It's like you're looking into my soul. This grew much faster than I expected and it's still very unstable. Going for feature complete before basic code organization, decoupling, tests etc seems ripe for trouble.

That being said, having an option in settings that isolates the new behavior feels like a good way to counteract that a bit.

It's using Apples latest OCR stuff- ImageAnalysis. It definitely does some structured recognition. Like it puts line numbers in one column, text in another, and settings in another etc.

Definitely requires experimentation.

I would love to encourage you to communicate about your findings and consider helping out! Window blacklisting is a great feature if we can maintain core functionality!

jasonjmcghee avatar Dec 29 '23 18:12 jasonjmcghee

Thought about this more - you could likely do "per-window" OCR as desired. like, get the frame and only do ocr within that frame, for each desired window.

jasonjmcghee avatar Dec 30 '23 04:12 jasonjmcghee

I wonder about per-window OCR -- I'm curious how that would interact w/ overlapping windows, particularly in the simple implementation where there's just a CGImage generated over a rectangle. Also curious about how SCKit handles "excluded" windows. If there's an excluded window on top of a not-excluded window, is the not-excluded window still completely rendered? Definitely seems like experimentation is in order.

roblg avatar Dec 30 '23 21:12 roblg

I think if you properly use SCContentStream, you can record each window individually, even if occluded. But not positive.

And in the case of filtering an occluding window, it'll be like it's not there at all

jasonjmcghee avatar Dec 30 '23 21:12 jasonjmcghee

I just added a PR which probably fixes this. I'm using NSWorkspace to get the frontmost application and use that to determine the application name.

This also opens up further possibilities, e.g. screenshotting only active windows of the frontmost application, or determining the display the current active window of the frontmost application is etc.

seletz avatar Dec 31 '23 11:12 seletz

FWIW I added issue #38 to add support for OCRing only the frontmost application's active window(s). I think PR #37 does fix the application name issue. I also think that using that route we'll able to support multi-display setups.

I have a very wide display at work, and I think capturing the whole screen might cause some performance issues. Maybe we can optionally even limit screenshotting -- although multi-window applications are a thing and I don't know how this would work. Also, when doing a MP4 this would cause issues as windows would generally differ in size. We could compose a artificial image using the frame of the window and the display, of course ...

seletz avatar Dec 31 '23 13:12 seletz

@seletz thanks for that change! definitey is the right implementation today IMO.

I'm thinking that, should we go the route of allowing certain windows / applications to be filtered (SCFilter / SCContentStream stuff), and someone filters out an application, that, even if it's the front-most application, it shouldn't show up.

Which means we may not be able to use NSWorkspace.shared.frontmostApplication?.localizedName once we do that, as it would reveal the user had been using one of the filtered applications.

jasonjmcghee avatar Dec 31 '23 15:12 jasonjmcghee

@jasonjmcghee Uh ok, you're right. Need to look into the screen capture framework and do some rtfm I guess 🤩

seletz avatar Dec 31 '23 17:12 seletz

Not super pressing in this moment as no one has built windows / application filtering yet- just trying to be transparent in my thought process here / set expectations for the future!

jasonjmcghee avatar Dec 31 '23 17:12 jasonjmcghee

I'm going to mark as "fixed" for now thanks to @seletz work. I think filtering is a feature outside of this issue

jasonjmcghee avatar Jan 02 '24 01:01 jasonjmcghee