The Problem

One of the larger time-sinks today is video, whether through streaming services like Netflix, Amazon Prime Video, or HBO; media sites like YouTube, Twitch, or Vimeo; or downloaded or streamed media in players like VLC or MPV.

ActivityWatch, unfortunately, fails to effectively record the time spent on these activities because it relies on mouse and keyboard input to determine activity. This means that when watching a video, ActivityWatch will mark the time as afk after a short while, even though the user is present and spending time on an activity at their device.

This was brought up in https://github.com/ActivityWatch/activitywatch/issues/186, which was marked as wontfix. I believe that this is something that CAN be solved, and it should be seriously considered given the amount of time that can be spent consuming media.

Afk time can be disabled, but this then pollutes the data. Users may go afk for varying lengths of time in a variety of applications or websites throughout the day, which could pollute the data to the point that video-specific time may no longer be useful.

Possible solutions

Note: Not all problems/disadvantages/pitfalls are meant to be solvable. I am including them for devil's advocate's sake and to foster a more robust discussion.

Application/Site Tagging

Compile a list of common media applications and websites. When the user goes afk on one of these sites or applications, mark the time as non-afk. This isn't technically tagging, but it could be set up to work in a tag-like way, which would make this a very extendable solution for more than just videos.
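
A minimal sketch of this idea, assuming events are plain dicts with hypothetical `app`, `url`, and `afk` fields (this is not ActivityWatch's actual event schema) and that the list entries below are only illustrative:

```python
from urllib.parse import urlparse

# Illustrative lists only; a real list would be curated by the community.
MEDIA_APPS = {"vlc", "mpv"}
MEDIA_DOMAINS = {"youtube.com", "netflix.com", "twitch.tv", "vimeo.com"}


def is_media_event(event):
    """Return True if the event's app or site is on the media lists."""
    app = event.get("app", "").lower()
    if app in MEDIA_APPS:
        return True
    host = urlparse(event.get("url", "")).hostname or ""
    # Match the domain itself and any subdomain (e.g. www.youtube.com).
    return any(host == d or host.endswith("." + d) for d in MEDIA_DOMAINS)


def effective_afk(event):
    """Treat afk time on a listed media app/site as non-afk."""
    return event["afk"] and not is_media_event(event)
```

Matching on the hostname rather than the full URL keeps the list short, at the cost of the pitfalls listed under Disadvantages (homepage vs. actual video, paused video).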

Advantages

Easy to maintain
- Add sites/applications as you see fit, or as the community points them out
Simple

Disadvantages/Pitfalls

Initial list will be a chore to create
- Can always ask the community for assistance. You create the functionality, and get the community to fill it in with you curating.
Is not quite perfect.
What if the user is legitimately afk on a youtube page?
What if the user is afk on the homepage of a media site, and not on an actual video?
What if the user is afk on a video, and it's paused?

Enhancements

These are here to try and solve for some of the problems presented under disadvantages.

User-defined or user changeable lists and filters
- The repo still maintains a list, but users have the ability to add items to theirs without waiting for a pull request to merge and a new release.
- There are a variety of flavorful solutions to this that I will not go into
Combine with monitoring hardware to detect if a video might be paused

Monitoring Hardware

Monitor audio output to see if a video is playing.

Advantages

Simple. May not be easy to implement, but is a simple solution to the problem

Disadvantages/Pitfalls

What if the user is listening to music, like spotify, and is afk on a media player?
What if the user is playing music from a media site like YouTube?
Difficult to implement?
Monitoring multiple audio devices necessary?

Enhancements

Combine with site/application tagging

General Enhancements

These are enhancements that could apply to any solution, to increase accuracy and to enable the user to correct mismatches and errors.

User-Defined Lists & Filters

Let the user create/modify the list of sites/applications, and/or the patterns used to match them.

Takes burden off of developer to maintain accurate lists and patterns to meet everyone's specific needs
- I would still expect that an "official" or pre-defined list of items and patterns would be bundled or downloadable
Some passionate users can open issues to merge items and patterns back into the repo with the dev curating them, or just to share them.

Tagging and Pattern Matching

This goes above and beyond, but would really turn this into a much more powerful tool.

Instead of just solving the video problem, create an extendable solution that encompasses the general problem category that the video problem is part of. This would take the form of tagging: being able to automatically tag domains & applications with predefined or user-defined tags. This can be facilitated with pattern matching lists and, depending on the data's schema/format, could be applied to already existing data, greatly enhancing its utility.

As an example, time in VLC, YouTube, or Netflix could be tagged as video, which gives users the power to filter this time separately, combine it in reports, or to more easily correct collection errors.
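
As a rough sketch of what pattern-based tagging could look like, assuming each stored event has a hypothetical `name` field holding an application name or domain, and with made-up rules:

```python
import re

# Hypothetical rules: (tag, regex matched against an app name or domain).
TAG_RULES = [
    ("video", re.compile(r"^(vlc|mpv)$", re.IGNORECASE)),
    ("video", re.compile(r"(^|\.)(youtube\.com|netflix\.com|vimeo\.com)$", re.IGNORECASE)),
    ("music", re.compile(r"^spotify$", re.IGNORECASE)),
]


def tags_for(name, rules=TAG_RULES):
    """Return the sorted list of tags whose pattern matches the name."""
    return sorted({tag for tag, pattern in rules if pattern.search(name)})


def tag_events(events, rules=TAG_RULES):
    """Return copies of the events with a 'tags' list added."""
    return [dict(e, tags=tags_for(e["name"], rules)) for e in events]
```

Because `tag_events` only reads the name, the same rules could be run over already collected data as well as new events.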

This of course could be set up to be user-manageable, using the points listed in the previous section.

Conclusion

I believe the ability to capture time spent consuming video-based media will have an impact on the future usability of this project as these sorts of services continue to expand and bring in more and more people. Solving it would not only address video tracking, but could also greatly enhance the utility and power of this application.

What are your thoughts? (Please do not auto-mark this as closed; this took some time and effort to create.)

Jan 07 '19 07:01 douglasg14b

Thanks for taking the time to write all that, greatly appreciated!

I agree, and there is actually unmerged code that solves the issue for web-based media players through the use of the audible property on tabs: https://github.com/ActivityWatch/aw-webui/pull/85

I'm personally pretty happy with the above solution, as it creates minimal complexity and works for what I suspect are the most common ways people consume video on their computers today (YouTube, Netflix, other web-based players).

I'd love to give more thorough feedback on the options you mentioned, especially tagging, but I'm really busy with exams this week so it'll have to wait. In the meanwhile, check out the discussion in #95

Jan 07 '19 07:01 ErikBjare

This is an idea aimed towards video-playing apps, which is a big part of consuming media.

Make a separate watcher for the media. The watcher could be either just for the PC media software or for both (it would get the audible property from the web watchers).

Have a whitelist of apps and check if they're on the screen. To count time on the media watcher, you just count the time the app is on display. We have the same downside that @douglasg14b mentioned, i.e. what if the user pauses the app and leaves whilst the app is on the screen?

The solution:

Check if the computer is asleep or not. On Linux, Mac, and Windows, the computer will usually not go to sleep if there is media playing (this would work for video; not sure about purely audio). One thing to further investigate is whether the media software needs to be in full screen and playing for the computer not to go to sleep. Therefore, if there is a media app on the screen, and the computer is not asleep, we would log that as playing time for the app.

Following false positives would occur:

App is paused but still on the screen whilst user is not afk would be logged as active (this will rarely happen, as users tend not to fill screen real estate with a video app if it is not being used).
Computer never goes to sleep (this is also a fairly rare case), so media time would be logged even if video is paused.
Accuracy also depends on the time it takes for the OS to go to sleep following user inactivity, including media.

Jan 09 '19 20:01 nicolae-stroncea

@nicolae-stroncea

Detecting active audio alongside a list of sites/apps would bring the accuracy up to a very acceptable level in my opinion. Either of them by themselves would be too riddled with false positives to be too terribly useful. There isn't much need to go fancier than that imho. This is something I went into in the initial post.

Active audio + afk + on youtube = watching video. It's not perfect, but much better than on youtube = watching video. As an example, I have 4 monitors, and I almost always have something playing when working, and will often click on that to pause it then leave for a while. Leaving myself afk with an active video player that isn't playing. Or even watching Netflix, pause it and leave for a while, it's the active window, but nothing is playing.
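
Spelled out as code, the rule above is just a conjunction. This is a sketch with hypothetical function and parameter names, assuming the three signals are already available as booleans:

```python
def is_watching(afk, audio_active, on_media_site):
    """Count afk time as viewing only when audio is playing on a listed site."""
    return afk and audio_active and on_media_site


def is_present(afk, audio_active, on_media_site):
    """The user counts as present if not afk, or if they appear to be watching."""
    return (not afk) or is_watching(afk, audio_active, on_media_site)
```

With this rule, a paused Netflix window that is still focused gives `is_present(True, False, True) == False`, which is the case the list-only approach gets wrong.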

Letting users create their own pattern matching for sites will also bring up the accuracy as it lets them add sites/applications to the list of video apps/sites.

Jan 09 '19 22:01 douglasg14b

@douglasg14b

I agree that detecting audio would be the most accurate way of doing it. As you've stated, it is a complex solution. The solution I offered was meant as a less complex alternative (I think it's fair to say it would be easier to implement with the existing code, and might have a lot fewer edge cases than if we go into monitoring hardware), but at the cost of some accuracy. It ultimately depends on the amount of effort that will be put into the feature. If monitoring hardware to detect audio successfully will take too many man-hours to implement at the time being, I think the solution I suggested is a feasible proposal.

I disagree that it would be so riddled with false positives to not be useful. youtube is an active window && computer not on standby = watching video would be the more accurate representation. For your example, where you would watch Netflix, pause and leave for a while. Presumably, the computer would go on stand-by, at which point, the Netflix app would not be counted anymore for active media time.

Jan 09 '19 22:01 nicolae-stroncea

I believe that relying on standby is an incorrect assumption for users of this library. How many people's devices that are not laptops go into standby within a few minutes of being idle? Even plugged-in laptops default to 30-60 mins on Windows 10 in balanced mode. What about high performance mode? On desktops? You're looking at 30m - 4h IF standby is even enabled, and their devices don't just turn off the screens and stay on. Never mind most Linux users, who probably don't use standby at all from what I've seen, as it's usually not on by default for most desktop installs of common flavors.

That's a lot of invalid data. If you're watching a movie, and you step away to do something (bathroom, cleaning, walking the dog, cooking, making coffee, etc.), are most people actually gone long enough for their device to go into standby (60 mins)?

That would also rely VERY heavily on the end users setup, which can vary wildly, especially when assuming that their power configuration is not set as the defaults. Assumptions on user device configuration shouldn't be made unless there is data to back it up. Which is why I believe it will be more inaccurate, and potentially worse than just not recording it at all as it currently does.

Thankfully this is a FOSS project, so man-hours isn't as much of a concern as if this was an in-company product with expenses and wages to worry about. It's still relevant, but at least in my projects, I don't consider time to implement as a deciding factor for features or compatibility unless a solid and usable drop-in is available.

Jan 09 '19 23:01 douglasg14b

Has anyone investigated integrating with media players through the same mechanisms as last.fm/audio scrobblers?

Jun 24 '19 20:06 dynamiclover

Another idea would be to take a screenshot and if the screen has changed, consider it active (not afk).

Aug 21 '19 06:08 jtrakk

Has anyone investigated integrating with media players through the same mechanisms as last.fm/audio scrobblers?

@dynamiclover We have aw-watcher-spotify as an experiment https://github.com/activitywatch/aw-watcher-spotify

Another idea would be to take a screenshot and if the screen has changed, consider it active (not afk).

@jtrakk Two issues with this

Most users have a clock in their taskbar, which will change at least every minute. You could then argue that you need a minimum number of pixels to change, but that is still inaccurate, as a website with an animated ad could make it think that you are not afk.
Comparing two pictures pixel by pixel is surprisingly slow given how high-resolution screens are nowadays. We don't want ActivityWatch to slow down people's computers significantly (not by default at least; opt-in could be an option).

Aug 21 '19 06:08 johan-bjareholt

@johan-bjareholt

I think that actually might be a viable solution, the issues you mentioned are solvable. I wouldn't take it off the table just yet. It's also very simple, and doesn't have a lot of complexities compared to monitoring audio.

You can probably even use something like OpenCV for this, which has a lot of utilities that make this even simpler like absdiff.

Down-scale the image, which actually does two things

Reduces the pixel count, say to 50k pixels
Removes and reduces small and inconsequential changes (Like a system clock).

Perform cheap math to check the delta from one image to another

Sum the squared differences of the pixel values
- You could also diff the lightness values, which might be even cheaper
Set a threshold to avoid false positives. If someone is watching a video, a LOT should be changing. So having a high threshold should be fine.

Don't monitor in real time.

Periodic image processing is damn cheap compared to anything in real time, and causes much less of a performance concern. It's trivial to do the above at 30fps even on old hardware. If you are checking for changes, say every 15 seconds, that's 0.2% of the processing power needed.
If the concern is performance spikes, then draw out the processing time. Instead of running the loop all in one go, let it sleep for a few microseconds every few iterations. IDK how to do this in Python, but in C# it's fairly easy to avoid hogging system resources through asynchronous processing with artificial delays. I imagine Python can do the same.
- It's also fairly trivial, in my experience, to customize the delays based on prior performance of the device.

Also refer to https://stackoverflow.com/questions/4196453/simple-and-fast-method-to-compare-images-for-similarity
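
A dependency-free sketch of the downscale, squared-difference, and threshold steps, written in plain Python rather than OpenCV. It assumes frames arrive as 2D lists of grayscale values (0-255); the block size and threshold are arbitrary placeholders that would need tuning:

```python
def downscale(frame, block):
    """Average non-overlapping block x block tiles of a 2D grayscale frame."""
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(0, rows - block + 1, block):
        row = []
        for c in range(0, cols - block + 1, block):
            tile = [frame[r + i][c + j] for i in range(block) for j in range(block)]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out


def mean_squared_diff(a, b):
    """Mean of squared per-pixel differences between two same-sized frames."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def screen_changed(prev, curr, block=4, threshold=100.0):
    """True if the downscaled frames differ by more than the threshold."""
    return mean_squared_diff(downscale(prev, block), downscale(curr, block)) > threshold
```

Averaging over tiles is what dampens small changes such as a clock digit: a single pixel flipping from 0 to 255 in an 8x8 frame stays under the threshold here, while a full-frame change does not.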

Aug 21 '19 16:08 douglasg14b

@douglasg14b I'd gladly help get it working with aw-server and the web UI if you want to make such a watcher for ActivityWatch. We love to help anyone who wants to collect more data in ActivityWatch and make it possible for them to analyze it. You can even write the watcher in C# if that's the language you prefer; we already have one watcher written in it that you can get some inspiration from (https://github.com/LaggAt/ActivityWatchVS).

However I don't think this is something we want to ship with activitywatch by default and definitely not have turned on by default because:

It is not 100% accurate as we have discussed before, you can easily leave your computer with a movie running
OpenCV is a pretty big dependency, 26MB for the Python bindings.
Taking a screenshot is not universal on Mac, Windows, and Linux and needs to be implemented differently on each platform. Not impossible to do, it just needs quite a bit of development and testing. It would probably also need more dependencies.
I don't think performance will matter a great deal if written efficiently, but I don't believe that it will ever be negligible. If we have to go as far as adding artificial delays to avoid taking up a lot of CPU time, I personally wouldn't use such a watcher on my laptops at least, probably not on my desktop either.

I personally don't want to spend time on this because I find there to be more important things to fix currently.

Aug 21 '19 20:08 johan-bjareholt

This seems doable with Pillow and pyscreenshot. Something like this might work, perhaps as a third-party watcher package.

import time

import pyscreenshot
import requests

BUCKET_URL = "http://localhost:5600/api/0/buckets/screenshot-rgb"
INTERVAL = 10

requests.post(BUCKET_URL)

while True:
    # Take a screenshot.
    im = pyscreenshot.grab(childprocess=False)
    # Get average value for each RGB channel.
    rgb = im.resize((1, 1)).getpixel((0, 0))
    # Post the rgb values.
    requests.post(BUCKET_URL + "/heartbeat", json={"rgb": list(rgb)})
    # Wait a few seconds before repeating.
    time.sleep(INTERVAL)

Aug 22 '19 07:08 jtrakk

@jtrakk Nice start, a few suggestions:

We probably should do the image analysis in the client rather than the server, then simply push the data {"afk": true/false}
Instead of doing a proper resize, only picking every fourth pixel or so and doing an average of those should make this faster and still be accurate enough
If you want you can use the aw-client python library instead of requests, easier to get things going and has some heartbeat optimizations
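
A sketch of the first two suggestions, assuming the screenshot has already been flattened into a list of (r, g, b) tuples (with Pillow, something like `list(im.getdata())` should give that). The function names, the step, and the tolerance are placeholders, not part of aw-client:

```python
def sampled_average(pixels, step=4):
    """Average RGB over every `step`-th pixel of a flat list of (r, g, b) tuples."""
    sample = pixels[::step]
    n = len(sample)
    return tuple(sum(p[ch] for p in sample) / n for ch in range(3))


def afk_payload(prev_avg, curr_avg, tolerance=2.0):
    """Build the {"afk": bool} data client-side: afk if the average barely moved."""
    moved = max(abs(a - b) for a, b in zip(prev_avg, curr_avg))
    return {"afk": moved <= tolerance}
```

The resulting dict is what would be pushed as the heartbeat data, so the server never sees raw colour values.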

Aug 22 '19 09:08 johan-bjareholt

I would like to mention the power management tool powerdevil from KDE can detect whether there is a video playing. But I don't know how they achieve that.

Jan 21 '20 13:01 yujqiao

What about doing it inside of the extensions?

Proposal:

Read the DOM of a webpage and query for video elements: let listVideo = document.querySelectorAll('video')
In the current tab, check if any video is playing by doing: let videoPlaying = Array.from(listVideo).some(v => !v.paused)
Set the 'activeVideo' property in datastr to true.
When checking afk status, if property exists and is true, consider user non-afk

Advantages:

Instantly crossplatform, don't need to worry about Mac/Linux/Windows
Should be a lot lighter on resources and storage than doing analysis on the screen. Essentially this would be querying a small list of elements (even the YouTube front page has only 1 video element in the DOM, though not sure why) and checking a property of each.
Should be easier to implement

Disadvantages:

Would only detect web-based content.
Automatically considers video playing as user being active. We're already considering doing this with all of the other approaches, so this one isn't really a disadvantage

I believe that since the majority of media is consumed online, the advantages outweigh the disadvantages.

EDIT: further problem

EDIT 2:

A potential problem here is that this would not detect content in iframes. A comparatively small (but existent) amount of media is delivered through iframes. One example is Reuters (open an article, and it should pop up an iframe with an embedded video). Another example is embedded YouTube videos, which are also done through iframes. Maybe there are workarounds for this. The only one I found so far is checking for the 'autoplay' property, which, if set, indicates video content. However, this is not foolproof.

On further analysis, it seems like the 'audible' property is indeed the better choice: given that it is the active tab and audio is playing, it should indicate the user is watching some content.

Jun 10 '20 01:06 nicolae-stroncea

I'm currently going through my AW database reviewing all events tagged as audible: true, and overall, all video content is tagged correctly:

Normal videos, streaming content, audio/video online calls, etc

There are a couple of false positives:

Music Websites
Background noise Websites
Podcast Websites
Radio Websites
Outlook(weird outlier. I think it might be notifications that triggered it?)

Since they are purely audio, it is very likely (more often than not, I would say) that a user puts on some music/podcast/radio etc., and then works away from their computer: typing notes, cleaning, etc. So I think we could have a whitelist of these websites where we consider content as 'afk' even if they have their audible property set to true.
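
A small sketch of that whitelist check, assuming web events are dicts with `url` and `audible` fields as discussed in this thread. The domains listed are only hypothetical examples:

```python
from urllib.parse import urlparse

# Hypothetical examples of audio-only sites; users would maintain their own list.
AUDIO_ONLY_DOMAINS = {"open.spotify.com", "soundcloud.com"}


def counts_as_active(event):
    """An audible tab counts as activity unless its site is audio-only."""
    if not event.get("audible", False):
        return False
    host = urlparse(event.get("url", "")).hostname or ""
    return host not in AUDIO_ONLY_DOMAINS
```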

Jun 10 '20 03:06 nicolae-stroncea

Found a way to do this directly with Sound Drivers using a Python library called SoundCard. This works with any type of applications, not just web browsers.

Windows and Linux

Tested successfully on both Linux (relies on PulseAudio, so should work on all distributions by default) and Windows (relies on WASAPI, works on Windows 7+).

#!/usr/bin/env python
import soundcard as sc
import numpy as np

def getMic():
    """Return the first loopback device (a 'microphone' that records speaker output), or None."""
    for a_mic in sc.all_microphones(include_loopback=True):
        if a_mic.isloopback:
            return a_mic
    return None


def checkAudio(mic):
    """Record one second from the loopback device and report whether any sample is non-zero."""
    if mic is None:
        return False
    # 48000 frames at a 48000 Hz sample rate = 1 second
    data = mic.record(samplerate=48000, numframes=48000)
    return bool(np.any(data != 0))


mic = getMic()
print(checkAudio(mic))

Mac

This will not work by default on MacOS because it does not provide loopback functionality.

Mac Users would have to download SoundFlower (also OSS), and set it up so it acts as a 'virtual speaker'.
We need to find name of the speaker. It should always stay the same, so somebody just needs to download it on a Mac and check.
Need a specific check in getMic for MacOS, and then get the mic that has SoundFlower's name.
Rest should be the same

I don't have a Mac to test this, so somebody should confirm to see if this works.

Jun 13 '20 17:06 nicolae-stroncea

@nicolae-stroncea I'll try this on my Macbook and report back shortly.

Jun 13 '20 17:06 jmealo

@jmealo Not sure if you already found it, but this tutorial seemed useful to me. It helps avoid some potential pitfalls of the setup, specifically that if you don't set multi-output, your Mac won't play any sound at all since all of it will be routed only to SoundFlower. It also explains how to select SoundFlower as an input device, which is what we need

Jun 13 '20 17:06 nicolae-stroncea

I'll still test Soundflower, but, I found this: https://stackoverflow.com/questions/27604207/applescript-check-if-computer-is-playing-any-sound#27608712

When I play a YouTube video in Chrome:

pmset -g | grep coreaudiod
sleep                1 (sleep prevented by sharingd, Google Chrome, coreaudiod, useractivityd)

When I paused the video coreaudiod stopped preventing sleep and no longer appeared in the output.

I fired up Zoom, with no meeting there was no output, upon starting a new meeting:

 hibernatefile        /var/vm/sleepimage
 disksleep            0
 sleep                1 (sleep prevented by sharingd, coreaudiod, coreaudiod)
 displaysleep         2 (display sleep prevented by zoom.us)

As far as false positives go: assuming a browser extension, you can differentiate between listening to music/watching a video.

If you poll this at regular intervals, you don't have to worry about notifications much. It seems like video conferencing will prevent the display from sleeping. I can test with something that uses WebRTC and verify.

Jun 13 '20 18:06 jmealo

I'm providing the output of some pmset commands that should provide information helpful for time/activity tracking:

While playing a Youtube video in Chrome:

2020-06-13 14:16:26 -0400 
Assertion status system-wide:
   BackgroundTask                 0
   ApplePushServiceTask           0
   UserIsActive                   1
   PreventUserIdleDisplaySleep    0
   PreventSystemSleep             0
   ExternalMedia                  0
   PreventUserIdleSystemSleep     1
   NetworkClientActive            0
Listed by owning process:
   pid 434(sharingd): [0x0000377400018c33] 00:00:40 PreventUserIdleSystemSleep named: "Handoff"  
   pid 626(Google Chrome): [0x000036c100018c27] 00:03:39 NoIdleSleepAssertion named: "Playing audio"  
   pid 273(mds_stores): [0x0000379c000b8c46] 00:00:00 BackgroundTask named: "com.apple.metadata.mds_stores.power"  
   pid 198(coreaudiod): [0x0000366f000180cb] 00:05:01 PreventUserIdleSystemSleep named: "com.apple.audio.AppleHDAEngineOutput:1B,0,1,1:0.context.preventuseridlesleep"  
	Created for PID: 742. 
   pid 431(useractivityd): [0x0000379a00018c45] 00:00:01 PreventUserIdleSystemSleep named: "BTLEAdvertisement"  
	Timeout will fire in 58 secs Action=TimeoutActionTurnOff
   pid 151(hidd): [0x0000365400098c0a] 00:00:00 UserIsActive named: "com.apple.iohideventsystem.queue.tickle serviceID:100000363 name:AppleEmbeddedKeyboa product:Apple Internal Keyb eventType:3"  
	Timeout will fire in 120 secs Action=TimeoutActionRelease
No kernel assertions.
Idle sleep preventers: IODisplayWrangler

While in a Zoom meeting (it looks like the developers forgot to provide the correct value for the activity):

% pmset -g assertions
2020-06-13 14:18:49 -0400 
Assertion status system-wide:
   BackgroundTask                 0
   ApplePushServiceTask           0
   UserIsActive                   1
   PreventUserIdleDisplaySleep    1
   PreventSystemSleep             0
   ExternalMedia                  0
   InternalPreventDisplaySleep    1
   PreventUserIdleSystemSleep     1
   NetworkClientActive            0
Listed by owning process:
   pid 26724(zoom.us): [0x0000381e00058c76] 00:00:12 NoDisplaySleepAssertion named: "Describe Activity Type"  
   pid 434(sharingd): [0x0000377400018c33] 00:03:02 PreventUserIdleSystemSleep named: "Handoff"  
   pid 106(powerd): [0x0000381600108002] 00:00:20 InternalPreventDisplaySleep named: "com.apple.powermanagement.delayDisplayOff"  
	Timeout will fire in 100 secs Action=TimeoutActionTurnOff
   pid 431(useractivityd): [0x0000382700018c78] 00:00:03 PreventUserIdleSystemSleep named: "BTLEAdvertisement"  
	Timeout will fire in 56 secs Action=TimeoutActionTurnOff
   pid 384(nsurlsessiond): [0x0000382800018c7a] 00:00:02 PreventUserIdleSystemSleep named: "NSURLSessionTask ADC0E368-B668-4A09-B48C-B1B11C78F152"  
	Timeout will fire in 10798 secs Action=TimeoutActionTurnOff
   pid 384(nsurlsessiond): [0x0000382800018c7b] 00:00:02 PreventUserIdleSystemSleep named: "NSURLSessionTask B2ED8888-9B0E-4A54-9F6F-207CFA4B82A2"  
	Timeout will fire in 10798 secs Action=TimeoutActionTurnOff
   pid 198(coreaudiod): [0x0000381f00018c5c] 00:00:11 PreventUserIdleSystemSleep named: "com.apple.audio.AppleHDAEngineOutput:1B,0,1,1:0.context.preventuseridlesleep"  
	Created for PID: 26724. 
   pid 198(coreaudiod): [0x0000381e00018c58] 00:00:12 PreventUserIdleSystemSleep named: "com.apple.audio.AppleHDAEngineInput:1B,0,1,0:1.context.preventuseridlesleep"  
	Created for PID: 26724. 
   pid 151(hidd): [0x0000365400098c0a] 00:00:00 UserIsActive named: "com.apple.iohideventsystem.queue.tickle serviceID:100000363 name:AppleEmbeddedKeyboa product:Apple Internal Keyb eventType:3"  
	Timeout will fire in 120 secs Action=TimeoutActionRelease
No kernel assertions.
Idle sleep preventers: IODisplayWrangler
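
For reference, a sketch of how the per-process lines in this output could be parsed. The regex is inferred only from the two dumps above, so the exact format on other macOS versions is an assumption, and the function names are made up:

```python
import re

# Pattern inferred from the sample output above; not an official format spec.
ASSERTION_RE = re.compile(
    r'^\s*pid (\d+)\((.+?)\): \[0x[0-9a-fA-F]+\] (\d\d:\d\d:\d\d) (\w+) named: "(.*)"\s*$'
)


def parse_assertions(text):
    """Extract per-process power assertions from `pmset -g assertions` output."""
    found = []
    for line in text.splitlines():
        m = ASSERTION_RE.match(line)
        if m:
            pid, process, age, kind, name = m.groups()
            found.append({"pid": int(pid), "process": process, "age": age,
                          "type": kind, "name": name})
    return found


def audio_processes(assertions):
    """Processes whose assertion name mentions audio (a heuristic, not an API)."""
    return sorted({a["process"] for a in assertions if "audio" in a["name"].lower()})
```

On the YouTube dump this heuristic would pick up both Google Chrome ("Playing audio") and coreaudiod; the "Created for PID" continuation lines are skipped because they do not match the pattern.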

Jun 13 '20 18:06 jmealo

Also found this command, which seems to draw inspiration from same source : if [[ "$(pmset -g | grep ' sleep')" == *"coreaudiod"* ]]; then echo audio is playing; else echo no audio playing; fi

It doesn't have the same level of detail, but can give a quick, cheap check if audio is playing
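
A rough Python equivalent of that one-liner, for use from a watcher. The function names are made up, and the check is slightly stricter than the grep: it only looks at the line for the `sleep` setting itself, not at `displaysleep` or `disksleep`:

```python
import subprocess


def audio_prevents_sleep(pmset_output):
    """True if coreaudiod is listed on the 'sleep' line of `pmset -g` output."""
    for line in pmset_output.splitlines():
        # Only the 'sleep' setting line matters, not 'displaysleep' or 'disksleep'.
        if line.strip().startswith("sleep ") and "coreaudiod" in line:
            return True
    return False


def is_audio_playing():
    """Run `pmset -g` (macOS only) and apply the check above."""
    result = subprocess.run(["pmset", "-g"], capture_output=True, text=True, check=True)
    return audio_prevents_sleep(result.stdout)
```

Keeping the parsing separate from the subprocess call means the check can be tested against captured output on any platform.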

Jun 13 '20 18:06 nicolae-stroncea

@nicolae-stroncea: you can do all sorts of activity tracking beyond what you set out to do on OSX with pmset -g assertions, you can see whether the user clicks, scrolls, touches, multi-touches, types, etc... (it logs whatever resets the idle user timeout, as well as a count down, you can infer a great deal from this). Additionally, we get verbose logging of what's keeping the system from sleeping, which includes playing audio/video or using the webcam.

I wasn't able to get your Python to run, is it Python 2? I think it's a dead-end (but good idea! especially without having access to the hardware) given what I'm able to do by tailing the power telemetry from OSX.

Using pmset is low-overhead, can run as an unprivileged user, and doesn't require a third-party kernel extension, so it seems like the right way to approach what you set out to do (and then some!). It honestly seems like a bit of an oversight from a privacy perspective shrug.

Jun 13 '20 18:06 jmealo

@jmealo that's pretty neat! I imagine there's a lot of nice aw-watcher possibilities lying in there.

The script is Python3, but it would need some customizing for Mac to get it working with soundcard:

Once you install SoundFlower, you would need to query all of the microphones, and find the name that MacOS uses for SoundFlower: sc.all_microphones(). Iterate through them, then get the name of each microphone by doing the_mic.name, to find what name SoundFlower goes by.
Once you get the name by looking through the mics, you can get the microphone by the name: mic = sc.get_microphone('name_of_soundflower_input')

I agree that since pmset ... is lower overhead, it would be preferred.

I looked for a similar command that could be useful on Linux, and found: pacmd list-sink-inputs (again dependent on PulseAudio, and I don't think there is a lot of fragmentation on this front). You can find out if any sound is playing by doing: pacmd list-sink-inputs | grep -w state | grep RUNNING. pacmd list-sink-inputs also returns info on the running application, which is useful:

    index: 173
	driver: <protocol-native.c>
	flags: START_CORKED 
	state: RUNNING
	sink: 1 <alsa_output.pci-0000_00_1f.3.analog-stereo>
	volume: front-left: 52016 /  79% / -6.02 dB,   front-right: 52016 /  79% / -6.02 dB
	        balance 0.00
	muted: no
	current latency: 89.25 ms
	requested latency: 75.01 ms
	sample spec: float32le 2ch 44100Hz
	channel map: front-left,front-right
	             Stereo
	resample method: copy
	module: 10
	client: 17 <Firefox>
	properties:
		media.name = "AudioStream"
		application.name = "Firefox"
		native-protocol.peer = "UNIX socket client"
		native-protocol.version = "33"
		application.process.id = "5675"
		application.process.user = "nicolae"
		application.process.host = "nicolae"
		application.process.binary = "firefox"
		application.language = "en_US.UTF-8"
		window.x11.display = ":0"
		application.icon_name = "firefox"
		module-stream-restore.id = "sink-input-by-application-name:Firefox"

There are a couple of weird quirks that I didn't figure out about this. If I mute an application (but allow it to run), it will still show up with state: RUNNING and muted: no so not sure why this happens.
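
A sketch of pulling the application names out of that output in Python, assuming the layout shown above (an `index:` line starting each sink input, then `state:` and `application.name` fields). The function name is made up:

```python
import re


def running_applications(pacmd_output):
    """Names of applications whose sink input is in state RUNNING."""
    apps = []
    state = None
    for raw in pacmd_output.splitlines():
        line = raw.strip()
        if line.startswith("index:"):
            state = None  # a new sink input begins
        elif line.startswith("state:"):
            state = line.split(":", 1)[1].strip()
        else:
            m = re.match(r'application\.name = "(.*)"$', line)
            if m and state == "RUNNING":
                apps.append(m.group(1))
    return apps
```

The text could come from something like `subprocess.run(["pacmd", "list-sink-inputs"], capture_output=True, text=True).stdout`. The muted quirk mentioned above is not handled here.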

Wasn't able to find any similar command for Windows that we would be able to trigger directly from Python, but I'm not too familiar with developing on the platform. Worst case, soundcard could still be used for the cases where a reliable low-overhead platform-dependent command is not found.

Jun 13 '20 19:06 nicolae-stroncea

@jmealo that's pretty neat! I imagine there's a lot of nice aw-watcher possibilities lying in there.

The script is Python3, but it would need some customizing for Mac to get it working with soundcard:

Once you install SoundFlower, you would need to query all of the microphones, and find the name that MacOS uses for SoundFlower: sc.all_microphones(). Iterate through them, then get the name of each microphone by doing the_mic.name, to find what name SoundFlower goes by.

Once you get the name by looking through the mics, you can get the microphone by the name: mic = sc.get_microphone('name_of_soundflower_input')

For what it's worth: Soundflower (2ch) or Soundflower (64ch) seem to be the device names.

I agree that since pmset ... is lower overhead, it would be preferred.

I looked for a similar command that could be useful on Linux, and found: pacmd list-sink-inputs (again dependant on the pulseaudio, and I don't think there is a lot of fragmentation on this front). You can find if any sound is running by doing: pacmd list-sink-inputs | grep -w state | grep RUNNING. A pacmd list-sink-inputs returns info on the application running which is useful:
    index: 173
	driver: <protocol-native.c>
	flags: START_CORKED 
	state: RUNNING
	sink: 1 <alsa_output.pci-0000_00_1f.3.analog-stereo>
	volume: front-left: 52016 /  79% / -6.02 dB,   front-right: 52016 /  79% / -6.02 dB
	        balance 0.00
	muted: no
	current latency: 89.25 ms
	requested latency: 75.01 ms
	sample spec: float32le 2ch 44100Hz
	channel map: front-left,front-right
	             Stereo
	resample method: copy
	module: 10
	client: 17 <Firefox>
	properties:
		media.name = "AudioStream"
		application.name = "Firefox"
		native-protocol.peer = "UNIX socket client"
		native-protocol.version = "33"
		application.process.id = "5675"
		application.process.user = "nicolae"
		application.process.host = "nicolae"
		application.process.binary = "firefox"
		application.language = "en_US.UTF-8"
		window.x11.display = ":0"
		application.icon_name = "firefox"
		module-stream-restore.id = "sink-input-by-application-name:Firefox"
There are a couple of quirks here that I haven't figured out. If I mute an application (but leave it playing), it still shows up with state: RUNNING and muted: no, and I'm not sure why.
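
Here is a rough sketch of how that output could be read from Python instead of piping through grep. It assumes the layout shown above (each stream starts with an index: line, followed by state: and an application.name property); the function names are hypothetical, and the muted quirk mentioned above is not handled:

```python
import re
import subprocess


def parse_sink_inputs(text):
    """Split `pacmd list-sink-inputs` output into one dict per stream."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("index:"):
            current = {"index": line.split(":", 1)[1].strip()}
            entries.append(current)
        elif current is None:
            continue  # header line such as "1 sink input(s) available."
        elif line.startswith("state:"):
            current["state"] = line.split(":", 1)[1].strip()
        else:
            match = re.match(r'application\.name = "(.*)"$', line)
            if match:
                current["app"] = match.group(1)
    return entries


def running_apps(text):
    """Names of applications whose stream is in state RUNNING."""
    return [e.get("app", "unknown") for e in parse_sink_inputs(text)
            if e.get("state") == "RUNNING"]


def query_pacmd():
    """Run pacmd and return its output (needs PulseAudio; not called here)."""
    result = subprocess.run(["pacmd", "list-sink-inputs"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

A watcher could then call running_apps(query_pacmd()) on each poll and treat a non-empty list as "something is playing".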

What a great find! I was looking to see if systemd had something, but, if pulseaudio can be queried directly that'd be good. I can't imagine that there's not a similar solution on any *nix based OS.

Wasn't able to find any similar command for Windows that we would be able to trigger directly from Python, but I'm not too familiar with developing on the platform. Worst case, soundcard could still be used for the cases where a reliable low-overhead platform-dependent command is not found.

It looks like powercfg is what we're looking for on Windows. I'm hoping that read-only operations don't require elevated permissions; write ones certainly do.

Jun 13 '20 19:06 jmealo

It looks like the output of powercfg -requests looks something like this (found on Microsoft answers for troubleshooting sleep issues):

SYSTEM:
[DRIVER] Cirrus Logic High Definition Audio (HDAUDIO\FUNC_01&VEN_ ...)
An audio stream is currently in use.
[PROCESS] \Device\HarddiskVolume2\Program Files (x86)\Windows Media Player\wmplayer.exe

Here's the documentation that I found for the command so far: https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/powercfg-command-line-options

Jun 13 '20 19:06 jmealo

I just tested on Windows:

No video/audio playing:

Microsoft Windows [Version 10.0.18363.900]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Windows\system32>powercfg -requests
DISPLAY:
None.

SYSTEM:
None.

AWAYMODE:
None.

EXECUTION:
None.

PERFBOOST:
[DRIVER] Legacy Kernel Caller
Power Manager

ACTIVELOCKSCREEN:
None.

Video playing:

C:\Windows\system32>powercfg -requests
DISPLAY:
[PROCESS] \Device\HarddiskVolume7\Program Files (x86)\Google\Chrome\Application\chrome.exe
Video Wake Lock

SYSTEM:
[DRIVER] NVIDIA High Definition Audio (HDAUDIO\FUNC_01&VEN_10DE&DEV_0072&SUBSYS_38423967&REV_1001\5&34bd84db&0&0001)
An audio stream is currently in use.

AWAYMODE:
None.

EXECUTION:
[PROCESS] \Device\HarddiskVolume7\Program Files (x86)\Google\Chrome\Application\chrome.exe
Playing audio

PERFBOOST:
None.

ACTIVELOCKSCREEN:
None.

Audio playing:

C:\Windows\system32>powercfg -requests
DISPLAY:
None.

SYSTEM:
[DRIVER] NVIDIA High Definition Audio (HDAUDIO\FUNC_01&VEN_10DE&DEV_0072&SUBSYS_38423967&REV_1001\5&34bd84db&0&0001)
An audio stream is currently in use.

AWAYMODE:
None.

EXECUTION:
[PROCESS] \Device\HarddiskVolume7\Program Files (x86)\Google\Chrome\Application\chrome.exe
Playing audio

PERFBOOST:
None.

ACTIVELOCKSCREEN:
None.
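
For anyone who wants to consume this from a watcher, here is a hedged sketch of parsing the powercfg -requests output shown above. It assumes the English-language layout from these samples (upper-case section headers ending in a colon, "None." for empty sections); the function names and the "media is playing" heuristic are my own assumptions, not anything powercfg defines:

```python
import subprocess


def parse_powercfg_requests(text):
    """Map each section (DISPLAY, SYSTEM, ...) to its non-empty request lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and " " not in line and line[:-1].isupper():
            current = line[:-1]
            sections[current] = []
        elif current is not None and line != "None.":
            sections[current].append(line)
    return sections


def looks_like_media_playing(sections):
    """Heuristic: a display request, or an audio stream under SYSTEM."""
    if sections.get("DISPLAY"):
        return True
    return any("audio stream" in line.lower()
               for line in sections.get("SYSTEM", []))


def query_powercfg():
    """Run powercfg on Windows and return its output (not called here)."""
    result = subprocess.run(["powercfg", "-requests"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

With the samples above, the "Video playing" and "Audio playing" outputs would both count as playing, while the idle output would not, since its only entry is under PERFBOOST.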

Jun 13 '20 19:06 jmealo

@jmealo nice find! I just tested it, and it works well. Unfortunately, it requires administrative privileges. I had to run powershell as an administrator to get it to work.

Here's the output I got while playing a YouTube video:

DISPLAY:
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe
[PROCESS] \Device\HarddiskVolume3\Program Files\Mozilla Firefox\firefox.exe

SYSTEM:
[DRIVER] Realtek Audio (INTELAUDIO\FUNC_01&VEN_10EC&DEV_0298&SUBSYS_1028087C&REV_1001\4&2223f159&2&0001)
An audio stream is currently in use.

AWAYMODE:
None.

EXECUTION:
None.

PERFBOOST:
None.

ACTIVELOCKSCREEN:
None.

Jun 13 '20 19:06 nicolae-stroncea

Were you able to somehow do it without privileged access?

Jun 13 '20 19:06 nicolae-stroncea

I had to run powershell as an administrator to get it to work.

:( Same here, I used an elevated cmd. I wonder what it uses under the hood, and whether there's an alternative way to get this information from Windows that doesn't require elevated permissions.

Jun 13 '20 19:06 jmealo

I was able to use this code on Windows to detect sound. It runs without any elevated privileges.

Jun 13 '20 19:06 nicolae-stroncea
