One step in fixing the "Merge web watcher events into window events in the query? (to allow for classifying by url/domain)" https://github.com/ActivityWatch/aw-webui/issues/151. We already have it in the same query, but the data is not merged. Hopefully this will be a good transform to merge window and browser data in that manner.

Oct 18 '20 12:10 johan-bjareholt

Codecov Report

Merging #179 (77eef3d) into master (7d55fca) will increase coverage by 13.67%. The diff coverage is 5.06%.

@@             Coverage Diff             @@
##           master     #179       +/-   ##
===========================================
+ Coverage   46.30%   59.97%   +13.67%     
===========================================
  Files          51       44        -7     
  Lines        6148     4765     -1383     
  Branches     1454        0     -1454     
===========================================
+ Hits         2847     2858       +11     
+ Misses       2442     1907      -535     
+ Partials      859        0      -859

Impacted Files	Coverage Δ
aw-transform/src/union.rs	`0.00% <0.00%> (ø)`
aw-query/src/functions.rs	`82.92% <28.57%> (+16.05%)`	:arrow_up:
aw-server/src/dirs.rs	`0.00% <0.00%> (-73.69%)`	:arrow_down:
aw-server/src/logging.rs	`0.00% <0.00%> (-62.50%)`	:arrow_down:
aw-transform/src/split_url.rs	`12.50% <0.00%> (-54.17%)`	:arrow_down:
aw-query/src/ast.rs	`0.00% <0.00%> (-37.26%)`	:arrow_down:
aw-models/src/key_value.rs	`15.38% <0.00%> (-34.62%)`	:arrow_down:
aw-transform/src/flood.rs	`66.66% <0.00%> (-16.56%)`	:arrow_down:
aw-transform/src/filter_keyvals.rs	`77.27% <0.00%> (-6.37%)`	:arrow_down:
aw-transform/src/find_bucket.rs	`80.00% <0.00%> (-4.22%)`	:arrow_down:
... and 45 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7d55fca...77eef3d. Read the comment docs.

Oct 18 '20 12:10 codecov[bot]

However, as we discussed AFK, I'm not convinced this is the best approach to merging the web events compared to simply replacing all browser-window events when an active browser event exists, but that has its own issues (like having to inject the browser appname, slightly changing the title by removing the appended - Mozilla Firefox, and maybe more).

So like the opposite of filter_period_intersect and then concatenating window and browser events?

Oct 29 '20 20:10 johan-bjareholt

@johan-bjareholt I think something like:

window_events = query(...)   # query and do AFK filtering

for browser in browsers:  # this for-loop would need to be expressed in the function constructing the query
    web_events = query("aw-watcher-web-$browser")
    window_events_browser = filter_by(window_events, {app: $browserAppname})  # can't remember what transform is used to do this
    web_events_active = filter_period_intersect(browser_events, window_events_browser)`  # filter web events by browser being the active window
    web_events_active = amend_data(web_events_active, {app: $browserAppname})  # optional, amends the missing {app: $browserAppname}
    window_events = union(web_events_active, window_events)  # picks events from web_events_active first, then fills rest with window_events

Not sure if we have something like union (in the way I mean it) already, but I mean something that does:

"bucket"          | events
---------------------------------------------------------------
web_events_active |           [github  ]           [youtube   ]
window            |[spotify  ][browser ][terminal ][browser   ]
---------------------------------------------------------------
result            |[spotify  ][github  ][terminal ][youtube   ]

A complete end-to-end example would be:

bucket  | events
---------------------------------------------
window  | [terminal  ][firefox  ][terminal  ]
browser | [youtube   ][google               ]
---------------------------------------------
result  | [terminal  ][google   ][terminal  ]

Oct 30 '20 14:10 ErikBjare

web_events_active = amend_data(web_events_active, {app: $browserAppname})  # optional, amends the missing {app: $browserAppname}

I think this would be a strange transform, inserting data into an event which is hardcoded in the query and does not come from a bucket feels like something we should try to avoid. One reason would be because it encourages people to write dynamically generated queries like we do in the web-ui which is something we don't want.

window_events = union(web_events_active, window_events) # picks events from web_events_active first, then fills rest with window_events

To call that "union" is very misleading in my opinion. A union should be a common ground, not exclude anything.

Oct 30 '20 16:10 johan-bjareholt

However, as we discussed AFK, I'm not convinced this is the best approach to merging the web events compared to simply replacing all browser-window events when an active browser event exists, but that has its own issues (like having to inject the browser appname, slightly changing the title by removing the appended - Mozilla Firefox, and maybe more).

What this transform does is essentially this, but it also injects the browsers appname. See the example in the code

///     |---------|--------------------|
///     | events1 |[a     ][b     ]    |
///     | events2 |    [c     ]    [d ]|
///     | result  |[a ][ac][bc][b ]    |
///     |---------|--------------------|

So if we put that into perspective of window events and browser events already filtered with the browser window:

///     |---------|--------------------------|
///     | window  |[firefox   ][terminal    ]|
///     | browser |    [google    ]          |
///     | result  |[f ][f+g   ][terminal    ]|
///     |---------|--------------------------|

Which is pretty much exacly what you just wrote?

Oct 30 '20 16:10 johan-bjareholt

To call that "union" is very misleading in my opinion. A union should be a common ground, not exclude anything.

For sure, I was just unsure what to call it.

I think this would be a strange transform, inserting data into an event which is hardcoded in the query and does not come from a bucket feels like something we should try to avoid.

I agree in general, but in this case the data does come from a bucket (and doesn't actually introduce new hardcoding, since we already have our list of $browserAppname). We're reinserting exactly what we know we lost before (with filter_by and filter_period_intersect).

But on second thought I realize it'll still be messy, due to there being multiple possible appnames for each browser.

So if we put that into perspective of window events and browser events already filtered with the browser window: <example>

I thought it would become:

///     |---------|---------------------------|
///     | window  |[firefox   ][terminal     ]|
///     | browser |    [google     ]          |
///     | result  |[f ][f+g   ][t+g][terminal]|
///     |---------|---------------------------|

Basically what I'm trying to get rid of splitting events into two.

Edit: Ah nvm, your example would indeed become as you wrote after filtering the browser events.

But it would still lead to things like this, no?:

/// Abbreviations:
///  - ff: Firefox
///  - t1: Tab 1
///  - ff(t1): Firefox window event polled from when `t1` was active
///  - ff(t1)+t2: Firefox event with title from `t1`, but URL from `t2` (after `merge_map`)
///
///     |---------|------------------------------------|
///     | window  |[ff(t1)                ][ff(t2)    ]|
///     | browser |[t1        ][t2                    ]|
///     | result  |[ff(t1)+t1 ][ff(t1)+t2 ][ff(t2)+t2 ]|
///     |---------|------------------------------------|

So if I'm not mistaken, this would result in 'middle-events' where the title (which is gotten from the window event, if I understood merge_map correctly) and URLs (from the web event) are misaligned. This will happen anytime two events don't perfectly overlap (which is always), leading to a lot of small misaligned events like this.

Edit 2: I'm not sure, but maybe this could be resolved by using non-flooded window-events for the union_events_split and flood after?

Maybe that would lead to:

///     |---------|-------------------------------|
///     | window  |[ff(t1)    ]      [ff(t2)     ]|
///     | browser |[t1           ][t2            ]|
///     | result  |[ff(t1)+t1 ]      [ff(t2)+t2  ]|
///     |---------|-------------------------------|

Which after flooding would become the correct:

///     |---------|-------------------------------|
///     | window  |[ff(t1)    ]      [ff(t2)     ]|
///     | browser |[t1           ][t2            ]|
///     | result  |[ff(t1)+t1    ][ff(t2)+t2     ]|
///     |---------|-------------------------------|

(or something similar, depending on flooding strategy)

This might require a lot of extra memory though, since we'd need both window_events (not flooded, for filtering the browser events that go into union_events_split) and window_events_flooded (flooded, for filtering the browser events that go into filter_period_intersect).

Edit 3: Regardless, I'd be happy to merge this if there were more comprehensive tests (for example, checking that these 'middle-events' get created as expected).

Unless we can come up with a neat solution to the problem (which I'm no longer sure there is) I think it's better to just merge this in the meantime, and work on perfecting the transforms/queries later (as this'll probably work good enough).

Nov 01 '20 09:11 ErikBjare

So if I'm not mistaken, this would result in 'middle-events' where the title (which is gotten from the window event, if I understood merge_map correctly) and URLs (from the web event) are misaligned. This will happen anytime two events don't perfectly overlap (which is always), leading to a lot of small misaligned events like this.

That's a very good point which I thought of before but forgot.

Edit 2: I'm not sure, but maybe this could be resolved by using non-flooded window-events for the union_events_split and flood after?

That sounds like a very clever way of solving it, will try that out.

This might require a lot of extra memory though, since we'd need both window_events (not flooded, for filtering the browser events that go into union_events_split) and window_events_flooded (flooded, for filtering the browser events that go into filter_period_intersect).

I took a deep dive into the query code and it seems like you are correct here. Here's an example of how we would have to change our transforms

- events = flood(query_bucket("bucketname"));
+ events_unflooded = query_bucket("bucketname");
+ events_flooded = flood(events_unflooded)

The query language is very inefficient with its assignments, every time we assign something it gets cloned every time it's used afterwards because we do not know if the variable will be used afterwards or not and we need to guarantee that the variable won't change. So previously when we just called flood(query_bucket("bucketname")) we could just re-use the previous events while when we assign it we need to first keep the value in the variable and then clone it on each reference as well as not release the value of the variable until the whole query is complete. This can be improved in the future by #119.

But I think that the impact of just one more clone would be minimal compared to the whole issue we have today with #119, there are lots of more unnecessary clones than just this.

Nov 01 '20 20:11 johan-bjareholt

Here's a query that works with this transform, the web-ui queries.ts code is really messy though so I'm having a hard time adapting it, especially now that "canonicalQuery" should probably include browser events.

{
  "timeperiods": [
    "2020-11-01T01:00:00Z/2020-11-30T01:00:00Z"
  ],
  "query": [
    "window_events = query_bucket(\"aw-watcher-window_johan-laptop2\");",
    "browser_events = query_bucket(\"aw-watcher-web-firefox\");",
    "browser_events = split_url_events(browser_events);",
    "firefox_events = filter_keyvals(window_events, \"app\", [\"firefox\"]);",
    "events = union_events_split(firefox_events, browser_events);",
    "events = merge_events_by_keys(events, [\"$domain\"]);",
    "events = categorize(events, [
        [[\"docs.rs\"], {\"type\": \"regex\", \"regex\": \"^docs.rs$\"}],
        [[\"reddit\"], {\"type\": \"regex\", \"regex\": \"^reddit.com$\"}],
        [[\"crates.io\"], {\"type\": \"regex\", \"regex\": \"^crates.io$\"}]
    ]);",
    "events = sort_by_duration(events);",
    "RETURN = events;"
  ]
}

Nov 16 '20 20:11 johan-bjareholt

Hi! Seems that all major problems have been solved in above discussions. What's currently blocking this PR?

Mar 14 '22 10:03 AllanChain

aw-transform: Add union_events_split

Codecov Report