feat: hogql for replay poe support
need to support PoE for replay queries
particularly for larger orgs using test account filtering
what do we have
given we know it already meets performance needs for larger orgs the non-hogql version adds
AND s.distinct_id in
(select distinct_id
from
(SELECT distinct_id,
argMax(person_id, version) as current_person_id
FROM person_distinct_id2 as pdi
INNER JOIN
(SELECT id
FROM person
WHERE team_id = 2
AND id IN
(SELECT id
FROM person
WHERE team_id = 2
AND (has(['bla'], "pmat_email")) )
GROUP BY id
HAVING max(is_deleted) = 0
AND (has(['bla'], argMax(person."pmat_email", version))) SETTINGS optimize_aggregation_in_order = 1) person ON person.id = pdi.person_id
WHERE team_id = 2
GROUP BY distinct_id
HAVING argMax(is_deleted, version) = 0) as session_persons_sub_query)
so we're joining straight from session replay events table
what do we want
🤷 something as fast or faster 🤣
right now the implementation is generating
FROM session_replay_events AS s
INNER JOIN
(SELECT argMax(person_distinct_id2.person_id, person_distinct_id2.version) AS s__pdi___person_id,
argMax(person_distinct_id2.person_id, person_distinct_id2.version) AS person_id,
person_distinct_id2.distinct_id AS distinct_id
FROM person_distinct_id2
WHERE equals(person_distinct_id2.team_id, 2)
GROUP BY person_distinct_id2.distinct_id
HAVING ifNull(equals(argMax(person_distinct_id2.is_deleted, person_distinct_id2.version), 0), 0)) AS s__pdi ON equals(s.distinct_id, s__pdi.distinct_id)
INNER JOIN
(SELECT person.id AS id,
nullIf(nullIf(person.pmat_email, ''), 'null') AS properties___email
FROM person
WHERE and(equals(person.team_id, 2), ifNull(in(tuple(person.id, person.version),
(SELECT person.id AS id, max(person.version) AS version
FROM person
WHERE equals(person.team_id, 2)
GROUP BY person.id
HAVING ifNull(equals(argMax(person.is_deleted, person.version), 0), 0))), 0)) SETTINGS optimize_aggregation_in_order=1) AS s__pdi__person ON equals(s__pdi.s__pdi___person_id, s__pdi__person.id)
WHERE and(equals(s.team_id, 2), ifNull(greaterOrEquals(toTimeZone(s.min_first_timestamp, 'UTC'), toDateTime64('2020-12-31 20:00:00.000000', 6, 'UTC')), 0), ifNull(greaterOrEquals(toTimeZone(s.min_first_timestamp, 'UTC'), toDateTime64('2021-01-14 00:00:00.000000', 6, 'UTC')), 0), ifNull(lessOrEquals(toTimeZone(s.min_first_timestamp, 'UTC'), toDateTime64('2021-01-21 20:00:00.000000', 6, 'UTC')), 0), ifNull(equals(s__pdi__person.properties___email, 'bla'), 0))
join session_replay_events to person_distinct_id and joining it to person and then filtering the joined results
so we're joining rather than subquerying, my guess is that is slower...
let's find out
oof HogQL version is the control column
Help is on its way in https://github.com/PostHog/posthog/pull/22468 -- I'm curious to see if what's already in that PR will have solved anything.
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.
This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.