Remaining tasks for person-on-events
Background
Work on person-on-events has been ongoing for a long while now. This issue consolidates the remaining work in one place so that every team is aware of what is left.
What's left TO DO
Ingestion
Owners: team ingestion
- [ ] Make sure every event has a person_id
  - [ ] Finish run of 0006 migration on cloud (status: 99% finished, some events from 2021 need work)
  - [x] Fix bug affecting ~0.001% of events https://github.com/PostHog/posthog/pull/11077 https://github.com/PostHog/posthog/pull/11084 @macobo @tiina303
  - [ ] Re-migrate rows with missing person_id on cloud
  - [ ] Release 0006 async migration in 1.39.0
- [ ] Release buffer as part of 1.39.0
- [ ] Right to be forgotten: Create tooling to allow purging person and groups data from events (proposed next sprint goal)
- [ ] Resharding `events` table to be sharded by `person_id`
  - not urgent, this can be done after releasing everything else
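As a rough illustration of the "Make sure every event has a person_id" task, the sketch below measures how many rows in a sample of events still lack a person_id. The `Event` shape, the helper names, and the treatment of an all-zero UUID as "missing" are assumptions made for this example, not the actual migration or verification code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional
from uuid import UUID

# Assumption: an unset person_id shows up as None or as the all-zero UUID.
ZERO_UUID = UUID(int=0)


@dataclass
class Event:
    """Hypothetical, minimal view of an event row."""
    uuid: str
    person_id: Optional[UUID]


def is_missing_person_id(event: Event) -> bool:
    """Return True when the event has no usable person_id."""
    return event.person_id is None or event.person_id == ZERO_UUID


def missing_person_id_ratio(events: Iterable[Event]) -> float:
    """Fraction of events without a person_id (0.0 for an empty sample)."""
    total = 0
    missing = 0
    for event in events:
        total += 1
        if is_missing_person_id(event):
            missing += 1
    return missing / total if total else 0.0
```

A check like this could be run per time range (for example, only the 2021 events) to decide which rows still need re-migrating.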
Queries
Owners: team west (cc @neilkakkar @EDsCODE)
- [ ] Column materialization improvements
  - We won't be moving to JSON data type anytime soon: context. Hence we need to make property materialization code a lot better:
    - Make sure it functions in a stable way on cloud
    - person/groups column materialization logic for events table
- [x] Update queries to use new person_created_at and other new created_at columns
- [ ] Verify all queries work
Releasing
Once we're happy with the work above, we can enable person-on-events on cloud. I'd suggest the following release pattern:
- [ ] Add a feature flag we can toggle for this
- [ ] Enable for team 2, communicate internally
- [ ] Bugsquash
Owners: @EDsCODE, is team west ready to own this?
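The release pattern above (a toggle, enabled first for team 2) could be sketched roughly as follows. The `PersonOnEventsRollout` class, the `choose_query_mode` function, and the mode names are hypothetical and only illustrate the gating logic. They are not the actual feature-flag implementation.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PersonOnEventsRollout:
    """Hypothetical rollout toggle: a global switch plus a team allowlist."""
    enabled: bool = False
    team_ids: Set[int] = field(default_factory=set)

    def is_enabled_for(self, team_id: int) -> bool:
        # Both conditions must hold: the flag is on and the team is allowlisted.
        return self.enabled and team_id in self.team_ids


def choose_query_mode(rollout: PersonOnEventsRollout, team_id: int) -> str:
    """Pick which (hypothetically named) query path a team should use."""
    if rollout.is_enabled_for(team_id):
        return "person_on_events"
    return "join_persons"
```

Keeping the global switch separate from the allowlist means the feature can be turned off for everyone at once during bugsquashing without losing the list of enrolled teams.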
Communication
- [ ] Updating documentation on the conversion buffer
- [ ] Updating documentation on person-on-events and impact of this
- ....
Owners: Unclear - originally Marcus was intended to help here. Probably me and Yakko on dev side?
Let me know if I missed anything important task-wise.
Note that there's a project: https://github.com/orgs/PostHog/projects/41/views/1. Some of the tasks there don't need to be there, but I'll let you take a look first.
The project is for our team (team-ingestion); this issue is about cross-team collaboration and syncing. Different goals.
At a minimum, from the items in the project, we need the buffer roll-out to reach everyone (cloud and self-hosted) before the async migration and "Re-migrate rows with missing person_id on cloud".