posthog Replay "yes, and" Pipeline

How the Pipeline team can help the Replay team:

[ ] Continued work on making Blobby more resilient, if any remains:
- consumer lifecycle and offset management?
- Background on Blobby: https://docs.google.com/document/d/1Pj_Lpi3nGcFOARXQ_07yZPO4qFGvtj_qt7elxr4hQws/edit
[ ] Help get error tracking off the ground
- Great opportunity for us to clarify how new event-based products should be spun up, see https://docs.google.com/document/d/17aOrHFk1iOSJHKWgk9RCu_Zqy5i2ZHhjV7XNAdOZO64/edit
[ ] Replay without product analytics
- The specific problem: product analytics gets quota limited if the user isn't paying for it, but persons should likely still be created
[ ] replay 🤝 machine learning
- we want to generate embeddings for recordings
- right now we're generating them for a subset of our own team's recordings using Celery
- Celery is pretty terrible at "please run this task over and over at a defined rate and keep your task queue full"
- since it prefers "you gave me 10,000 copies of a task, I will try and run them all super fast and kill your dependencies"
- this is a very shared problem since it's a replay product need but affects and is affected by ingestion
- (ideally I'd have an RFC right now but we're still testing)
- one obvious thing for us to evaluate is whether we should use Temporal rather than Celery
- imagine the algorithm is roughly
  - for all opted-in teams
    - for all recordings over some minimum duration that have not yet had embeddings generated (or have ingested significantly more data since last processing)
      - apply some filters to avoid processing every byte of every recording
      - run some ML that generates embeddings
      - store those embeddings
  - on another timer
    - run clustering to generate magic playlists if the embeddings for the team have changed
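
To make the rough algorithm above easier to discuss, here is a minimal in-memory Python sketch of the two passes. It is not how PostHog implements this: every name in it (`Recording`, `EmbeddingStore`, `run_embedding_pass`, `run_clustering_pass`), the duration threshold, and the "significantly more data" factor are assumptions made up for illustration. The filtering and ML steps are hidden behind a caller-supplied `embed` function, and clustering behind a caller-supplied `cluster` function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Every name and constant below is hypothetical, for illustration only.
MIN_DURATION_SECONDS = 15.0  # assumed "minimum duration"
REGROWTH_FACTOR = 1.5        # assumed meaning of "significantly more data"


@dataclass
class Recording:
    team_id: int
    session_id: str
    duration_seconds: float
    bytes_ingested: int


@dataclass
class EmbeddingStore:
    # (team_id, session_id) -> (embedding, bytes_ingested when it was computed)
    rows: Dict[Tuple[int, str], Tuple[List[float], int]] = field(default_factory=dict)
    # teams whose embeddings changed since the last clustering pass
    dirty_teams: Set[int] = field(default_factory=set)


def needs_embedding(rec: Recording, store: EmbeddingStore) -> bool:
    """True if the recording is long enough and is new or has grown a lot."""
    if rec.duration_seconds < MIN_DURATION_SECONDS:
        return False
    previous = store.rows.get((rec.team_id, rec.session_id))
    if previous is None:
        return True
    _, bytes_at_last_run = previous
    return rec.bytes_ingested >= bytes_at_last_run * REGROWTH_FACTOR


def run_embedding_pass(
    opted_in_teams: Set[int],
    recordings: Iterable[Recording],
    store: EmbeddingStore,
    embed: Callable[[Recording], List[float]],
    batch_limit: int,
) -> int:
    """One timer tick: embed at most batch_limit eligible recordings."""
    processed = 0
    for rec in recordings:
        if processed >= batch_limit:
            break  # bounded work per tick, so dependencies are not flooded
        if rec.team_id not in opted_in_teams or not needs_embedding(rec, store):
            continue
        # embed() stands in for "apply some filters" plus "run some ML"
        store.rows[(rec.team_id, rec.session_id)] = (embed(rec), rec.bytes_ingested)
        store.dirty_teams.add(rec.team_id)
        processed += 1
    return processed


def run_clustering_pass(
    store: EmbeddingStore,
    cluster: Callable[[Dict[str, List[float]]], List[str]],
) -> Dict[int, List[str]]:
    """Separate timer: re-cluster only teams whose embeddings changed."""
    playlists: Dict[int, List[str]] = {}
    for team_id in sorted(store.dirty_teams):
        vectors = {
            sid: emb for (tid, sid), (emb, _) in store.rows.items() if tid == team_id
        }
        playlists[team_id] = cluster(vectors)
    store.dirty_teams.clear()
    return playlists
```

The `batch_limit` argument is one way to express "run at a defined rate": each tick does a bounded amount of work no matter how many recordings are waiting. Whatever scheduler ends up being used (Celery, Temporal, or something else) would only need to call the two passes on their own timers.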

Mar 19 '24 13:03 tiina303

(added some info on the embeddings - it might not totally make sense 🙈)

cc @daibhin

Mar 19 '24 13:03 pauldambra

things have moved on past this i think

Aug 30 '24 13:08 pauldambra
