
Replay "yes, and" Pipeline

Open tiina303 opened this issue 1 year ago • 1 comments

How Pipeline team can help Replay team:

  • [ ] Continued work on making Blobby more resilient, if any:
    • consumer lifecycle and offset management?
    • Background on Blobby: https://docs.google.com/document/d/1Pj_Lpi3nGcFOARXQ_07yZPO4qFGvtj_qt7elxr4hQws/edit
  • [ ] Help get error tracking off the ground
    • Great opportunity for us to clarify how new event-based products should be spun up, see https://docs.google.com/document/d/17aOrHFk1iOSJHKWgk9RCu_Zqy5i2ZHhjV7XNAdOZO64/edit
  • [ ] Replay without product analytics
    • Specific problem: product analytics gets quota limited if the user isn't paying for it, but persons should likely still be created
  • [ ] replay 🤝 machine learning
    • we want to generate embeddings for recordings
    • right now we're generating over a subset of recordings for our team using celery
    • celery is pretty terrible at "please run this task over and over at a defined rate and keep your task queue full"
    • since it prefers "you gave me 10,000 copies of a task I will try and run them all super fast and kill your dependencies"
    • this is a very shared problem since it's a replay product need but affects and is affected by ingestion
    • (ideally I'd have an RFC right now but we're still testing)
    • one obvious thing for us to evaluate is if we should use temporal rather than celery
    • imagine the algorithm is roughly
      • for all opted-in teams
        • for all recordings over some minimum duration that have not yet had embeddings generated (or have ingested significantly more data since last processing)
          • apply some filters to avoid processing every byte of every recording
          • run some ML that generates embeddings
          • store those embeddings
      • on another timer
        • run clustering to generate magic playlists if the embeddings for the team have changed
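
A rough sketch of the two timers described above, in plain Python with no celery or temporal involved. Everything named here is a placeholder for illustration, not existing PostHog code: the `Recording` fields, the `MIN_DURATION_S` and `REGROWTH_FACTOR` thresholds, and the `preprocess`, `embed`, and `cluster` callables are all assumptions. The point is only that each tick handles a bounded batch, so whatever scheduler calls it controls the rate instead of 10,000 queued tasks hitting the dependencies at once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set

# Assumed thresholds, purely illustrative.
MIN_DURATION_S = 15.0   # skip recordings shorter than this
REGROWTH_FACTOR = 1.5   # "significantly more data" = 1.5x the bytes seen at last embedding


@dataclass
class Recording:
    """Hypothetical view of a recording; field names are assumptions."""
    session_id: str
    team_id: int
    duration_s: float
    bytes_ingested: int
    bytes_at_last_embedding: Optional[int] = None  # None means never embedded


def needs_embedding(rec: Recording) -> bool:
    """Long enough, and either never embedded or grown significantly since."""
    if rec.duration_s < MIN_DURATION_S:
        return False
    if rec.bytes_at_last_embedding is None:
        return True
    return rec.bytes_ingested >= rec.bytes_at_last_embedding * REGROWTH_FACTOR


def run_embedding_tick(
    recordings: Iterable[Recording],
    opted_in_teams: Set[int],
    preprocess: Callable[[Recording], object],   # the "some filters" step
    embed: Callable[[object], List[float]],      # the "some ML" step
    store: Dict[str, List[float]],
    batch_size: int,
) -> Set[int]:
    """One timer tick: embed at most batch_size recordings.

    Returns the ids of teams whose embeddings changed during this tick.
    """
    changed_teams: Set[int] = set()
    processed = 0
    for rec in recordings:
        if processed >= batch_size:
            break
        if rec.team_id not in opted_in_teams or not needs_embedding(rec):
            continue
        store[rec.session_id] = embed(preprocess(rec))
        rec.bytes_at_last_embedding = rec.bytes_ingested
        changed_teams.add(rec.team_id)
        processed += 1
    return changed_teams


def run_clustering_tick(changed_teams: Set[int], cluster: Callable[[int], None]) -> None:
    """The other timer: recluster only teams whose embeddings have changed."""
    for team_id in sorted(changed_teams):
        cluster(team_id)
```

The caller would accumulate the returned team ids between clustering ticks. Whether the ticks are driven by celery beat, a temporal schedule, or something else is exactly the open question above; the sketch does not assume either.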

tiina303 avatar Mar 19 '24 13:03 tiina303

(added some info on the embeddings - it might not totally make sense 🙈)

cc @daibhin

pauldambra avatar Mar 19 '24 13:03 pauldambra

things have moved on past this i think

pauldambra avatar Aug 30 '24 13:08 pauldambra