posthog-go icon indicating copy to clipboard operation
posthog-go copied to clipboard

Ability to specify uuid when capturing events

Open evgeek opened this issue 1 year ago • 3 comments

Hello!

We really miss the ability to specify the uuid of events manually when capturing events. The thing is that we cannot guarantee that events come to our system only once. In the current implementation, in such cases, events in Posthog will be duplicated. In the SDK in other languages, we got around this by manually generating the uuid, calculated based on the immutable fields of the event - however, in the Golang SDK, unfortunately, there is no such option yet. Please consider the PR adding this functionality.

P.S. As far as I can see, the current implementation does not use the posthog.Config.uid() function, so I removed it.

evgeek avatar Aug 19 '24 12:08 evgeek

@fuziontech @dmarticus Hi guys! Sorry to bother you, but maybe you can help me with this? :)

I checked the functionality on our project - everything works perfectly well!

evgeek avatar Aug 22 '24 09:08 evgeek

We have bunch of projects that send events. They save them in a DB, and then send them to different services via AMQP (https://github.com/umbrellio/table_sync), where they are also saved in a DB. These messages can be sent multiple times for different reasons: from updating a row in a DB to forcing sync of some events. In our services, we handle retries using message versions (old messages are ignored, new ones update previously written data by primary key). But sending to Posthog is different from saving to a DB because of updating unavailable, and we have to ensure that a retried AMQP message does not cause new events. We can save the id of the sent events in a DB, but this will cause unnecessary overhead: creating and managing a new DB, slowing down the system due to IO, resolving race conditions when multiple k8s pods try to access the same row, etc. Or we can just calculate a reproductible UUID - in this case, Posthog will simply skip the re-sent event. Very simple and elegant, I belive :)

About deduplication on ingestion: This is a great feature, but since our events are stored in a database and can be updated, it may not always recognize duplicates. And I'm not sure how dedupe will work if the same event is sent on different days.

About other SDKs - I haven't tried them all, but PHP definitely allows it, we used the same approach in the previous version of the event service.

evgeek avatar Aug 22 '24 17:08 evgeek

@evgeek thanks for your thorough reply! I understand your situation better now. I think your perspective makes sense, and frankly it makes me think that all of our SDKs should let consumers supply optional UUIDs. Let's get this change merged, thanks for doing it!

dmarticus avatar Aug 22 '24 23:08 dmarticus

Excellent! Thanks for your responsiveness and awesome product! :)

evgeek avatar Aug 23 '24 11:08 evgeek

I think this pull request broke posthog.Capture. I am currently getting an error in v1.2.19. posthog 2024/08/24 23:59:13 INFO: response 400 400 Bad Request – failed to parse request: data did not match any variant of untagged enum RawRequest

I had to downgrade to v1.2.18 and everything is working as expected.

migueldv90 avatar Aug 25 '24 05:08 migueldv90

I think this pull request broke posthog.Capture. I am currently getting an error in v1.2.19.

posthog 2024/08/24 23:59:13 INFO: response 400 400 Bad Request – failed to parse request: data did not match any variant of untagged enum RawRequest

I had to downgrade to v1.2.18 and everything is working as expected.

Ooof; thanks for the report! Sorry to cause breakage; I'm away from my computer this weekend but I'll look into a fix when I'm back!

dmarticus avatar Aug 25 '24 05:08 dmarticus

@migueldv90 Sorry for the inconvenience!

@dmarticus It seem I tested the feature on the /capture endpoind, but the SDK sends via /batch. My bad :(

Difference is that /capture can ingest uuid with an empty string, while /batch cannot. Good news: /capture can handle batch events. Example:

curl --location 'https://eu.i.posthog.com/capture/' \
--header 'Content-Type: application/json' \
--header 'User-Agent: posthog-go (version: 1.2.19)' \
--data '{
  "api_key" : "phc_...",
  "batch" : [ {
    "type" : "capture",
    "uuid" : "",
    "library" : "posthog-go",
    "library_version" : "1.2.19",
    "timestamp" : "2024-08-25T11:50:20.154164+00:00",
    "distinct_id" : "1234",
    "event" : "event1",
    "properties" : {
      "$lib" : "posthog-go",
      "$lib_version" : "1.2.19",
      "project" : "test3"
    },
    "send_feature_flags" : false
  },
  {
    "type" : "capture",
    "uuid" : "",
    "library" : "posthog-go",
    "library_version" : "1.2.19",
    "timestamp" : "2024-08-25T11:50:20.154164+00:00",
    "distinct_id" : "1234",
    "event" : "event2",
    "properties" : {
      "$lib" : "posthog-go",
      "$lib_version" : "1.2.19",
      "project" : "test2"
    },
    "send_feature_flags" : false
  },
  {
    "type" : "capture",
    "uuid" : "",
    "library" : "posthog-go",
    "library_version" : "1.2.19",
    "timestamp" : "2024-08-25T11:50:20.154164+00:00",
    "distinct_id" : "1234",
    "event" : "event3",
    "properties" : {
      "$lib" : "posthog-go",
      "$lib_version" : "1.2.19",
      "project" : "test1"
    },
    "send_feature_flags" : false
  } ]
}'

Screenshot 2024-08-25 at 14 58 23

As a quick fix I made this PR - https://github.com/PostHog/posthog-go/pull/70. It works, but using an endpoint not designed for batch processing is obviously is not very good decision. It seems that adjusting /batch endpoint to works with empty uuid is better solution.

evgeek avatar Aug 25 '24 11:08 evgeek

I'm going to revert this change so that folks get the fix ASAP. @evgeek has the start of a fix here: https://github.com/PostHog/posthog-go/pull/70, but I want to clarify a few things first, and this will stop the pain for folks pulling from latest.

dmarticus avatar Aug 26 '24 15:08 dmarticus

Hi @evgeek

Did some digging into our ingestion endpoint and found that, yes – due to the way we serialize batch requests, we can handle missing uuid entries or valid uuids, but not empty strings. I think it makes the most sense to fix our ingestion endpoint to handle empty strings, rather than work around it within the SDK. I've made a PR for this here: https://github.com/PostHog/posthog/pull/24586. Once it ships, you should be able to re-implement your original changes and the batch endpoint should handle empty UUIDs.

dmarticus avatar Aug 26 '24 20:08 dmarticus

Excellent, thank you, I'll be looking forward to it!

evgeek avatar Aug 26 '24 22:08 evgeek

@dmarticus Hi!

I see your PR about /batch is already in production.

I made a new PR with reimplementation of uuid specification feature - https://github.com/PostHog/posthog-go/pull/72. I tested it, now it works with empty uuid. Please review it when you can :)

evgeek avatar Aug 28 '24 10:08 evgeek

Hi @evgeek

Thanks for reaching out! I was actually going to DM you today; I merged the /batch PR but for that service, we do manual releases (it's a new service, we're treating it with a bit more caution), so it's only live in our EU datacenter for now (which is why your testing worked – your PostHog account is in the EU). I'm going to release this to US today and then we can merge your PR so that everyone gets UUID goodness :)

dmarticus avatar Aug 28 '24 15:08 dmarticus

Got it!

evgeek avatar Aug 29 '24 08:08 evgeek

Hi @evgeek !

Our /batch changes have been deployed everywhere! I'm going to merge https://github.com/PostHog/posthog-go/pull/72 now :)

dmarticus avatar Aug 29 '24 17:08 dmarticus

@dmarticus We did it! Peachy! Thanks :)

evgeek avatar Aug 29 '24 18:08 evgeek