posthog-js

Block all event / person data from user agent strings containing `bot.html`

Open camerondeleone opened this issue 3 years ago • 2 comments

On heavily scraped sites, bots still generate junk data, inflate event volumes, and create persons with empty events.

Some offenders visible in this thread

We should block all autocapture events whose user agent matches one of the offenders above or contains /bot.html.
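A minimal sketch of the kind of check proposed here, assuming a case-insensitive substring match on the user agent. The blocklist entries and the names `BLOCKED_UA_SUBSTRINGS`, `isLikelyBot`, and `captureUnlessBot` are hypothetical and are not taken from posthog-js:

```typescript
// Hypothetical blocklist for illustration only; not the list shipped in posthog-js.
const BLOCKED_UA_SUBSTRINGS: string[] = ["bot.html", "yandexbot", "crawler", "spider"];

// Returns true when the user agent contains any blocked substring (case-insensitive).
function isLikelyBot(userAgent: string, blocked: string[] = BLOCKED_UA_SUBSTRINGS): boolean {
  const ua = userAgent.toLowerCase();
  return blocked.some((needle) => ua.includes(needle.toLowerCase()));
}

// Hypothetical guard: forwards the event only when the user agent does not look like a bot.
// Returns true if the event was sent, false if it was dropped.
function captureUnlessBot(userAgent: string, send: () => void): boolean {
  if (isLikelyBot(userAgent)) {
    return false;
  }
  send();
  return true;
}
```

For example, Googlebot's user agent, `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)`, contains `bot.html` and would be dropped by this check. Substring matching is cheap but only catches bots that identify themselves.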

camerondeleone avatar Nov 15 '22 23:11 camerondeleone

I'm looking for a way to keep events from Yandex bots out of my PostHog data. They publish a list of user agents, but it doesn't look like PostHog sends user-agent data. I'm thinking I'll either need to send a flag down from the server (e.g. isBot), or maybe posthog-js can handle this too.

Edit: I'm trying out a Cloudflare transform rule that adds an X-Known-Bot HTTP request header, so my server can decide whether to include PostHog or not 👍
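A rough sketch of the server-side decision, assuming the transform rule sets `X-Known-Bot: true` for bot traffic. The header value and the helper names `isKnownBot` and `renderAnalyticsSnippet` are assumptions for illustration, not a Cloudflare or PostHog API:

```typescript
// Plain header map as most Node servers expose it; names may arrive in any casing.
type HeaderMap = Record<string, string | undefined>;

// Assumes the edge rule sets the header value to "true" for known bots (hypothetical convention).
function isKnownBot(headers: HeaderMap): boolean {
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === "x-known-bot") {
      return (headers[name] ?? "").trim().toLowerCase() === "true";
    }
  }
  return false;
}

// Returns the analytics snippet for regular visitors and an empty string for known bots,
// so the page served to a bot never loads the tracking script.
function renderAnalyticsSnippet(headers: HeaderMap, snippet: string): string {
  return isKnownBot(headers) ? "" : snippet;
}
```

With this approach the bot never receives the script at all, so no events or persons are created for it, at the cost of relying on the edge to classify traffic correctly.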

kamranayub avatar Sep 13 '23 02:09 kamranayub

We do have a list of bots that we block in posthog-js: https://github.com/PostHog/posthog-js/blob/master/src/utils.ts#L469-L509

Is there something we should add here?

mariusandra avatar Sep 13 '23 08:09 mariusandra

We've significantly improved bot blocking since the last comment... let's assume this is fixed

pauldambra avatar Jul 11 '24 19:07 pauldambra