Block all event / person data from user agent strings containing `bot.html`
On heavily scraped sites, bots still generate junk data, inflate event volumes, and create persons with empty events.
Some offenders visible in this thread
We should block all autocapture events that match user agents above, or that contain /bot.html
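A minimal sketch of the kind of check proposed here, assuming the user agent string is available wherever the event is filtered. The names `BLOCKED_UA_SUBSTRINGS`, `isBlockedUserAgent`, `CapturedEvent`, and `filterBotEvents` are hypothetical, and the substrings are illustrative only; this is not the list or the logic that posthog-js actually ships.

```typescript
// Hypothetical blocklist: illustrative substrings, not the list posthog-js uses.
const BLOCKED_UA_SUBSTRINGS: string[] = ["bot.html", "googlebot", "yandexbot"];

// Returns true when the user agent contains any blocked substring.
// The comparison is case-insensitive on both sides.
function isBlockedUserAgent(
  userAgent: string,
  blocked: string[] = BLOCKED_UA_SUBSTRINGS
): boolean {
  const ua = userAgent.toLowerCase();
  return blocked.some((needle) => ua.includes(needle.toLowerCase()));
}

// Hypothetical event shape, used only for this example.
interface CapturedEvent {
  event: string;
  userAgent: string;
}

// Drops every event whose user agent matches the blocklist.
function filterBotEvents(events: CapturedEvent[]): CapturedEvent[] {
  return events.filter((e) => !isBlockedUserAgent(e.userAgent));
}
```

Substring matching keeps the check cheap and catches crawlers that advertise a `/bot.html` info URL in their user agent, but it will miss bots that spoof a regular browser string.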
I'm looking for a way to stop capturing events from Yandex bots in my PostHog project. Yandex publishes a list of user agents, but it doesn't look like PostHog sends user-agent data. I think I'll either need to send a flag down from the server (e.g. `isBot`), or maybe posthog-js can handle this too.
Edit: I'm trying out a Cloudflare transform rule that adds an `X-Known-Bot` HTTP request header, so my server can decide whether to include PostHog or not 👍
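A rough sketch of the server-side half of that approach, under two assumptions that are not confirmed anywhere in this thread: the transform rule sets the header value to the string `true` for known bots, and the server exposes request headers as a lowercase-keyed object (as Node's `http` module does). `RequestHeaders`, `isKnownBot`, and `renderAnalyticsSnippet` are hypothetical names.

```typescript
// Header map shaped like Node's IncomingMessage.headers (keys are lowercase).
type RequestHeaders = Record<string, string | string[] | undefined>;

// Assumption: the edge rule sets "X-Known-Bot: true" on requests it classifies as bots.
function isKnownBot(headers: RequestHeaders): boolean {
  const raw = headers["x-known-bot"];
  const value = Array.isArray(raw) ? raw[0] : raw;
  return value !== undefined && value.trim().toLowerCase() === "true";
}

// Returns the analytics snippet for regular visitors and an empty string for
// flagged bots, so the page served to a bot never loads the tracking script.
function renderAnalyticsSnippet(headers: RequestHeaders, snippet: string): string {
  return isKnownBot(headers) ? "" : snippet;
}
```

Deciding on the server means flagged bots never load the script at all, so they cannot generate events or persons regardless of what the client library filters.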
We do have a list of bots that we block in posthog-js: https://github.com/PostHog/posthog-js/blob/master/src/utils.ts#L469-L509
Is there something we should add here?
We've significantly improved bot blocking since the last comment... let's assume this is fixed.