Set up NAT gateway for PostHog Cloud
Is your feature request related to a problem?
https://github.com/PostHog/redshift-plugin/issues/3
We would love to have a static IP or IP range that users could unblock for incoming traffic from PostHog (e.g. webhooks, plugins).
Describe the solution you'd like
Set up a NAT gateway for our VPC
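Roughly the shape of the change, as a minimal boto3 sketch (the subnet and route table IDs are placeholders, and a real change would go through our infrastructure tooling rather than a one-off script):

```python
# Sketch only: allocate a static Elastic IP, attach a NAT gateway to it,
# then route the private subnets' outbound traffic through that gateway.
# The subnet/route-table IDs below are placeholders, not real infrastructure.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# The Elastic IP is the stable address customers could allowlist.
eip = ec2.allocate_address(Domain="vpc")

nat = ec2.create_nat_gateway(
    SubnetId="subnet-PUBLIC-PLACEHOLDER",
    AllocationId=eip["AllocationId"],
)

ec2.get_waiter("nat_gateway_available").wait(
    NatGatewayIds=[nat["NatGateway"]["NatGatewayId"]]
)

# Send all egress from the private route table through the NAT gateway.
ec2.create_route(
    RouteTableId="rtb-PRIVATE-PLACEHOLDER",
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat["NatGateway"]["NatGatewayId"],
)

print("Static egress IP:", eip["PublicIp"])
```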
Additional context
Not sure if this is 100% the correct approach - @tiina303 @fuziontech thoughts?
Related zendesk tickets (please edit this comment to add)
Batch exports/webhooks
- https://posthoghelp.zendesk.com/agent/tickets/1702
- https://posthoghelp.zendesk.com/agent/tickets/6423
- https://posthoghelp.zendesk.com/agent/tickets/8276
- https://posthoghelp.zendesk.com/agent/tickets/7879
- https://posthoghelp.zendesk.com/agent/tickets/7623
- https://posthoghelp.zendesk.com/agent/tickets/6490
- https://posthoghelp.zendesk.com/agent/tickets/5980
- https://posthoghelp.zendesk.com/agent/tickets/5783
- https://posthoghelp.zendesk.com/agent/tickets/5638
Data warehouse
- https://posthoghelp.zendesk.com/agent/tickets/10511
Sending events
- https://posthoghelp.zendesk.com/agent/tickets/9650
Thank you for your feature request – we love each and every one!
Yes, I would be interested in this too for the bigquery-plugin
please ping me when this is done so I can update relevant docs
made some progress on this - but it is relatively involved and not top priority currently
Another request from slack: https://posthogusers.slack.com/archives/G01JXEDAL22/p1632611486180800
+1 from me on this again
got a request for this 2 weeks ago when I was doing support as well.
Talked to @guidoiaquinti about this and it sounds extremely painful. It's something they pushed back on for years at another company because of the problems it would have caused. It would really limit what we can do with our infrastructure.
The alternative we should push people towards is having them push their data to us vs us pulling the data from them. It is better in almost every way.
I'd like to push back on the above.
How painful is "extremely painful"?
This is not about imports, it's primarily about exports. We need access to people's DBs for all the warehouse exports. BQ, Snowflake, Postgres, Redshift, etc.
Currently we suggest that people just whitelist all internet access to their Redshift clusters so we can push data there. This is definitely not great. Sure, strong passwords etc. will help you feel safe, but having an IP range to whitelist would be very useful.
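To make the ask concrete: if we published an egress range, customers could scope their Redshift security group to it instead of 0.0.0.0/0. A hypothetical customer-side sketch with boto3 (the CIDR and security group ID are made up):

```python
# Sketch of the customer-side change: replace the open-to-the-world rule on a
# Redshift security group with one scoped to a published PostHog egress range.
# The group ID and CIDR below are illustrative placeholders.
import boto3

ec2 = boto3.client("ec2")

POSTHOG_EGRESS_CIDR = "203.0.113.0/28"  # hypothetical published range
SECURITY_GROUP_ID = "sg-PLACEHOLDER"

# Drop the existing allow-all rule...
ec2.revoke_security_group_ingress(
    GroupId=SECURITY_GROUP_ID,
    IpProtocol="tcp",
    FromPort=5439,  # default Redshift port
    ToPort=5439,
    CidrIp="0.0.0.0/0",
)

# ...and allow only the PostHog range.
ec2.authorize_security_group_ingress(
    GroupId=SECURITY_GROUP_ID,
    IpProtocol="tcp",
    FromPort=5439,
    ToPort=5439,
    CidrIp=POSTHOG_EGRESS_CIDR,
)
```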
@fuziontech
For the record, I'm not saying we have to do this - just wanted to add the context that the key thing here is exports, which are a key part of our platform.
Could we reduce the scope of this or rethink this problem? Can we provide a reasonably stable public IP/range for the plugin server particularly?
This issue has 10 comments. Issues this long are very hard to read or to contribute to, and tend to take very long to reach a conclusion. Instead, why not:
- Write some code and submit a pull request! Code wins arguments
- Have a sync meeting to reach a conclusion
- Create a request for comments in the meta repo or product internal repo
> Could we reduce the scope of this or rethink this problem? Can we provide a reasonably stable public IP/range for the plugin server particularly?
This is what the goal would be. Unfortunately there's no easy way to do this even with a relatively large CIDR block. We should push customers who want to load PostHog with their own data to use an export script to push data from their warehouse to us. We should move away as much as possible from pulling data from their warehouse.
> This is not about imports, it's primarily about exports. We need access to people's DBs for all the warehouse exports. BQ, Snowflake, Postgres, Redshift, etc.
I understand this is for exports, that is specifically what I was referencing here. Importing data from customers using plugins is not the way we should be importing data into PostHog or exporting data from customers. We had a good chat in https://github.com/PostHog/plugin-server/issues/529#issuecomment-900741592 about this. The ideal way for customers to export data from their warehouses into PostHog is for us to provide them with export scripts and examples for how to load PostHog using Airflow, Azkaban, Luigi, or just cron and our client libraries. This is what we want to support going forward. Connecting to their warehouses is not a solution most companies, especially large ones, will be comfortable with.
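To illustrate the push model, a minimal cron-able sketch using the posthog Python client to send warehouse rows into PostHog as events (the connection string, table, and column names are invented for the example):

```python
# Minimal sketch of the "push" direction: a cron job that reads rows from the
# customer's own warehouse and sends them to PostHog with the client library.
# The connection string, table, and columns are invented for illustration.
import posthog
import psycopg2

posthog.project_api_key = "<project api key>"
posthog.host = "https://app.posthog.com"

conn = psycopg2.connect("postgresql://user:pass@warehouse.example.com/analytics")
cur = conn.cursor()
cur.execute("SELECT user_id, event_name, occurred_at FROM events_to_sync")

for user_id, event_name, occurred_at in cur:
    # capture() queues the event; the library batches and flushes asynchronously.
    posthog.capture(
        distinct_id=str(user_id),
        event=event_name,
        timestamp=occurred_at,
    )

posthog.flush()  # make sure everything is sent before the job exits
```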
Edit:
> This is not about imports, it's primarily about exports. We need access to people's DBs for all the warehouse exports. BQ, Snowflake, Postgres, Redshift, etc.
Ah! Yeah, we will still need to have a story for ingesting data from PostHog -> data warehouse using our API. Granting access to these databases is just something that most companies will not give freely, even with a restricted CIDR block. As part of the tooling for data teams we should have an easy library/API for bulk consuming data out of PostHog as well as getting data back into PostHog. It also becomes a moot point as soon as they install the Helm chart internally, since they will be able to hit anything they want to configure from within their private network, like some of our clients already do with BigQuery, GCS, and PubSub.
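For the PostHog -> warehouse direction, the rough shape would be polling the events API with a personal API key and loading the results on the customer's side. A sketch, assuming the `/api/event` endpoint and a next-URL style of pagination (check the API docs before relying on either):

```python
# Sketch of the pull direction: drain events out of PostHog over the API so the
# customer can load them into their own warehouse, with no inbound access needed.
# The endpoint path and pagination shape (a `next` URL in the response) are
# assumptions about the events API, not a confirmed contract.
import requests

PERSONAL_API_KEY = "<personal api key>"
url = "https://app.posthog.com/api/event/"

while url:
    resp = requests.get(url, headers={"Authorization": f"Bearer {PERSONAL_API_KEY}"})
    resp.raise_for_status()
    page = resp.json()

    for event in page.get("results", []):
        # Hand off to the customer's own loader (COPY into Redshift, bq load, etc.).
        print(event.get("event"), event.get("timestamp"))

    url = page.get("next")
```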
+2 from users who reached out to support today.
Most services that can push or connect to a Redshift database let customers know the source IP address or addresses that the connection will be coming from. This includes, but is not limited to, ETL providers (Stitch Data, for example), data platforms (such as Looker), and most other kinds. This allows these addresses to be whitelisted.
It is rather unusual that this is not an option currently offered by PostHog.
More advanced offerings also support connecting over SSH or AWS VPC peering. SSH would not work here, but VPC peering would (or even Cloudflare Tunnels, which we have not used, but as it is WireGuard it would be an awesome solution).
We're using hosted PostHog and unfortunately this is a big deal for us. Allowing global internet access to a database instance won't make it past compliance review.
+1 from a customer for whom this is blocking Redshift import. Slack thread.
For the record, major inconvenience for us as well.
Another request from a customer (internal zendesk link)
Another customer request (internal zendesk link)
Big issue for us as well, with our Postgres DB being hosted on GCP