
Set up NAT Gateway for PostHog Cloud

macobo opened this issue 3 years ago • 18 comments

Is your feature request related to a problem?

https://github.com/PostHog/redshift-plugin/issues/3

We would love to have a static IP or IP range that users could allow-list for incoming traffic from PostHog (e.g. webhooks, plugins).

Describe the solution you'd like

Set up a NAT gateway for our VPC

Additional context

Not sure if this is 100% the correct approach - @tiina303 @fuziontech thoughts?

Related Zendesk tickets (please edit this comment to add):

Batch exports/webhooks

  • https://posthoghelp.zendesk.com/agent/tickets/1702
  • https://posthoghelp.zendesk.com/agent/tickets/6423
  • https://posthoghelp.zendesk.com/agent/tickets/8276
  • https://posthoghelp.zendesk.com/agent/tickets/7879
  • https://posthoghelp.zendesk.com/agent/tickets/7623
  • https://posthoghelp.zendesk.com/agent/tickets/6490
  • https://posthoghelp.zendesk.com/agent/tickets/5980
  • https://posthoghelp.zendesk.com/agent/tickets/5783
  • https://posthoghelp.zendesk.com/agent/tickets/5638

Data warehouse

  • https://posthoghelp.zendesk.com/agent/tickets/10511

Sending events

  • https://posthoghelp.zendesk.com/agent/tickets/9650

Thank you for your feature request – we love each and every one!

macobo avatar Jun 03 '21 12:06 macobo

Yes, I would be interested in this too for the bigquery-plugin

weyert avatar Jun 03 '21 15:06 weyert

please ping me when this is done so I can update relevant docs

yakkomajuri avatar Jun 18 '21 15:06 yakkomajuri

made some progress on this - but it is relatively involved and not top priority currently

fuziontech avatar Jun 28 '21 17:06 fuziontech

Another request from slack: https://posthogusers.slack.com/archives/G01JXEDAL22/p1632611486180800

macobo avatar Sep 27 '21 07:09 macobo

+1 from me on this again

yakkomajuri avatar Sep 27 '21 10:09 yakkomajuri

got a request for this 2 weeks ago when I was doing support as well.

mariusandra avatar Sep 27 '21 11:09 mariusandra

Talked to @guidoiaquinti about this and it sounds extremely painful. It's something they pushed back on for years at another company because of the problems it would have caused. It would really limit what we can do with our infrastructure.

The alternative we should push people towards is having them push their data to us vs us pulling the data from them. It is better in almost every way.

fuziontech avatar Oct 01 '21 09:10 fuziontech

I'd like to push back on the above.

How painful is "extremely painful"?

This is not about imports, it's primarily about exports. We need access to people's DBs for all the warehouse exports. BQ, Snowflake, Postgres, Redshift, etc.

Currently we suggest that people just whitelist all internet access to their Redshift clusters so we can push data there. This is definitely not great. Sure, strong passwords etc. will help you feel safe, but having an IP range to whitelist would be very useful.

@fuziontech

yakkomajuri avatar Oct 01 '21 09:10 yakkomajuri

For the record, I'm not saying we have to do this - just wanted to add the context that the key thing here is exports, which are a key part of our platform.

yakkomajuri avatar Oct 01 '21 09:10 yakkomajuri

Could we reduce the scope of this or rethink this problem? Can we provide a reasonably stable public IP/range for the plugin server particularly?

yakkomajuri avatar Oct 01 '21 09:10 yakkomajuri

This issue has 10 comments. Issues this long are very hard to read or to contribute to, and tend to take very long to reach a conclusion. Instead, why not:

  1. Write some code and submit a pull request! Code wins arguments
  2. Have a sync meeting to reach a conclusion
  3. Create a request for comments in the meta repo or product internal repo

> Could we reduce the scope of this or rethink this problem? Can we provide a reasonably stable public IP/range for the plugin server particularly?

This is what the goal would be. Unfortunately there's no easy way to do this even with a relatively large CIDR block. We should push customers who want to load PostHog with their own data to use an export script to push data from their warehouse to us. We should move away as much as possible from pulling data from their warehouse.

> This is not about imports, it's primarily about exports. We need access to people's DBs for all the warehouse exports. BQ, Snowflake, Postgres, Redshift, etc.

I understand this is for exports; that is specifically what I was referencing here. Pulling data from customers' systems via plugins is not how we should be importing data into PostHog or exporting data from customers. We had a good chat about this in https://github.com/PostHog/plugin-server/issues/529#issuecomment-900741592. The ideal way for customers to export data from their warehouses into PostHog is for us to provide export scripts and examples showing how to load PostHog using Airflow, Azkaban, Luigi, or just cron and our client libraries. This is what we want to support going forward. Connecting to their warehouses is not a solution most companies, especially large ones, will be comfortable with.

Edit:

> This is not about imports, it's primarily about exports. We need access to people's DBs for all the warehouse exports. BQ, Snowflake, Postgres, Redshift, etc.

Ah! Yeah, we will still need a story for getting data from PostHog -> data warehouse using our API. Granting access to these databases is just something that most companies will not do, even with a restricted CIDR block. As part of the tooling for data teams we should have an easy library/API for bulk-consuming data out of PostHog as well as getting data back into PostHog. It also becomes a moot point as soon as they install the Helm chart internally, since they will be able to hit anything they want from within their private network, as some of our clients already do with BigQuery, GCS, and Pub/Sub.
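To make the push-based approach above concrete, here is a minimal, hypothetical sketch of a cron-able export script that batches warehouse rows into PostHog's public `/batch` capture endpoint. The host, `PROJECT_API_KEY`, and row shape are placeholders, not an official script:

```python
# Hypothetical "push, don't pull" export script: read rows from a
# warehouse query (not shown) and send them to PostHog's batch capture
# endpoint. Stdlib-only; a real setup would likely use a client library.
import json
import urllib.request

POSTHOG_HOST = "https://us.i.posthog.com"  # or the EU host
PROJECT_API_KEY = "phc_..."  # placeholder project API key


def build_batch(rows):
    """Convert warehouse rows into a PostHog /batch payload."""
    return {
        "api_key": PROJECT_API_KEY,
        "batch": [
            {
                "event": row["event"],
                "distinct_id": row["distinct_id"],
                "properties": row.get("properties", {}),
                "timestamp": row["timestamp"],
            }
            for row in rows
        ],
    }


def send_batch(payload):
    """POST the payload; returns the HTTP status code."""
    req = urllib.request.Request(
        f"{POSTHOG_HOST}/batch/",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because the customer's scheduler (Airflow, Luigi, cron) initiates the connection, no inbound access to their warehouse is needed at all.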

fuziontech avatar Oct 01 '21 09:10 fuziontech

+2 from users who reached out to support today.

paolodamico avatar Mar 02 '22 18:03 paolodamico

Most services that can push or connect to a Redshift database let customers know the source IP address or addresses the connection will come from. This includes, but is not limited to, ETL providers (Stitch Data, for example), data platforms (such as Looker), and most others. This allows those addresses to be whitelisted.

It is rather unusual that this is not an option currently offered by PostHog.

More advanced offerings also support connecting over SSH or AWS VPC peering. SSH would not work here, but VPC peering would (or even Cloudflare Tunnels, which we have not used, but since it is WireGuard-based it would be an awesome solution).

nthbooth-feedr avatar May 30 '22 14:05 nthbooth-feedr

We're using hosted PostHog and unfortunately this is a big deal for us. Allowing global internet access to a database instance won't make it past compliance review.

c0bra avatar Jul 11 '22 19:07 c0bra

+1 from a customer for whom this is blocking Redshift import. Slack thread.

Twixes avatar Nov 24 '22 14:11 Twixes

For the record, major inconvenience for us as well.

henriklaurentz avatar Mar 30 '23 13:03 henriklaurentz

Another request from a customer (internal zendesk link)

timgl avatar Oct 13 '23 08:10 timgl

Another customer request (internal zendesk link)

bretthoerner avatar Feb 26 '24 23:02 bretthoerner

Big issue for us as well, with our Postgres DB being hosted on GCP

DaivikGoel avatar Mar 13 '24 22:03 DaivikGoel

As of today we have information on our egress IP addresses here (US/EU), which can be used for things like allow-listing connections. 🙌
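For illustration, a minimal sketch of how a customer might use the published ranges: check an inbound connection's source address against the egress CIDRs with Python's `ipaddress` module. The CIDRs below are placeholders (TEST-NET ranges); substitute the real ranges from the page linked above:

```python
# Sketch: decide whether an inbound connection's source address falls
# inside PostHog's published egress ranges before allowing it through.
# The CIDRs below are documentation placeholders, not the real list.
from ipaddress import ip_address, ip_network

POSTHOG_EGRESS_CIDRS = [
    "192.0.2.0/24",     # placeholder (TEST-NET-1)
    "198.51.100.0/24",  # placeholder (TEST-NET-2)
]


def is_posthog_egress(addr: str) -> bool:
    """True if addr is within one of the allow-listed egress ranges."""
    ip = ip_address(addr)
    return any(ip in ip_network(cidr) for cidr in POSTHOG_EGRESS_CIDRS)
```

The same check maps directly onto a database firewall or security-group rule: allow only these CIDRs on the warehouse port instead of `0.0.0.0/0`.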

benjackwhite avatar May 03 '24 10:05 benjackwhite