EC2 instance to EMR connection

Open ggittu opened this issue 3 years ago • 4 comments

We want to explore the possibility of using enterprise_gateway with our current setup. We have set up JupyterHub (TLJH) on an EC2 instance. From this TLJH we would like to connect to an EMR Spark cluster (Livy enabled).

Can Enterprise Gateway be leveraged for this scenario? I can't find any blogs that explore this common use case.

ggittu Feb 25 '21 23:02

I'm not familiar with EMR, but you'd probably want to implement a process proxy specific to that resource manager. The YarnClusterProcessProxy might be a good reference.

kevin-bates Feb 26 '21 01:02

EMR is the managed Spark offering from AWS. It seems to use YARN under the covers, so our YarnClusterProcessProxy should work, given any necessary tweaks to the environment configuration:

https://aws.amazon.com/blogs/big-data/submitting-user-applications-with-spark-submit/
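
For illustration only, a kernelspec selects its process proxy through the `metadata.process_proxy` stanza of `kernel.json`. The sketch below builds that stanza in Python. The class path is an assumption based on Enterprise Gateway's sample YARN kernelspecs and should be checked against the installed version; the helper `with_yarn_proxy` and the display name are hypothetical.

```python
import copy
import json

# Assumption: class path as it appears in Enterprise Gateway's sample YARN
# kernelspecs; verify against the installed release.
YARN_PROXY = "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"


def with_yarn_proxy(kernelspec: dict) -> dict:
    """Return a copy of a kernelspec dict whose metadata selects the YARN process proxy."""
    spec = copy.deepcopy(kernelspec)  # leave the caller's dict untouched
    spec.setdefault("metadata", {})["process_proxy"] = {"class_name": YARN_PROXY}
    return spec


if __name__ == "__main__":
    # Hypothetical starting point; a real kernelspec also needs argv and env.
    base = {"display_name": "Spark - Python (EMR)", "language": "python", "argv": []}
    print(json.dumps(with_yarn_proxy(base), indent=2))
```

The printed JSON would still need the `argv` and `env` entries appropriate to the EMR cluster before it could serve as a working `kernel.json`.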

lresende Feb 26 '21 22:02

Right on - thanks @lresende. Yeah, so as long as EMR uses the Hadoop YARN REST API, which various searches indicate to be the case, EMR "should just work".
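
One way to test that assumption from the EC2 instance is to query the ResourceManager's cluster-info endpoint (`/ws/v1/cluster/info`) on the EMR master node. This is a minimal sketch, not a verified EMR recipe: it assumes the ResourceManager listens on the default port 8088, that the security group allows the connection, and that no authentication is required. The hostname is a placeholder.

```python
import json
from urllib.request import urlopen


def cluster_info_url(host: str, port: int = 8088) -> str:
    """Build the YARN ResourceManager cluster-info URL (8088 is the stock default port)."""
    return f"http://{host}:{port}/ws/v1/cluster/info"


def parse_cluster_state(payload: bytes) -> str:
    """Extract the ResourceManager state (e.g. STARTED) from a cluster-info response."""
    return json.loads(payload)["clusterInfo"]["state"]


def probe(host: str, port: int = 8088, timeout: float = 5.0) -> str:
    """Fetch cluster info over plain HTTP and return the reported state."""
    with urlopen(cluster_info_url(host, port), timeout=timeout) as resp:
        return parse_cluster_state(resp.read())


if __name__ == "__main__":
    # Placeholder hostname; substitute the EMR master node's private DNS name.
    print(probe("emr-master.example.internal"))
```

If the probe succeeds, the same base URL (without the trailing `/info`) is what Enterprise Gateway's YARN endpoint setting would be pointed at.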

kevin-bates Feb 26 '21 22:02

Hi @ggittu - did you get anywhere with this using the YarnClusterProcessProxy or building one yourself?

kevin-bates May 20 '22 22:05