enterprise_gateway
EC2 instance to EMR connection
We want to explore the possibility of using enterprise_gateway with our current setup. We have set up JupyterHub (TLJH) on an EC2 instance. From this TLJH we would like to connect to an EMR Spark cluster (Livy enabled).
Can Enterprise Gateway be leveraged for this scenario? I can't find any blogs that explore this common use case.
I'm not familiar with EMR, but you'd probably want to implement a process proxy specific to that resource manager. The YarnClusterProcessProxy might be a good reference.
EMR is the managed Spark offering from AWS. It seems to use YARN under the covers, so our YarnClusterProcessProxy should work, given any necessary tweaks to the environment configuration:
https://aws.amazon.com/blogs/big-data/submitting-user-applications-with-spark-submit/
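For illustration, here is a minimal sketch of what a kernelspec pointing at the YarnClusterProcessProxy might look like, built as a Python dict and dumped as kernel.json content. The display name and the launcher path are hypothetical placeholders, and a real kernelspec would carry further launcher arguments and environment settings (for example the YARN endpoint) that are not shown here.

```python
import json

# Fully qualified class name of the YARN process proxy shipped with Enterprise Gateway.
YARN_PROXY_CLASS = (
    "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"
)


def yarn_kernelspec(display_name, launcher_argv):
    """Build kernel.json content that asks Enterprise Gateway to launch
    the kernel through the YARN process proxy."""
    return {
        "language": "python",
        "display_name": display_name,
        "metadata": {"process_proxy": {"class_name": YARN_PROXY_CLASS}},
        "argv": list(launcher_argv),
    }


# Hypothetical display name and launcher script path, for illustration only.
spec = yarn_kernelspec(
    "Spark Python (EMR, YARN cluster mode)",
    ["/path/to/kernels/spark_python_yarn_cluster/bin/run.sh"],
)
print(json.dumps(spec, indent=2))
```

The only part that selects the resource manager integration is the process_proxy entry under metadata; everything else would need adapting to the EMR environment.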
Right on - thanks @lresende. Yeah, so as long as EMR uses the Hadoop Yarn REST API, which various searches indicate to be the case, EMR "should just work".
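One way to confirm this from the EC2 instance is to query the ResourceManager's cluster info endpoint. The sketch below assumes the ResourceManager web service listens on the default port 8088 on the EMR master node and that the EC2 instance can reach it; the host name is a placeholder, and the sample response is made up for the offline demonstration.

```python
import json
from urllib.request import urlopen


def cluster_info_url(rm_host, port=8088):
    """URL of the Hadoop YARN ResourceManager cluster info endpoint."""
    return f"http://{rm_host}:{port}/ws/v1/cluster/info"


def parse_cluster_info(body):
    """Extract the ResourceManager state and Hadoop version from the JSON body."""
    info = json.loads(body)["clusterInfo"]
    return info.get("state"), info.get("hadoopVersion")


def check_yarn_rest(rm_host, port=8088, timeout=5):
    """Call the endpoint; raises URLError if the ResourceManager is unreachable."""
    with urlopen(cluster_info_url(rm_host, port), timeout=timeout) as resp:
        return parse_cluster_info(resp.read().decode("utf-8"))


# Offline demonstration with a made-up sample response.
sample = '{"clusterInfo": {"state": "STARTED", "hadoopVersion": "3.3.6"}}'
print(cluster_info_url("emr-master.example.internal"))
print(parse_cluster_info(sample))
```

If check_yarn_rest returns a state for the EMR master host, the REST API that the YarnClusterProcessProxy relies on is at least reachable; security groups between the EC2 instance and the cluster are the usual thing to check when it is not.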
Hi @ggittu - did you get anywhere with this, either using the YarnClusterProcessProxy or building one yourself?