ssh tunnels without job runners

Open coyotemarin opened this issue 13 years ago • 1 comments

When people use the same job flow for several jobs, they like to be able to leave a single SSH tunnel open. Currently, SSH tunnels are tied to runners, so once a job finishes, its SSH tunnel goes away.

We should probably have a function in mrjob.ssh that can create an SSH tunnel to a given job flow. It should probably return a context manager (an object with __enter__ and __exit__ methods), so you can do:

with ssh_tunnel_to(job_flow_id):
    ...

I'm not sure of the best way to pass parameters to this function. It needs to know:

  • EMR connection settings (probably could just take an EmrConnection object)
  • path to .pem file
  • ssh binary
  • a range of ports that we can listen on locally
  • whether the SSH tunnel is open

All but the first two arguments can be defaulted. It might make sense to have one method in EMRJobRunner that can create an SSH tunnel with no arguments, and another function in mrjob.ssh that takes these arguments.
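
A minimal sketch of what the standalone function could look like, not mrjob's actual API. It assumes the master node's hostname has already been looked up (instead of taking a job flow ID and an EmrConnection), takes a single local port rather than a range, and logs in as the hadoop user; the helper name tunnel_args and the popen parameter (there only so the function can be exercised without a real ssh binary) are hypothetical.

```python
import contextlib
import subprocess


def tunnel_args(host, key_pair_file, local_port, remote_port, ssh_bin='ssh'):
    """Build an ssh command line for a local port forward (hypothetical helper)."""
    return [
        ssh_bin,
        '-i', key_pair_file,               # path to the .pem file
        '-o', 'ExitOnForwardFailure=yes',  # exit if the local port can't be bound
        '-N',                              # run no remote command, only forward
        '-L', '%d:localhost:%d' % (local_port, remote_port),
        'hadoop@%s' % host,                # assumption: EMR login user is "hadoop"
    ]


@contextlib.contextmanager
def ssh_tunnel_to(host, key_pair_file, local_port, remote_port,
                  ssh_bin='ssh', popen=subprocess.Popen):
    """Keep an SSH tunnel open for the duration of a with-block."""
    proc = popen(
        tunnel_args(host, key_pair_file, local_port, remote_port, ssh_bin),
        stdin=subprocess.DEVNULL,
    )
    try:
        yield local_port
    finally:
        # Close the tunnel whether or not the body raised.
        proc.terminate()
        proc.wait()
```

Because the tunnel's lifetime follows the with-block rather than a runner, several jobs could be launched inside one block and share the same tunnel.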

coyotemarin • Feb 01 '12 18:02

Maybe as part of the mrjob ssh subcommand? (see #1113)

mrjob at least attempts to use the same port number on any given cluster by using the cluster ID as a seed for the random number generator (see #67).
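
To illustrate the seeding idea, here is a rough sketch rather than mrjob's actual code. The function name candidate_ports, the port range, and the number of picks are all chosen arbitrarily for illustration.

```python
import random


def candidate_ports(cluster_id, port_range=range(40001, 40841), num_picks=10):
    """Return local ports to try, in an order that depends only on cluster_id."""
    # A private Random instance seeded with the cluster ID yields the same
    # sequence on every run, without touching the global random state.
    rng = random.Random(cluster_id)
    return rng.sample(port_range, num_picks)
```

Two processes that call this with the same cluster ID try the same ports in the same order, so a tunnel to a given cluster tends to end up on the same local port as long as that port is free.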

coyotemarin • Sep 08 '18 01:09