smart_open icon indicating copy to clipboard operation
smart_open copied to clipboard

No way to have SFTP connections `load_host_keys()` via `transport_params`

Open Kache opened this issue 3 years ago • 4 comments
trafficstars

Problem description

While currently system host keys are loaded: https://github.com/RaRe-Technologies/smart_open/blob/v5.2.1/smart_open/ssh.py#L91

There's currently no way to load_host_keys() for verifying the host. It could be added via transport_params at paramiko client instantiation. For example, something like:

        ssh = _SSH[key] = paramiko.client.SSHClient()
        ssh.load_system_host_keys()
        if 'load_host_keys' in transport_params:
            ssh.load_host_keys(transport_params['load_host_keys'])
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())

Steps/code to reproduce the problem

Currently unable to verify host key using a local known_hosts file.

Versions

Darwin-20.6.0-x86_64-i386-64bit
Python 3.7.6 (default, Nov 24 2021, 00:59:23)
[Clang 13.0.0 (clang-1300.0.29.3)]
smart_open 5.2.1

Checklist

Before you create the issue, please make sure you have:

  • [X] Described the problem clearly
  • [X] Provided a minimal reproducible example, including any required data
  • [X] Provided the version numbers of the relevant software

Kache avatar Aug 16 '22 23:08 Kache

If you're open to accepting a PR, I'd be willing to create a PR based on the example above.

Kache avatar Aug 16 '22 23:08 Kache

if 'load_host_keys' in transport_params:
    ssh.load_host_keys(transport_params['load_host_keys'])

Can't the user do this prior to the smart_open.open call?

mpenkov avatar Aug 21 '22 12:08 mpenkov

Yes, but notice the first line in my example -- that solution involves accessing smart_open's private cache ,_SSH, to "grab" the instance of the paramiko client, which is not a good software engineering practice.

smart_open does not normally (and rightfully, IMO) expose the underlying paramiko client to the user

Kache avatar Aug 22 '22 20:08 Kache

I wonder if there's a better way to do this without having smart_open know all these paramiko details. I don't want to handle more transport parameters than absolutely necessary.

How about:

def ssh_client_init():  # user's code
    client = paramiko.client.SSHClient()
    # additional ssh config goes here
    return client

transport_params = {'ssh_client_init': ssh_client_init}
with smart_open(url, 'rb', transport_params=transport_params) as fin:
    ...

I think it's better to pass a callable instead of the client itself because we can use the callable to create a new client whenever we get disconnected.

If there is no callable passed, then we can use the default client settings, e.g. what is currently being done.

Yet another way is to expose the underlying client. I'm not opposed to that idea, either. Hiding implementation details is a good thing in general, but here it's getting in the way of the user achieving what they want, so it isn't something we have to strictly stick to.

mpenkov avatar Aug 23 '22 02:08 mpenkov