pooch
pooch copied to clipboard
Use paramiko.SSHClient in SFTPDownloader (enables SSH agent authentication)
These are changes I made to SFTPDownloader in order to support using keys from the agent. This seems to work for me, but I'm new to both pooch and paramiko (and I'm a little overwhelmed by the documentation of both projects), so I'm not sure whether I broke anything or maybe this already worked and I just failed to figure out how to do this.
With this change the following is able to use a key that's loaded into my agent to authenticate the download:
import pooch
downloader = pooch.SFTPDownloader(username='user', allow_agent=True)
DATASET = pooch.create(
path=pooch.os_cache("dataset"),
base_url="sftp://hostname.domain/datasets/",
downloader=downloader,
registry = {
"dataset_001.zip": "sha256:62faa9f0b06887acf71d67e5791c039c2009362acb800f8e31bed2cef835481f",
},
)
DATASET.fetch("dataset_001.zip")
I think this works on *nix/MacOS systems. I'm unsure whether paramiko's SSHClient.load_system_host_keys() works on Windows.
💖 Thank you for opening your first pull request in this repository! 💖
A few things to keep in mind:
- Authorship Guidelines: Our rules for giving you credit for your contributions, including authorship on publications and Zenodo archives.
- Code of Conduct: How we expect people to interact in our projects.
⭐ No matter what, we are really grateful that you put in the effort to do this! ⭐
Removed the allow_agent flag. allow_agent=True is paramiko's default so migrating to using the SSHClient class is sufficient to enable SSH agent use. So the example now just looks like this:
import pooch
downloader = pooch.SFTPDownloader(username='user')
DATASET = pooch.create(
path=pooch.os_cache("dataset"),
base_url="sftp://hostname.domain/datasets/",
downloader=downloader,
registry = {
"dataset_001.zip": "sha256:62faa9f0b06887acf71d67e5791c039c2009362acb800f8e31bed2cef835481f",
},
)
DATASET.fetch("dataset_001.zip")
It would be nice if pooch understood URIs such as "sftp://[email protected]:port/datasets/" (it was unexpected that it didn't "just work") but that is not strictly needed for this PR.
Hi @jstorrs thank you for the pull request! Sorry it was left for so long. I'm slowly working though my back log and will get to this as soon as I can.
@jstorrs had a look at this today and the changes seem reasonable but I'm not very familiar with SFTP (this was implemented by someone else). The tests seem to be failing with the use of the SSHClient because of a missing known_hosts entry: https://github.com/fatiando/pooch/actions/runs/6405819568/job/17902218195?pr=368#step:13:498
The Zenodo failures are being addressed in #373 so don't worry about those. Do you think you can get this working with the old tests? I have no idea how we could test the desired behavior (using the keys), though. Any thoughts on this?
Thanks! I understand that error and will figure out how to remedy it. I definitely have a known_hosts file setup and would not have even thought to delete it to test.
Nothing immediately comes to mind about how to test it, though. I'll see if I can find/think of anything.
Okay, I think I have started to trace this error down. Where I am at currently is that I can't figure out how to get both password and private key auth to work with paramiko's SSHClient. I think the problem you were describing is fixed by adding client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
but in the interim I think I've run into bigger problems with the authentication flow in paramiko which makes password auth fail if an SSH agent is available.
To work on this I've extracted the code and the test into this script:
import paramiko
paramiko.util.log_to_file("paramiko.log")
print("Testing Transport")
connection = paramiko.Transport(sock=('test.rebex.net', 22))
connection.connect(username='demo', password='password')
sftp = paramiko.SFTPClient.from_transport(connection)
sftp.get('/pub/example/pocketftp.png', 'transport.png')
connection.close()
print("Testing SSHClient")
client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(
hostname='test.rebex.net',
port=22,
username='demo',
password='password',
)
sftp = client.open_sftp()
sftp.get_channel().settimeout = None
sftp.get('/pub/example/pocketftp.png', 'sshclient.png')
client.close()
What seems to happen here is that if an SSH agent is found then SSHClient will fail to authenticate (basically in the log publickey authentication fails but it doesn't continue on to try password authentication). Looking through paramiko's SSHClient the logic of connect() seems to not have that fallback.
It seems like the authentication logic has been a known to be problematic in paramiko for quite some time and there's recently a new way to define the authentication strategy using new AuthStrategy classes (https://docs.paramiko.org/en/3.3/api/auth.html). The documentation of this isn't particularly clear to me at this time. They basically point at fabric's experimental OpenSSHAuthStrategy. I haven't been able to work this part out yet so it's sort of stuck until I can understand better what's going on.
I think emulating OpenSSH's authentication strategy is a probably ideal, but it's also more than we need. But I'm not sure why that's all in fabric rather than paramiko... pooch doesn't depend on fabric and it seems like massive overkill to add fabric just to get publickey+password auth to coexist. There's got to be an easier way to first try publickey and fallback onto password auth... I have to be missing something.
Seems to be similar to https://github.com/paramiko/paramiko/issues/391 see also https://github.com/paramiko/paramiko/issues/387
Hi @jstorrs thanks for keeping us updated! Sounds like you went through quite the rabbit hole for this PR 🙂
You're right that we wouldn't want to pick up another optional dependency for this to work and I'd be very hesitant to host a lot of code related to this in Pooch itself, since that would mean extra burden to maintain (and something I know very little about).
No rush with this and if you do figure out a way, let us know!