HPN-SSH not present in EPEL builds

Open asdorsey opened this issue 6 years ago • 99 comments

Apologies in advance if you guys don't do the builds for EPEL. I have a question that I hope you can answer.

We recently deployed some data transfer nodes on CentOS 7.6 using the (at the time) latest available GCT packages in EPEL. It appears the gsi-openssh-server package that was installed (gsi-openssh-7.4p1-4.el7.x86_64) doesn't include the HPN patch - attempting to enable the feature in /etc/gsissh/sshd_config results in an error from gsisshd and the service refusing to start.

Oct 24 19:09:03 hdtn1 systemd: Starting Cluster Controlled gsisshd...
Oct 24 19:09:03 hdtn1 gsisshd: /etc/gsissh/sshd_config line 47: Deprecated option RSAAuthentication
Oct 24 19:09:03 hdtn1 gsisshd: /etc/gsissh/sshd_config line 56: Deprecated option RhostsRSAAuthentication
Oct 24 19:09:03 hdtn1 gsisshd: /etc/gsissh/sshd_config line 149: Unsupported option DisableUsageStats
Oct 24 19:09:03 hdtn1 gsisshd: /etc/gsissh/sshd_config: line 189: Bad configuration option: HPNDisabled
Oct 24 19:09:03 hdtn1 gsisshd: /etc/gsissh/sshd_config: line 196: Bad configuration option: HPNBufferSize
Oct 24 19:09:03 hdtn1 gsisshd: /etc/gsissh/sshd_config: terminating, 2 bad configuration options
Oct 24 19:09:03 hdtn1 systemd: gsisshd.service: main process exited, code=exited, status=255/n/a
Oct 24 19:09:03 hdtn1 systemd: Failed to start Cluster Controlled gsisshd.
Oct 24 19:09:03 hdtn1 systemd: Unit gsisshd.service entered failed state.
Oct 24 19:09:03 hdtn1 systemd: gsisshd.service failed.
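For anyone hitting the same startup failure, here is a minimal Python sketch of a check that could run before restarting the service. It scans the text of an sshd_config for the two HPN keywords that the log above reports as bad. The function name `find_hpn_options` and the keyword set are illustrative assumptions; an HPN-enabled build may accept more options than these two.

```python
import re

# Only the two keywords rejected in the log above.
# Assumption: this is not a complete list of HPN options.
HPN_KEYWORDS = {"hpndisabled", "hpnbuffersize"}


def find_hpn_options(config_text):
    """Return (line_number, keyword) for each active HPN keyword in config_text."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        # The keyword is the first token; sshd_config keywords are case-insensitive.
        keyword = re.split(r"[\s=]+", stripped, maxsplit=1)[0]
        if keyword.lower() in HPN_KEYWORDS:
            hits.append((number, keyword))
    return hits


# Illustrative config fragment, not taken from the real /etc/gsissh/sshd_config.
sample = """\
# HPN tuning
HPNDisabled no
#HPNBufferSize 2048
HPNBufferSize 16384
"""
print(find_hpn_options(sample))
```

Any hit means the config will only load on a gsisshd that was built with the HPN patch.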

I found #72 that references a discussion about dropping the HPN patch, but I can't find anything stating that this change was definitely made, and on what date that change was implemented.

Do you have any information on when HPN was removed from the EPEL package builds, and/or what was the last version that included HPN support?

asdorsey avatar Oct 24 '19 19:10 asdorsey

@ellert, do you know? I took a brief look through the Fedora Koji but didn't find anything obvious in the changelog or the build logs.

matyasselmeci avatar Nov 02 '19 16:11 matyasselmeci

Looking back at the thread starting with https://mailman.egi.eu/pipermail/discuss/2017-November/000100.html (and the later one starting https://mailman.egi.eu/pipermail/discuss/2018-September/000172.html), it looks like the EPEL version never had the HPN patch, only the globus-toolkit version did. Following the thread: when we started with the gridcf we decided to go with the EPEL versions and therefore also drop the HPN patch for the (now) gct. I have no idea how much work it would be to adapt it to work with the EPEL version.

msalle avatar Nov 03 '19 20:11 msalle

Thanks for the updates.

I've made a half-baked attempt at getting the HPN patch into the GCT gsi-openssh package. I added the HPN-SSH patch for OpenSSH 7.4p1 as the last patch applied and modified the patch to work with the other patches in the source package. It compiles but I get a segfault in cipher-ctr-mt.c when attempting data transfers, so something is broken.

I'm not very experienced in C, so if someone else wants to give it a try I would be grateful. The modified patch is attached.

openssh-7_4_P1-hpn-14.12.modified.diff.txt

asdorsey avatar Nov 04 '19 14:11 asdorsey

Hi, I am the developer of HPN-SSH. One of my old colleagues who is now at NOAA just contacted me today about gsi-openssh. First I want to start by saying that I had let the hpn-ssh patches fall way behind for various budget and life related issues. However, I've since ported everything up to OpenSSH 8.1p1. I've also fixed some problems with the multithreaded aes-ctr cipher, server logging, and a few formatting issues. I've back ported that fix to 7.6p1 - 8.1p1 inclusive.

I'm very interested in ensuring that hpn-ssh remains a part of gsi-openssh and would like to help in the process if anyone would like.

I have grabbed the package files for gsissh from the fedora sources and applied my patches. Of course, it's not building because of some issue with LDAP. That said, not even the unpatched version is building because of the same LDAP issues. Probably my environment.

Anyway, I'm happy to answer questions, take feature requests, and deal with bugs. Just let me know what I can do to help out.

rapier1 avatar Nov 12 '19 22:11 rapier1

@adorsey-NOAA, @msalle So it turns out that I was getting the wrong set of package files for Fedora. I grabbed the right source RPM (8.1p1 from https://koji.fedoraproject.org/koji/buildinfo?buildID=1403143) this time and was able to apply my patch. It builds and passes all of the regression and unit tests. I haven't tested it for full functionality at this point but I'll hand it over to the people at work who understand globus better than I do to test that out shortly. If you want the patch I've attached it below. I've also included the spec file. This will only build against openssl 1.1 due to requirements inherited from libglobus. If you need it for an older version of openssl let me know and I'll do what I can.

openssh-8.1p1-hpnssh.patch.txt gsi-openssh.spec.txt

rapier1 avatar Nov 12 '19 23:11 rapier1

Hi, I am the developer of HPN-SSH. One of my old colleagues who is now at NOAA just contacted me today about gsi-openssh. First I want to start by saying that I had let the hpn-ssh patches fall way behind for various budget and life related issues. However, I've since ported everything up to OpenSSH 8.1p1. I've also fixed some problems with the multithreaded aes-ctr cipher, server logging, and a few formatting issues. I've back ported that fix to 7.6p1 - 8.1p1 inclusive.

I'm very interested in ensuring that hpn-ssh remains a part of gsi-openssh and would like to help in the process if anyone would like.

That's great news @rapier1 and very welcome!

@adorsey-NOAA, @msalle So it turns out that I was getting the wrong set of package files for Fedora. I grabbed the right source RPM (8.1p1 from https://koji.fedoraproject.org/koji/buildinfo?buildID=1403143) this time and was able to apply my patch. It builds and passes all of the regression and unit tests. I haven't tested it for full functionality at this point but I'll hand it over to the people at work who understand globus better than I do to test that out shortly. If you want the patch I've attached it below. I've also included the spec file. This will only build against openssl 1.1 due to requirements inherited from libglobus. If you need it for an older version of openssl let me know and I'll do what I can.

CentOS 6 and 7 (and I assume RHEL and Scientific Linux 6 and 7) have OpenSSL 1.0.1[...] and 1.0.2[...] and (GSI-)OpenSSH 5.3p1 and 7.4p1 respectively.

~Will the older HPN patches from SourceForge work with these? If yes, they'll most likely lack the fixes for the problems you mentioned above, right? So support for these versions would be very useful, too, until these OSes are EOL. What do you think, would that be possible?~

Ok, I just had another look on SourceForge and the files there have been updated recently. Reading this:

Important News: Versions 14v15 for OpenSSH 7.6 through version 14v18 for OpenSSH 7.8 had a bug in the multithreaded AES-CTR code that would cause occasional hangs. We believe we've identified and fixed this problem. If you run into any issues please contact us at [email protected]. We can't fix problems we don't know about so we are counting on you.

...on SourceForge I conclude that the above mentioned problems were not affecting older versions of the HPN-Patches (meaning specifically the patches for OpenSSH 5.3p1 and 7.4p1)?

BTW, I'm in the process of creating GSI-OpenSSH packages for SUSE. I started with packages for OpenSUSE Leap 15.0 which uses OpenSSH 7.6p1 and will also try to integrate the HPN-Patches for OpenSSH 7.6p1 there now. Much obliged for providing these.

fscheiner avatar Nov 14 '19 13:11 fscheiner

@rapier1

BTW, I'm in the process of creating GSI-OpenSSH packages for SUSE. I started with packages for OpenSUSE Leap 15.0 which uses OpenSSH 7.6p1 and will also try to integrate the HPN-Patches for OpenSSH 7.6p1 there now. Much obliged for providing these.

Hm, OpenSUSE Leap 15.0 has OpenSSL 1.1.0[...]. Should the patches from https://sourceforge.net/projects/hpnssh/files/Patches/HPN-SSH%2014v15%207.6p1/ then work at all there? The summary on SourceForge says:

Native OpenSSL 1.1 compatibility is included with OpenSSH 7.9 and on. HPN-SSH 14v18 and on are also compatible with OpenSSL 1.0.1.

...so maybe not?
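Taking the SourceForge summary at face value, the first half of that rule can be written down as a small Python check. This is only a sketch: the helper names are made up for illustration, and it encodes nothing beyond "OpenSSH before 7.9 has no native OpenSSL 1.1 support". It says nothing about whether a given HPN patch applies cleanly.

```python
import re


def version_tuple(version):
    """Extract (major, minor) from strings like '7.6p1' or '1.0.2k'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_openssl11_backport(openssh, openssl):
    """True when OpenSSH predates 7.9 but has to build against OpenSSL 1.1 or newer."""
    return version_tuple(openssl) >= (1, 1) and version_tuple(openssh) < (7, 9)


# Version pairs are illustrative combinations, not a statement about any distro.
for openssh, openssl in [("7.6p1", "1.1.0"), ("7.9p1", "1.1.0"), ("7.4p1", "1.0.2")]:
    print(openssh, openssl, needs_openssl11_backport(openssh, openssl))
```

Only the first pair (OpenSSH 7.6p1 against OpenSSL 1.1.0, the Leap 15.0 case described above) comes out as needing extra compatibility work.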

fscheiner avatar Nov 14 '19 16:11 fscheiner

@fscheiner Unfortunately getting versions of OpenSSH before 7.9 to build with OpenSSL 1.1 is a bit of a pain in the ass. I did it for 7.7p1 and 7.6p1 as an exercise but it's a tangled mess of ifdefs. I wouldn't suggest it unless it's an absolute necessity as maintenance is going to be an issue.

Also, it turns out that the SRPM I grabbed from https://kojipkgs.fedoraproject.org//packages/gsi-openssh for 8.1p1 fails when you try to do a globus auth. Note: this is after I applied the hpn-ssh patch so there might be a weird interaction that I'm not understanding. That said, the hpn-ssh patch doesn't touch the buf. Anyway, it crashes in sshbuf.c at

Program terminated with signal 11, Segmentation fault.
#0  sshbuf_reset (buf=buf@entry=0x0) at sshbuf.c:176
176             if (buf->readonly || buf->refcount > 1) {
(gdb) bt
#0  sshbuf_reset (buf=buf@entry=0x0) at sshbuf.c:176
#1  0x0000556dae6c140b in ssh_gssapi_buildmic (b=b@entry=0x0, user=user@entry=0x556dae6e64d4 "", service=0x556dafb63af0 "ssh-connection",
    context=context@entry=0x556dae6e057a "gssapi-keyex") at gss-genr.c:503
#2  0x0000556dae689182 in userauth_gsskeyex (ssh=<optimized out>) at auth2-gss.c:90
#3  0x0000556dae675c0a in input_userauth_request (type=<optimized out>, seq=<optimized out>, ssh=0x556dafb717c0) at auth2.c:408
#4  0x0000556dae6b97e9 in ssh_dispatch_run (ssh=ssh@entry=0x556dafb717c0, mode=mode@entry=0, done=done@entry=0x556dafb634a0) at dispatch.c:113
#5  0x0000556dae6b9839 in ssh_dispatch_run_fatal (ssh=ssh@entry=0x556dafb717c0, mode=mode@entry=0, done=done@entry=0x556dafb634a0) at dispatch.c:133
#6  0x0000556dae67469d in do_authentication2 (ssh=ssh@entry=0x556dafb717c0) at auth2.c:184
#7  0x0000556dae6640db in main (ac=<optimized out>, av=<optimized out>) at sshd.c:2262

rapier1 avatar Nov 14 '19 17:11 rapier1

@fscheiner

Also, is there a canonical set of patches, source code, srpms, etc that I should focus on? I'm largely focused on providing support to CentOS because that's what my community generally uses. However, I'm game for helping out on any of these but I need a clue as to where to start.

Thanks!

rapier1 avatar Nov 14 '19 17:11 rapier1

@rapier1

@fscheiner Unfortunately getting versions of OpenSSH before 7.9 to build with OpenSSL 1.1 is a bit of a pain in the ass. I did it for 7.7p1 and 7.6p1 as an exercise but it's a tangled mess of ifdefs. I wouldn't suggest it unless it's an absolute necessity as maintenance is going to be an issue.

I understand. It seems I have to switch to Leap 15.1 (which has OpenSSH 7.9p1) anyhow, according to their lifetime page, which I just checked. It was convenient to start building a GSI-OpenSSH package on Leap 15.0 as I already had a VM running it. I have to look into how to upgrade this installation to Leap 15.1. One question: as the HPN patches for OpenSSH 7.9p1 are shipped from SourceForge as multiple patches, is there some ordering needed when applying them, or can I just concatenate them and apply them as a single patch?

Also, it turns out that the SRPM I grabbed from https://kojipkgs.fedoraproject.org//packages/gsi-openssh for 8.1p1 fails when you try to do a globus auth.

Just to be sure, by "globus auth" do you mean authentication with a GSI proxy credential? I ask because there's also a service from Globus called Globus Auth. And what failed, gsisshd or gsissh? And does the authentication work for the untouched package at all, given that it's pretty new?

@fscheiner

Also, is there a canonical set of patches, source code, srpms, etc that I should focus on? I'm largely focused on providing support to CentOS because that's what my community generally uses. However, I'm game for helping out on any of these but I need a clue as to where to start.

Adding @ellert here, as he should know best. AFAICT for EPEL/Fedora there's only the GSI enabling patch that's shipped with the GSI-OpenSSH source RPMs (available from the above mentioned URL for example). I also use this patch for the SUSE packages (with some reordering). I assume that once HPN patches are available for the currently maintained RHEL 6, 7 and 8 compatible OSes, in future you will only need to follow the most current Fedora versions of GSI-OpenSSH, which seem to always be based on the most recent OpenSSH version. This should also provide us with patches for future RHEL compatible OSes, as we should be able to just reuse the HPN patch(es) from Fedora's GSI-OpenSSH that match the (GSI-)OpenSSH used in those OSes.

@matyasselmeci @msalle: What's your opinion on that?

fscheiner avatar Nov 15 '19 09:11 fscheiner

Just to be sure, by "globus auth" do you mean authentication with a GSI proxy credential? I ask because there's also a service from Globus called Globus Auth. And what failed, gsisshd or gsissh? And does the authentication work for the untouched package at all, given that it's pretty new?

I'm helping @rapier1 with testing. The following is just for clarification, as he is working on another approach that I hope to test later today.

These are CentOS 7.6 RPMs based on the source RPM from https://kojipkgs.fedoraproject.org//packages/gsi-openssh with the HPN-SSH patch added. When using GSI proxy credentials to authenticate, gsisshd segfaults. Stack trace follows.

Core was generated by `gsisshd: Adam.Dorsey [ne'.
Program terminated with signal 11, Segmentation fault.
#0  sshbuf_reset (buf=buf@entry=0x0) at sshbuf.c:176
176             if (buf->readonly || buf->refcount > 1) {
(gdb) bt
#0  sshbuf_reset (buf=buf@entry=0x0) at sshbuf.c:176
#1  0x0000556dae6c140b in ssh_gssapi_buildmic (b=b@entry=0x0, user=user@entry=0x556dae6e64d4 "", service=0x556dafb63af0 "ssh-connection",
    context=context@entry=0x556dae6e057a "gssapi-keyex") at gss-genr.c:503
#2  0x0000556dae689182 in userauth_gsskeyex (ssh=<optimized out>) at auth2-gss.c:90
#3  0x0000556dae675c0a in input_userauth_request (type=<optimized out>, seq=<optimized out>, ssh=0x556dafb717c0) at auth2.c:408
#4  0x0000556dae6b97e9 in ssh_dispatch_run (ssh=ssh@entry=0x556dafb717c0, mode=mode@entry=0, done=done@entry=0x556dafb634a0) at dispatch.c:113
#5  0x0000556dae6b9839 in ssh_dispatch_run_fatal (ssh=ssh@entry=0x556dafb717c0, mode=mode@entry=0, done=done@entry=0x556dafb634a0) at dispatch.c:133
#6  0x0000556dae67469d in do_authentication2 (ssh=ssh@entry=0x556dafb717c0) at auth2.c:184
#7  0x0000556dae6640db in main (ac=<optimized out>, av=<optimized out>) at sshd.c:2262
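For reading traces like this one quickly, here is a minimal Python sketch that lists the frames in which gdb shows an argument as `0x0`. The regular expressions only cover the frame layout seen in the trace above and are an assumption about gdb's output format in general; the function name is hypothetical.

```python
import re

# Matches frames such as "#1  0x... in func (args) at file.c:123".
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \((.*)\) at (\S+):(\d+)$")
# Matches "name=0x0" and "name=name@entry=0x0", but not "mode=0" or "0x0000556d...".
NULL_ARG = re.compile(r"(\w+)=(?:\w+@entry=)?0x0\b")


def null_pointer_frames(backtrace):
    """Return (frame_number, function, source_file, [null_arg_names]) per affected frame."""
    # Re-join frames that gdb wrapped onto an indented continuation line.
    joined = []
    for line in backtrace.splitlines():
        if line.startswith("#"):
            joined.append(line.strip())
        elif joined and line[:1].isspace():
            joined[-1] += " " + line.strip()
    result = []
    for line in joined:
        match = FRAME.match(line)
        if match is None:
            continue
        number, function, args, source, _lineno = match.groups()
        nulls = NULL_ARG.findall(args)
        if nulls:
            result.append((int(number), function, source, nulls))
    return result


# First three frames of the trace above.
trace = '''\
#0  sshbuf_reset (buf=buf@entry=0x0) at sshbuf.c:176
#1  0x0000556dae6c140b in ssh_gssapi_buildmic (b=b@entry=0x0, user=user@entry=0x556dae6e64d4 "", service=0x556dafb63af0 "ssh-connection",
    context=context@entry=0x556dae6e057a "gssapi-keyex") at gss-genr.c:503
#2  0x0000556dae689182 in userauth_gsskeyex (ssh=<optimized out>) at auth2-gss.c:90
'''
for frame in null_pointer_frames(trace):
    print(frame)
```

On this trace it reports frames #0 and #1, which matches what the trace shows directly: `ssh_gssapi_buildmic` was entered with `b` set to NULL, and `sshbuf_reset` then dereferenced it. It does not say why the caller had no buffer to pass.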

@rapier1 is working on an HPN-SSH patch for a gsi-openssh source RPM from https://kojipkgs.fedoraproject.org//packages/gsi-openssh/7.6p1/5.fc28.1/src/ . I've already built this package on CentOS 7.6 and successfully tested GSI authentication.

asdorsey avatar Nov 15 '19 14:11 asdorsey

@fscheiner @ellert

I'll write more soon but @adorsey-NOAA just told me that he was able to successfully build, deploy, and use the hpn-ssh patches for OpenSSH 7.6p1 under CentOS 7. I'm including a link to the rpms and srpm for this. This was built under OpenSSL 1.0.2k. You'll find openssh-7.6p1-hpnssh.patch in the SOURCES directory of the srpm. I apply this patch after all of the other patches as it seemed easiest to do it that way. I'll start working on 7.7p1 and 7.8p1 shortly. As for 7.9p1 and later I'll need to update my globus environment and start poking at what's going on in sshbuf.c. I think that's some sort of weird interaction with the openssh-8.0p1-gssapi-keyex.patch.

Also, I really don't know anything about globus so I am sorry if I use the wrong terms for things at times. As I move forward I expect I'll be picking a lot of this up.

https://www.dropbox.com/sh/odv0rv58x8tgeou/AADyZMqHW77O3ZopdSv96MRca?dl=0

rapier1 avatar Nov 15 '19 17:11 rapier1

We deployed the packages that @rapier1 built for us onto our data transfer nodes earlier this week. This solved a large part of the data transfer performance issues that our users were complaining about.

I (and the users of our data transfer systems) would love to see this work end up in the EPEL packages if that's at all possible. I'm not great at C, but please let me know if I can help in other ways like testing packages on our test environment.

asdorsey avatar Nov 21 '19 20:11 asdorsey

@adorsey-NOAA @rapier1

We deployed the packages that @rapier1 built for us onto our data transfer nodes earlier this week. This solved a large part of the data transfer performance issues that our users were complaining about.

So you and your users were able to successfully test both the HPN enabled GSI-OpenSSH server and client. That's good to hear. If at all possible, I'd recommend using the (GSI-)OpenSSH version of the respective EPEL version for your OS - EPEL7, right? - as that version is still maintained in EPEL and you'll benefit from any (security) updates that are issued - after rebuilding the package.

I (and the users of our data transfer systems) would love to see this work end up in the EPEL packages if that's at all possible. I'm not great at C, but please let me know if I can help in other ways like testing packages on our test environment.

Testing new packages will surely be helpful. I think for EPEL we would need the HPN patches for the respective (GSI-)OpenSSH versions:

  • EPEL6 - (GSI)OpenSSH 5.3p1
  • EPEL7 - (GSI)OpenSSH 7.4p1
  • EPEL8 - (GSI)OpenSSH 7.8p1

I think EPEL7 would be most important ATM.

@rapier1 I'll give the 7.6p1 HPN patches a try in my GSI-OpenSSH package for OpenSUSE Leap 15.0. I'm not yet finished with the GSI-OpenSSH package for Leap 15.1 as it was a lot of work to adapt the patches from Fedora to the OpenSUSE version of OpenSSH - though a manually built gsissh client from the adapted OpenSSH source from Leap 15.1 already works.

fscheiner avatar Nov 21 '19 21:11 fscheiner

@fscheiner I'll work on EPEL7. Mostly I picked 7.6p1 out of a hat (and because I don't think it included the OpenSSL 1.1 patch). As I develop new ones I'll put them up at https://sourceforge.net/projects/hpnssh/. I'm out all next week but I'll try to get to 7.4 tomorrow.

rapier1 avatar Nov 21 '19 22:11 rapier1

@fscheiner Turns out that building for EPEL7 (7.4p1) really didn't take much time at all. I was able to get it to patch and compile but I don't have the place to do the functional testing right now. As soon as I have it tested I can pass the srpm to whoever you think best.

I may want to back port some of the changes I made to the multithreaded cipher as well but this should be a good start.

rapier1 avatar Nov 21 '19 22:11 rapier1

@rapier1 Small update from my side - don't feel pressed to answer before you're back :-):

I patched the openSUSE Leap 15.1 OpenSSH 7.9p1 package with GSI and HPN patches. The resulting (gsi)ssh binary works to connect to a GSI enabled sshd.

For HPN I used https://sourceforge.net/projects/hpnssh/files/Patches/HPN-SSH%2014v18%207.9p1/openssh-7_9_P1-hpn-14.18.diff/download. From looking into the RPM source package on https://www.dropbox.com/sh/odv0rv58x8tgeou/AADyZMqHW77O3ZopdSv96MRca?dl=0 this patch came close to the one you used. But as there are other patches as well and all together are much bigger than the one I used, I'd like to ask if that was the correct patch to use?

And is there a quick way to determine that the HPN features are working? From:

[...]
The HPN-SSH team (Ben Bennet and Mike Tasota) also developed a multi-threaded variant of the
AES-CTR cipher so as to allow multicored systems to distribute the burden of computing the 
keystream over multiple cores. This enhancement produces a cipher stream that is 
indistinguishable from the default AES-CTR cipher stream. The upshot of this being that it is 
backwards compliant with all existing AES-CTR implementations - no need to have the 
multithreaded variant on both sides of the connection. [...]

..I assume it's sufficient to have an HPN enabled client to at least test the multithreaded AES-CTR cipher.

fscheiner avatar Nov 28 '19 17:11 fscheiner

@fscheiner Hey, I'm sorry it's taken so long to get back to you. Things have been hectic (and infected by my kids with creeping crud). As for which patch to use - I'd use the one from sourceforge. The one in dropbox may or may not be somewhat out of date as I'm not monitoring that one.

There are multiple patches on sourceforge so I should probably explain that each of them implements a different set of features. So the AES-CTR patch only provides the multithreaded cipher, the Server Logging patch only includes the extended server logging, etc. The one you downloaded incorporates all available features into a single patch, so that includes the none cipher switch, the dynamic window scaling, server logging, multithreaded aes-ctr, and a patch to the scp progressmeter to show the 1 sec throughput as well as the averaged throughput. So if you want to include everything just use the patch you used.

As for functional testing - I keep meaning to build a script for that. Anyway, you can test the multithreaded cipher from the client side since the outbound encrypted data is identical to that of the nonthreaded aes-ctr cipher. If you increase the verbosity you should get a quick rundown on the number of hits and waits in the keystream threads. Likewise if the verbosity is increased even more you'll see receive buffer adjustments - the buffer grows pretty quickly though, so it will help to be on a high BDP path. To test the none-switch you do need to have that enabled on both ends of the connection. You'll get a warning once the none cipher is engaged but the only way to really test it is to do a raw packet capture and see if you can read the payload.
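To put a number on "high BDP path": the bandwidth-delay product is the link bandwidth multiplied by the round-trip time, and it is the amount of data that has to be in flight to keep the path full. Here is a minimal Python sketch of that arithmetic. The link speed, RTT and the 64 KiB window are illustrative values only, not measurements and not a claim about OpenSSH's actual defaults.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return bandwidth_bits_per_s * rtt_ms // 8000


def window_limited_bits_per_s(window_bytes, rtt_ms):
    """Upper bound on throughput when at most window_bytes can be in flight per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


# Illustrative path: 10 Gbit/s with a 50 ms round-trip time.
print(bdp_bytes(10_000_000_000, 50))
# Illustrative fixed window of 64 KiB on the same path.
print(window_limited_bits_per_s(64 * 1024, 50))
```

With these example values the path needs roughly 62.5 MB in flight, while a fixed 64 KiB window would cap throughput at about 10.5 Mbit/s. That gap is why receive buffer adjustments only become visible on long, fast paths.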

Also, I should have a version of 7.4 for CentOS 7 up shortly. I am having some problems with building 7.8 for CentOS 7 but I'm not going to spend a lot of time on that. Once I have a functional CentOS 8 environment I'll start working on 7.8 there.

A lot of the delay is just getting the environments set up. I have a coworker who generally does that but between SC19 and TechEx and Thanksgiving and etc he's been a little oversubscribed.

Lastly, is there anyone else I should be talking to about hpn-ssh? I'd like to do whatever is necessary to maintain its value for the community.

rapier1 avatar Dec 09 '19 21:12 rapier1

@rapier1

@fscheiner Hey, I'm sorry it's taken so long to get back to you. Things have been hectic (and infected by my kids with creeping crud)

No issue at all. I'm just happy that you found the time to continue the work on the HPN patches. That's greatly appreciated. :-)

There are multiple patches on sourceforge so I should probably explain that each of them implements a different set of features. So the AES-CTR patch only provides the multithreaded cipher, the Server Logging patch only includes the extended server logging, etc. The one you downloaded incorporates all available features into a single patch, so that includes the none cipher switch, the dynamic window scaling, server logging, multithreaded aes-ctr, and a patch to the scp progressmeter to show the 1 sec throughput as well as the averaged throughput. So if you want to include everything just use the patch you used.

Thank you, that's useful information. I also made progress with packages for SUSE: Lightly tested packages including the HPN and GSI patches for OpenSSH 7.9p1 are now available for SLES15 (SP1) and OpenSUSE Leap 15.1 (https://build.opensuse.org/package/show/home:frank_scheiner:gct/gsi-openssh).

As for functional testing - I keep meaning to build a script for that. Anyway, you can test the multithreaded cipher from the client side since the outbound encrypted data is identical to that of the nonthreaded aes-ctr cipher. If you increase the verbosity you should get a quick rundown on the number of hits and waits in the keystream threads. Likewise if the verbosity is increased even more you'll see receive buffer adjustments - the buffer grows pretty quickly though, so it will help to be on a high BDP path. To test the none-switch you do need to have that enabled on both ends of the connection. You'll get a warning once the none cipher is engaged but the only way to really test it is to do a raw packet capture and see if you can read the payload.

Ok, I'll have a look into this.

Also, I should have a version of 7.4 for CentOS 7 up shortly. I am having some problems with building 7.8 for CentOS 7 but I'm not going to spend a lot of time on that. Once I have a functional CentOS 8 environment I'll start working on 7.8 there.

Yeah, supporting a 7.8 on CentOS 7 would anyhow require constant "backporting" of patches from CentOS 8 instead of just using the patches for the 7.4 maintained in CentOS 7.

Lastly, is there anyone else I should be talking to about hpn-ssh? I'd like to do whatever is necessary to maintain its value for the community.

I don't know of any specific party, but a mail to our [email protected] mailing list would probably reach a good portion of interested people. Posts to the list are moderated until you subscribe (details on gridcf.org, archive on https://mailman.egi.eu/pipermail/discuss/). I can approve both. Because of the upcoming holidays, it's probably best to wait and post after them. BTW, have a nice holiday time and - if we don't hear from each other earlier - until next year. :-)

fscheiner avatar Dec 19 '19 10:12 fscheiner

I hope everyone is doing well currently with the ongoing chaos.

I was wondering if there had been any progress on this issue, and if there was anything I could do to help.

EDIT: @rapier1 We've just noticed an issue with the test set of packages that you built for us a while ago. We're seeing messages like the following:

Apr  6 10:03:20 hdtn1 gsisshd[375932]: Disconnected from P\211I{\376\177

It looks like the IP that's supposed to be in that message is malformed somehow. Please let me know if I can provide more information.
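In case it helps with debugging, here is a minimal Python sketch that turns the escaped text back into raw bytes. It assumes the backslash sequences in the log line are three-digit octal escapes of single bytes; the function name is made up for illustration.

```python
import re


def decode_octal_escapes(text):
    """Replace each \\ooo octal escape with the byte it stands for."""
    raw = text.encode("latin-1")
    return re.sub(rb"\\([0-7]{3})", lambda m: bytes([int(m.group(1), 8)]), raw)


# The string that appears after "Disconnected from" in the log line above.
garbled = r"P\211I{\376\177"
decoded = decode_octal_escapes(garbled)
print(decoded.hex(" "))
print(hex(int.from_bytes(decoded, "little")))
```

The six bytes are 50 89 49 7b fe 7f, which is clearly not a printable IPv4 address. Read as a little-endian integer they give 0x7ffe7b498950. That looks like it could be a memory address rather than text, but this is only a guess from the byte pattern and I have not confirmed it against the code.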

asdorsey avatar Apr 07 '20 17:04 asdorsey

I don't think that's specifically from the hpn-ssh code but I can give it a look and see what might be happening. My guess is that there is a malformed printf somewhere.

rapier1 avatar Apr 09 '20 20:04 rapier1

Hi, this might be tricky to debug. I just had a look and "Disconnected from" probably comes from https://github.com/openssh/openssh-portable/blob/master/packet.c#L1867. In older versions that was part of sshpkt_fatal() itself, e.g. in 7.5, packet.c#L2124. I tried tracing back to where it goes wrong, but I can't see an obvious problem: remote_id traces via fmt_connection_id(), probably ssh_remote_ipaddr(), get_peer_ipaddr() to get_socket_address() in canohost.c#L68. It's all part of upstream openssh, but that doesn't mean that the corrupted data is also caused by upstream.

msalle avatar Jun 22 '20 17:06 msalle

Sorry I didn't get back to you sooner. I let this slide off of my plate because I wasn't able to replicate the problem on my end - which doesn't mean it's not real. How often do you run into this issue? Is it every time? Frequently? Occasionally? Does it happen with a specific remote host or have you seen it on multiple hosts? Lastly, do you know if this is via IPv4 or IPv6?

As an aside - I got new funding to develop hpn-ssh so I will have more time to focus on these issues and roll out improvements.

rapier1 avatar Jun 22 '20 17:06 rapier1

The issue occurs every time a user disconnects from a DTN, so every time that "Disconnected from..." message is printed, the output is garbled. The DTNs are IPv4 only.

This is happening with the gsi-openssh-7.6p1-5 RPMs you gave me in November 2019. If you have a different or newer set of RPMs, I could test those on our test system and see if they have this issue as well. You mentioned in an earlier comment on this issue that you had packages based on OpenSSH 7.4 as well.

asdorsey avatar Jun 23 '20 13:06 asdorsey

Let me go back and check something. I was talking to the gsi people and I think they picked up the hpn-ssh stuff again but let me see what I can find. Is this for CentOS?

rapier1 avatar Jun 23 '20 20:06 rapier1

Yes, we're running CentOS 7.7 on the DTNs.

asdorsey avatar Jun 23 '20 20:06 asdorsey

Let me go back and check something. I was talking to the gsi people and I think they picked up the hpn-ssh stuff again but let me see what I can find. Is this for CentOS?

Sorry @rapier1, the HPN patches haven't yet been included in the GSI-OpenSSH packages for EPEL and Fedora AFAICT. I am currently re-enabling them for the SUSE packages and need to test them afterwards. An issue with the GSI patch and a lack of time kept me from continuing this earlier. I'll then look into how the HPN patches can be included in the GSI-OpenSSH packages for EPEL and Fedora - if possible.

@ellert, @matyasselmeci, @msalle: Back in December 2019 I successfully tested sending and receiving file data with multithreaded AES-CTR enabled by using a gsiscp from a GSI-OpenSSH 7.9p1 package w/HPN patches for openSUSE Leap 15.1 against an HPN enabled gsisshd at a remote site. So the client already worked properly for me. I haven't yet tested the server - I don't remember exactly why, but most likely because of the - since then solved - issue with the GSI patch which prevented connections to gsisshd's from 7.8p1 and up. So far I didn't see any issues during inclusion of the HPN patches into GSI-OpenSSH 7.6p1 and 7.9p1 for SUSE, just a few reorderings were needed. I included the HPN patch on top of the GSI patch and all the other SUSE patches for the SUSE packages.

So what do you think about starting to include the HPN patches in the EPEL/Fedora packages?

fscheiner avatar Jun 26 '20 16:06 fscheiner

If we (you (-; ) can test them and show them to work, then I certainly think it would be a very valuable addition. I personally probably won't have many cycles for it the coming month.

msalle avatar Jun 26 '20 16:06 msalle

I would be very happy to test any packages that you guys can build for me. I have test DTNs that I can modify as needed.

asdorsey avatar Jun 26 '20 16:06 asdorsey

@msalle

If we (you (-; ) can test them and show them to work, then I certainly think it would be a very valuable addition. I personally probably won't have many cycles for it the coming month.

Sure :-), but it will take me some time.

@adorsey-NOAA

I would be very happy to test any packages that you guys can build for me. I have test DTNs that I can modify as needed.

Ok, great, then I could spread the load of testing.

fscheiner avatar Jun 26 '20 16:06 fscheiner