j2ssh-maverick icon indicating copy to clipboard operation
j2ssh-maverick copied to clipboard

scp pushing big files sometimes hangs forever

Open DrVanScott opened this issue 6 years ago • 1 comments

Hi,

I use this library to transfer files sized up to some gigabytes. Sometimes, quite rare, such a transfer doesn't complete and runs forever without any progress. Whenever i do a thread dump in this situation, the output is always as follows:

"MessagePump_Thread-1388" #1399 prio=5 os_prio=0 tid=0x00007ff3f0bf1800 nid=0x2f6b runnable [0x00007ff3d623d000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at com.sshtools.ssh2.TransportProtocol.readWithTimeout(TransportProtocol.java:594)
        at com.sshtools.ssh2.TransportProtocol.readMessage(TransportProtocol.java:747)
        - locked <0x00000005626628d8> (a java.io.DataInputStream)
        at com.sshtools.ssh2.TransportProtocol.nextMessage(TransportProtocol.java:571)
        - locked <0x00000005626628d8> (a java.io.DataInputStream)
        at com.sshtools.ssh.message.SshMessageRouter.blockForMessage(SshMessageRouter.java:297)
        at com.sshtools.ssh.message.SshMessageRouter.access$000(SshMessageRouter.java:44)
        at com.sshtools.ssh.message.SshMessageRouter$MessagePump.run(SshMessageRouter.java:370)

   Locked ownable synchronizers:
        - None
...
"main" #1 prio=5 os_prio=0 tid=0x00007ff3f0008800 nid=0x2c70 in Object.wait() [0x00007ff3f4e26000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at com.sshtools.ssh.message.ThreadSynchronizer.requestBlock(ThreadSynchronizer.java:73)
        - locked <0x00000005626b8750> (a com.sshtools.ssh.message.ThreadSynchronizer)
        at com.sshtools.ssh.message.SshMessageRouter.nextMessage(SshMessageRouter.java:258)
        at com.sshtools.ssh.message.SshMessageStore.nextMessage(SshMessageStore.java:71)
        at com.sshtools.ssh2.Ssh2Channel.processMessages(Ssh2Channel.java:507)
        at com.sshtools.ssh2.Ssh2Channel$ChannelInputStream.read(Ssh2Channel.java:1132)
        - locked <0x00000005626b83b8> (a com.sshtools.ssh2.Ssh2Channel$ChannelInputStream)
        at com.sshtools.ssh2.Ssh2Channel$ChannelInputStream.read(Ssh2Channel.java:1097)
        at com.sshtools.scp.ScpClientIO$ScpEngineIO.waitForResponse(ScpClientIO.java:398)
        at com.sshtools.scp.ScpClient$ScpEngine.writeFileToRemote(ScpClient.java:508)
        at com.sshtools.scp.ScpClient$ScpEngine.access$000(ScpClient.java:401)
        at com.sshtools.scp.ScpClient.putFile(ScpClient.java:184)
        ...

   Locked ownable synchronizers:
        - None

A comparisson of the source and target file in this situation shows: Somewhere in the middle of the target file, a block sized by a multiple of 4k is missing. I observed the following blocksizes: 4k, 8k, 32k. After that block, the content again matches. The target file is always a bit shorter, but not in the same way as the missing block. What i often observe: if a 32k block is missing in the middle of the file, the file is around 55k smaller. The final content of the target file does not match the final content of the source. Maybe this is due to some buffered data not yet written by the sshd?

My test file is about 260MB: $ seq -w 30000000> testfile Using testfiles of size ~3MB or ~30MB i was not able to reproduce this problem. I need to transfer the file repeatedly for several hours to run into the problem.

Currently i was able to reproduce this problem with the following target configurations: cygwin, OpenSSH_7.2p2, OpenSSL 1.0.2g 1 Mar 2016 cygwin, OpenSSH_7.6p1, OpenSSL 1.0.2n 7 Dec 2017 cygwin, OpenSSH_6.6.1p1, OpenSSL 1.0.1i 6 Aug 2014

On the other hand, i was not able to reproduce it on linux, OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8, OpenSSL 1.0.1f 6 Jan 2014 linux, OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013

While it currently might looking like a cygwin problem, couldn't it be a timing/concurrency issue which only occurs due to the limited performance of cygwin?

Please let me know if (and how) i can provide more information.

This is the code i use to perform the scp put:

    public void scp(String local, String remote, String hostname, int port, String username, String keyfile, String passphrase) throws IOException {
        try {
            SshConnector con = SshConnector.createInstance();

            con.getContext().setHostKeyVerification(
                    new DisabledKnownHostsVerification());
            con.getContext().setPreferredPublicKey(Ssh2Context.PUBLIC_KEY_SSHRSA);

            try (SocketTransport t = new SocketTransport(hostname, port)) {
                t.setTcpNoDelay(true);

                Ssh2Client ssh2 = (Ssh2Client) con.connect(t, username, true);
                try {
                    PublicKeyAuthentication pk = new PublicKeyAuthentication();

                    try (FileInputStream fis = new FileInputStream(new File(System.getProperty("user.home") + File.separatorChar + ".ssh" + File.separatorChar + keyfile))) {
                        SshPrivateKeyFile pkfile = SshPrivateKeyFileFactory
                                .parse(fis);

                        SshKeyPair pair;
                        try {
                            if (pkfile.isPassphraseProtected()) {
                                if (passphrase == null) {
                                    throw new RuntimeException("private key is passphrase protected, but passphrase is missing");
                                }
                                pair = pkfile.toKeyPair(passphrase);
                            } else {
                                pair = pkfile.toKeyPair(null);
                            }
                        } catch (InvalidPassphraseException e) {
                            throw new RuntimeException("The causing exception may be due to a wrong passphrase", e);
                        }

                        pk.setPrivateKey(pair.getPrivateKey());
                        pk.setPublicKey(pair.getPublicKey());
                    }

                    if (ssh2.isConnected()) {

                        ssh2.authenticate(pk);

                        if (ssh2.isAuthenticated()) {
                            new ScpClient(ssh2).putFile(local, remote, false, null, false);
                        } else {
                            throw new RuntimeException("Authentication failed!");
                        }
                    } else {
                        throw new RuntimeException("Connection failed!");
                    }
                } finally {
                    ssh2.disconnect();
                }
            }
        } catch (SshException | ChannelOpenException e) {
            if (e.getCause() instanceof IOException) {
                throw (IOException) e.getCause();
            }
            throw new RuntimeException(e.getMessage(), e);
        }
    }

DrVanScott avatar Jun 05 '18 13:06 DrVanScott

I just edited the inital comment, added some test configurations.

I've now enabled debug logging in my test scenario on both sides. Please find it attached. I have shortened it to the relevant (but complete) range on both sides.

EDIT: The log of maverick would have also contained the transferred data. As this would slow down the whole process and also would made the log hard to read, i removed that. There is just a single line not handled by my filter by accident, but it is also indicating EOF has reached ;-)

INFO: Sent 8116 bytes of transport data outgoingSequence=16489 totalBytesSinceKEX=270141565
INFO: Session OUT: 29999105
30000000

INFO: Sending SSH_MSG_CHANNEL_DATA id=0 rid=0 len=1 window=106624

maverick.log.gz sshd.log

DrVanScott avatar Jun 07 '18 13:06 DrVanScott