usegalaxy-playbook

FTP batch upload failures

Open jennaj opened this issue 7 years ago • 2 comments

Tracking ticket to share with the community as needed. The corrections are in progress. When done, this ticket will be closed out.

Problem: A few, but not all, datasets queued for FTP upload may fail.

Temporary Workaround: Restart the FTP upload for any that failed.
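
For anyone scripting the workaround, a minimal sketch of a retry loop follows. It is not part of Galaxy or the playbook: `retry_failed_uploads` and the `upload` callable are hypothetical names, and `upload` is assumed to raise an exception when a transfer fails (in practice it might wrap a call such as `ftplib.FTP_TLS.storbinary`).

```python
from typing import Callable, Iterable, List


def retry_failed_uploads(
    paths: Iterable[str],
    upload: Callable[[str], None],
    max_attempts: int = 3,
) -> List[str]:
    """Try each upload up to max_attempts times; return the paths that never succeeded."""
    still_failed = []
    for path in paths:
        for attempt in range(1, max_attempts + 1):
            try:
                upload(path)
                break  # this file went through, move on to the next one
            except Exception as exc:  # broad on purpose: any failure triggers a retry
                print(f"{path}: attempt {attempt} failed: {exc}")
        else:
            # The inner loop finished without a break, so every attempt failed.
            still_failed.append(path)
    return still_failed
```

Anything still in the returned list after the last attempt would need to be restarted by hand.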

@nekrut @natefoo

jennaj, Jan 26 '18 20:01

This is probably resolved, but I am retesting (before release) just to make sure. I'm not really sure whether this is a main problem or a Galaxy problem.

jennaj, May 16 '18 20:05

Is there a description of the issue? Is this a case of the connection timing out while transferring large files? If so, I'll record the issue here since I appear to have not created an issue for it:

From the ProFTPD TLS documentation:

Question: Using FTPS, after uploading a very large file, my next directory listing fails:

  425 Unable to build data connection: Operation not permitted

The TLSLog contains:

  client did not reuse SSL session, rejecting data connection (see the NoSessionReuseRequired TLSOptions parameter)

but I do not want to use that option, and would like to rely on the additional security protection provided by requiring SSL session reuse. And my FTPS client is correctly reusing SSL session IDs (as earlier data transfers were working properly). So why is my data transfer failing after the upload of a very large file?

Answer: The answer involves SSL session caching on the server side (i.e. mod_tls), cache timeouts, and session renegotiations.

By default, mod_tls uses OpenSSL's "internal" session cache, which is an in-memory caching of SSL session IDs. And by default, OpenSSL's internal session cache has a cache timeout of 5 minutes; after that amount of time in the internal session cache, a cached SSL session ID is considered stale and is available for reuse.

This means that 5 minutes or more into an FTPS session, even if your FTPS client reused an SSL session ID, the OpenSSL internal session cache will time out that SSL session ID. The next time your FTPS client goes to reuse that session ID for a data transfer, mod_tls won't find it in the OpenSSL internal session cache, and will think that your FTPS client is not reusing the SSL session ID as is required, and fail the transfer.

Fixing this situation requires two parts: a) the ability to change the cache timeout used for the OpenSSL internal session cache, and b) renegotiating the SSL session ID with the FTPS client periodically, to keep the SSL session ID up-to-date in the session cache.

The first part, configuring the session cache timeout for the OpenSSL internal session cache, is only possible in ProFTPD 1.3.4rc2 and later (see Bug#3580). The TLSSessionCache directive was modified to allow a configuration such as:

  TLSSessionCache internal: 1800

(Unfortunately, the ':' after "internal" is necessary.) This configures mod_tls such that the OpenSSL internal session cache uses a cache timeout of 1800 seconds (30 minutes), rather than the default of 300 seconds (5 minutes).

No matter how long you configure the cache timeout, eventually you will have a session which lasts longer than that timeout. Which brings us to the second part of the solution: renegotiating a new SSL session ID periodically, which keeps it fresh in the session cache. The TLSRenegotiate directive is needed for this. For example, the following configuration should address the issue of failed data transfers after very large uploads:

  TLSRenegotiate ctrl 1500 timeout 300
  TLSSessionCache internal: 1800

This tells mod_tls to request a renegotiation of the SSL session on the control channel every 1500 seconds (25 minutes), and to allow 300 seconds (5 minutes) for the client to perform the renegotiation. It also tells mod_tls to cache the SSL session data for 1800 seconds (30 minutes), i.e. longer than the renegotiation time of 1500 seconds.

This way, as long as your client supports renegotiations and is updating the SSL session ID properly for data transfers, when a data transfer is requested, the SSL session ID presented by the client should always be fresh and in the session cache.
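
To make the timing argument in the FAQ concrete, here is a toy model in Python. It is only a sketch of the reasoning, not how mod_tls or OpenSSL are implemented: `session_in_cache` is a hypothetical helper, and it assumes each renegotiation completes instantly at the configured interval (ignoring the `timeout` window of TLSRenegotiate).

```python
from typing import Optional


def session_in_cache(
    transfer_time: int,
    cache_timeout: int = 300,
    renegotiate_every: Optional[int] = None,
) -> bool:
    """Return True if the session ID is still cached when a data transfer starts.

    All times are seconds since the control connection was established.
    Simplification: a renegotiation is assumed to finish instantly at each
    interval, refreshing the session ID in the cache.
    """
    if renegotiate_every:
        # Age of the session ID since the most recent renegotiation.
        age = transfer_time % renegotiate_every
    else:
        # Never renegotiated: age counts from the start of the connection.
        age = transfer_time
    return age < cache_timeout


# Default 5 minute cache: a transfer requested 10 minutes in is rejected.
print(session_in_cache(600))                                              # False
# 30 minute cache alone only postpones the failure.
print(session_in_cache(2000, cache_timeout=1800))                         # False
# 30 minute cache plus renegotiation every 25 minutes keeps the ID fresh.
print(session_in_cache(2000, cache_timeout=1800, renegotiate_every=1500)) # True
```

Under these assumptions the session ID is never older than 1500 seconds when a transfer starts, so it stays below the 1800 second cache timeout regardless of how long the session lasts.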

I played around with small values of TLSSessionCache and TLSRenegotiate on test.galaxyproject.org to see if I could force renegotiation, but it didn't seem to work for me. However, I have (as of 132071450beb74f53a918d2c9dd090aa9c9d659b) increased the values to the ones specified in the FAQ, so we can see whether this has an effect. If nothing else, it should take longer before timing out.

natefoo, Jun 04 '18 17:06