kramdown-rfc icon indicating copy to clipboard operation
kramdown-rfc copied to clipboard

Persistent connections for reference fetching

Open martinthomson opened this issue 4 years ago • 14 comments

I just had a build fail due to a DNS error after successfully fetching resources from the very same server.

/github/home/.cache/xml2rfc/reference.RFC.7838.xml: fetching from https://www.rfc-editor.org/refs/bibxml/reference.RFC.7838.xml
/github/home/.cache/xml2rfc/reference.RFC.6265.xml: fetching from https://www.rfc-editor.org/refs/bibxml/reference.RFC.6265.xml
*** Failed to open TCP connection to www.rfc-editor.org:443 (getaddrinfo: Try again) while fetching https://www.rfc-editor.org/refs/bibxml/reference.RFC.6265.xml

This would not happen if the HTTP client were able to reuse connections. I realize that this is work, but it would make the tool considerably faster.

martinthomson avatar Feb 16 '21 04:02 martinthomson

There are two somewhat conflicting potential directions of improvement:

  • Persistent connections (reuse connection for sequential retrievals)
  • Parallel connections (don't wait for a single retrieval to complete until the whole process completes)

Parallel connections would help with slow services such as the DOI service. Persistent connections more with the faster services such as rfc-editor.

This is all complicated by the desire to support older Ruby versions, as well as issues many Ruby installations have with their ca_certs.

So, yes, this is a bit of work (requiring some exploration in the cross product of servers, systems, and versions). Probably won't complete before IETF110 deadline...

cabo avatar Feb 16 '21 07:02 cabo

I hacked something together as aa0ef89, with only rough testing so far. Anecdotically reduces fetch time in one of my drafts from 6.5 to 4 s, so it's not a breakthrough but worth a little experimentation. Since I don't want to break older platforms, some explicit opt-in actions are required to use this, see copy of commit comments below.

1.3.32: Experimental persistent connections

to use this:

  • gem install net-http-persistent (this is not an enforced dependency of the gem!)
  • Set environment variable KRAMDOWN_PERSISTENT to any value

KRAMDOWN_PERSISTENT= kdrfc -3chi foo.md

cabo avatar Feb 16 '21 23:02 cabo

Thanks Carsten. I'm giving this a go right now. The DNS errors on rfc-editor.org are at the point of more or less forcing my hand.

martinthomson avatar Feb 17 '21 22:02 martinthomson

I can confirm that this works. And it seems to work very well; it looks like it more than halved the build time in CI for a pair of drafts I'm working on. You can probably close this; unless you wanted to use this to track parallel fetching.

martinthomson avatar Feb 18 '21 00:02 martinthomson

Did you want to keep this open to track parallel downloads? I'm happy enough with the performance bump from persistent connections (and make -j$(nproc) when I'm in a hurry, of course).

martinthomson avatar May 03 '21 07:05 martinthomson

I think the remaining step is to make this the default.

cabo avatar May 03 '21 07:05 cabo

A little bit of sleuthing around in different platform versions needed.

cabo avatar May 03 '21 07:05 cabo

I'm getting the following error, which I'm guessing in related:

Can't set up persistent HTTP -- cannot load such file -- net/http/persistent

Gem installed this version, but didn't appear to pull in any new dependencies; is there one it should have declared for this?

MikeBishop avatar Jun 09 '21 14:06 MikeBishop

I don't know what "this version" is above. The message is a warning, kramdown-rfc will continue, just slower. To even get this message you must have set KRAMDOWN_PERSISTENT in your environment (without that, kramdown-rfc always uses the slower method). You may want to try gem install net-http-persistent to get rid of the warning message and use the faster method.

cabo avatar Jun 09 '21 14:06 cabo

Ah -- I see it in the commit notes. Perhaps that should be a declared dependency if it's now the default.

MikeBishop avatar Jun 09 '21 14:06 MikeBishop

It is not the default -- you have to set the environment variable. It cannot be the default as long as we still support the Ruby 2.3 version installed on macOS High Sierra. (Maybe we should make the cut with Monterey... let's see.)

cabo avatar Jun 09 '21 14:06 cabo

Ah -- the variable is set by https://github.com/martinthomson/i-d-template. @martinthomson, that seems sub-optimal if it requires a dependency that isn't installed by default.

MikeBishop avatar Jun 09 '21 14:06 MikeBishop

At least, I have to fix the warning message.

cabo avatar Jun 09 '21 14:06 cabo

I considered making the setting of the variable conditional, but then the problem wouldn't be discoverable. And the performance gain is significant. A different warning message would be good. Something like "Not using persistent HTTP: gem install net-http-persistent to silence this message" would be nice.

martinthomson avatar Jun 09 '21 23:06 martinthomson