How to get content from a site with bad ssl cert

Open HolgerAusB opened this issue 1 year ago • 2 comments

I want to write a site config for forums.lotro.com, but I can't find the right XPath for body, title, and author. It seems to me, though, that the real problem lies elsewhere. The feed itself is already full-text, but I need the config for wallabag, which says 'can't retrieve contents for this article' when I try to send it a link to an article.

I tried to build the config for FTR, but none of the following match, while videlibri finds matches with my XPath. Here are several examples; of course, I had only one title or body per test:

http_header(user-agent): Mozilla/5.0 (Windows NT 10.0; rv:103.0) Gecko/20100101 Firefox/103.0
http_header(referer): https://forums.lotro.com/

prune: no
tidy: no

author: //a[contains(@class,'username')]

title: //title
title: /html/head/title

body: //blockquote
body: //blockquote[normalize-space(@class) = 'postcontent restore']
body: //blockquote[contains(concat(' ',normalize-space(@class),' '),' postcontent restore ')]
body: //div[@class='content']/div[1]/blockquote

test_url: https://forums.lotro.com/forums/showthread.php?695978-Bullroarer-Update-33-2-Beta-2-OPEN

HolgerAusB avatar Sep 10 '22 12:09 HolgerAusB

I moved the conversation about this to the community forum, which seems to be the better place for it.

HolgerAusB avatar Sep 15 '22 05:09 HolgerAusB

The issue is with the TLS cert of forums.lotro.com or their web server software. After changing the security level from 2 to 1 in the last entry of /etc/ssl/openssl.cnf, it works.
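For anyone checking their own system, the configured level can be read out of the config text. This is only a rough sketch: it assumes the level is set through a Debian-style `CipherString = DEFAULT@SECLEVEL=2` line, and both the sample text and the helper name `find_seclevel` are made up for illustration rather than taken from the actual file.

```python
import re

# Hypothetical sample resembling the tail of a Debian-style openssl.cnf.
# The real file on a given system may look different.
SAMPLE_CNF = """\
[system_default_sect]
MinProtocol = TLSv1.2
CipherString = DEFAULT@SECLEVEL=2
"""

# Matches uncommented lines such as "CipherString = DEFAULT@SECLEVEL=2".
SECLEVEL_RE = re.compile(
    r"^\s*CipherString\s*=\s*\S*@SECLEVEL=(\d)", re.MULTILINE
)


def find_seclevel(cnf_text):
    """Return the last SECLEVEL set via CipherString, or None if absent."""
    matches = SECLEVEL_RE.findall(cnf_text)
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    print(find_seclevel(SAMPLE_CNF))
```

To inspect a real system, pass the contents of /etc/ssl/openssl.cnf instead of the sample string.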

@fivefilters / @j0k3r: Should I PR a forums.lotro.com.txt anyway, with a comment explaining what to do? Or is there any site-config parameter to lower the SSL security for just this site?

EDIT: It would be very nice to have an option in site-config: ignore-TLS-error: yes or ignore-SSL-error: yes or check-SSL-cert: no

HolgerAusB avatar Sep 15 '22 10:09 HolgerAusB

Hi @HolgerAusB, responded on the forum, but will post here too:

Not keen to add things like this at the site config level. In our experience these are not very common server issues. When they do occur, they’re usually either related to the remote server or the server making the request. If it’s the latter, there’s probably something that needs to be fixed or updated. If it’s the remote server, probably best to try to contact them to fix the issue.

The reason we implemented the fix I mentioned was because it was a known OpenSSL issue at the time affecting lots of users.

But I’m not sure what you’ve reported here is the same. If I try using Full-Text RSS on our servers, the content can be retrieved without any cipher changes.

fivefilters avatar Sep 26 '22 23:09 fivefilters

Strange. It does not work for me without customization. Maybe your server has different settings for OpenSSL.

I have tried it again from another computer:

user@raspi:~ $ curl -v https://forums.lotro.com/forums/showthread.php?695978-Bullroarer-Update-33-2-Beta-2-OPEN
*   Trying 198.252.160.58:443...
* Connected to forums.lotro.com (198.252.160.58) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (OUT), TLS alert, handshake failure (552):
* error:141A318A:SSL routines:tls_process_ske_dhe:dh key too small
* Closing connection 0
curl: (35) error:141A318A:SSL routines:tls_process_ske_dhe:dh key too small

And yes, you are right: the problem is on lotro.com's side.

However, they are probably not interested in implementing a fix that only benefits a content crawler, because web browsers work with the server without complaining about the certificate.

I understand that you don't want to put this in at the site-config level. But for us self-hosters, it means reducing the security level for all OpenSSL communication, not just for a few hosts.
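To illustrate what "only for a few hosts" could look like, here is a minimal Python sketch. Full-Text RSS and wallabag are PHP applications, so this is not their code and not an existing site-config feature. The allowlist `LOW_SECLEVEL_HOSTS` and the helpers `context_for_host` and `fetch` are hypothetical names. The sketch assumes Python is linked against OpenSSL 1.1.0 or later, which understands the `@SECLEVEL=` cipher-string syntax.

```python
import ssl
import urllib.request
from urllib.parse import urlsplit

# Hypothetical allowlist: only these hosts get the weaker setting.
LOW_SECLEVEL_HOSTS = {"forums.lotro.com"}


def context_for_host(host):
    """Return a default TLS context, with SECLEVEL=1 only for listed hosts."""
    # Certificate and hostname verification stay enabled in both cases.
    ctx = ssl.create_default_context()
    if host.lower() in LOW_SECLEVEL_HOSTS:
        # OpenSSL cipher-string syntax; level 1 accepts smaller DH keys
        # than level 2 does.
        ctx.set_ciphers("DEFAULT@SECLEVEL=1")
    return ctx


def fetch(url, timeout=10):
    """Fetch a URL using the TLS context chosen for its host."""
    host = urlsplit(url).hostname or ""
    ctx = context_for_host(host)
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.read()
```

Every other host keeps the system default, so the lowered level applies to one connection at a time instead of to everything that uses OpenSSL on the machine.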

HolgerAusB avatar Sep 27 '22 13:09 HolgerAusB