
Don't "resolve" URIs not listed in longify-urls.list

Open duckdalbe opened this issue 14 years ago • 2 comments

First: Thank you for this script! It really helps me a lot in coping with these stupid "short URLs" on Twitter and the like. Unfortunately I don't know Perl well enough to fix the following problem myself, so I'm posting it here.

Currently longify-urls.pl seemingly also "resolves" URLs not listed in longify-urls.list:

http://t.co/AgHfYlq is being resolved to /artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html, while the actual Location header sent by t.co says http://www.faz.net/-025ATJ, and faz.net is not listed in longify-urls.list:

% grep -q faz.net ~/.irssi/longify-urls.list; echo $?
1

(This also shows that longify-urls.pl doesn't correctly handle Location headers starting with a slash. It should prepend the known hostname.)
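One possible way to handle such relative Location values, sketched here rather than taken from longify-urls.pl: resolve the header against the URL that was just requested, using the URI module's `new_abs`. The helper name `absolutize_location` is hypothetical.

```perl
use strict;
use warnings;
use URI;

# Resolve a Location header value against the URL that produced it.
# Absolute values pass through; relative ones (e.g. starting with "/")
# inherit the scheme and host of the request URL.
# NOTE: absolutize_location is a hypothetical helper, not part of the script.
sub absolutize_location {
    my ($location, $request_url) = @_;
    return URI->new_abs($location, $request_url)->canonical->as_string;
}

print absolutize_location(
    '/artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html',
    'http://www.faz.net/-025ATJ',
), "\n";
```

With the values from this report, the relative path should come back prefixed with http://www.faz.net.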

Could you have a look at this?

Thanks!

duckdalbe, Sep 14 '11 06:09

The problem is that I'm just taking the last Location: header in the chain, and hoping it's good enough. I guess the proper way to do things is to dig down into LWP::UserAgent, use $ua->simple_request, and manually follow the redirect chain.

I'm not sure if it's necessarily a bug that an intermediate step in that chain doesn't match the whitelist, unless you can think of a reason why it should? For example, with your t.co link, we have:

  1. http://t.co/AgHfYlq -> http://www.faz.net/-025ATJ
  2. http://www.faz.net/-025ATJ -> http://www.faz.net//artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html

Should the output here be the -025ATJ URL, or the terminal redirection?

Resolving this should also take care of the second part: making sure URLs get canonicalised.
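A rough sketch of what stepping through the chain by hand might look like, assuming HEAD requests are enough to get the Location header (some shorteners may need GET). The function name `redirect_chain` and the `$max_hops` limit are hypothetical and not taken from longify-urls.pl.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI;

# Follow redirects one hop at a time and return every URL visited,
# starting with the one passed in.
# NOTE: redirect_chain and $max_hops are hypothetical names for this sketch.
sub redirect_chain {
    my ($ua, $url, $max_hops) = @_;
    $max_hops = 10 unless defined $max_hops;   # guard against redirect loops
    my @chain = ($url);
    for (1 .. $max_hops) {
        # simple_request sends exactly one request and never follows redirects
        my $res = $ua->simple_request(HTTP::Request->new(HEAD => $chain[-1]));
        last unless $res->is_redirect;
        my $location = $res->header('Location');
        last unless defined $location;
        # Resolve relative Location values against the URL just requested
        push @chain, URI->new_abs($location, $chain[-1])->as_string;
    }
    return @chain;
}

# Usage (needs network access):
#   my $ua = LWP::UserAgent->new(timeout => 10);
#   print "$_\n" for redirect_chain($ua, 'http://t.co/AgHfYlq');
```

Keeping the whole chain, rather than only the last hop, leaves the choice of which URL to display to a separate step.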

shabble, Sep 20 '11 23:09

Personally I'd prefer the output of longify-urls to be http://www.faz.net/-025ATJ.
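To illustrate that preference: given the list of URLs in a redirect chain, one could return the first URL whose host is not a listed shortener. This is only a sketch. The name `first_unlisted_url` is hypothetical, and the contents and format of longify-urls.list are assumed here to reduce to a set of hostnames.

```perl
use strict;
use warnings;
use URI;

# Return the first URL in the chain whose host is not a listed shortener,
# falling back to the last URL if every hop is still on a shortener.
# NOTE: first_unlisted_url is a hypothetical helper for this sketch.
sub first_unlisted_url {
    my ($shorteners, @chain) = @_;
    for my $url (@chain) {
        my $host = lc( URI->new($url)->host );
        return $url unless $shorteners->{$host};
    }
    return $chain[-1];
}

# Assumed list contents; the real longify-urls.list may differ.
my %shorteners = map { $_ => 1 } qw(t.co bit.ly);

print first_unlisted_url(
    \%shorteners,
    'http://t.co/AgHfYlq',
    'http://www.faz.net/-025ATJ',
    'http://www.faz.net/artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html',
), "\n";
```

Under these assumptions the example stops at http://www.faz.net/-025ATJ, because faz.net is not in the list.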

But the real issue is the missing hostnames (2nd part). If you feel you can solve the canonicalization more easily without stepping through the redirect chain I'd be way happier than today, too!

duckdalbe, Sep 21 '11 08:09