jsoup

Support HTML redirect (meta refresh tag)?

Open sunng87 opened this issue 12 years ago • 2 comments

We just ran into this kind of redirection when parsing HTML files. Some websites don't use standard HTTP redirection. Instead, they use a browser-side redirect:

<html>
 <head> 
  <meta http-equiv="Refresh" content="0;URL=http://sports.sina.com.cn/j/2013-11-14/22386885017.shtml" /> 
 </head> 
 <body></body>
</html>

We can check the Refresh meta tag: if the URL in its content attribute differs from the base URL, treat it as a redirect. This could be done within HttpConnection. I found that httpie, a Python command-line utility, supports this feature. It would be nice to have this feature in jsoup.

sunng87 · Nov 15 '13 10:11
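
The check described above could look roughly like the following sketch. It uses only the Java standard library and assumes the content attribute of the Refresh meta tag has already been extracted from the parsed document. `MetaRefresh` and `redirectTarget` are hypothetical names for illustration, not part of jsoup.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of jsoup: interprets a meta refresh content value. */
final class MetaRefresh {
    // "<seconds>", then ";" or ",", then an optional "url=" prefix and the
    // target, which may be wrapped in single or double quotes.
    private static final Pattern CONTENT = Pattern.compile(
        "\\s*\\d+\\s*[;,]\\s*(?:url\\s*=\\s*)?['\"]?([^'\"]+?)['\"]?\\s*",
        Pattern.CASE_INSENSITIVE);

    /**
     * Returns the absolute redirect target, or empty when the content carries
     * no URL or the URL resolves to the page's own address (a self-refresh).
     * URI.create throws IllegalArgumentException for malformed URLs.
     */
    static Optional<String> redirectTarget(String content, String baseUrl) {
        Matcher m = CONTENT.matcher(content);
        if (!m.matches()) {
            return Optional.empty();
        }
        URI base = URI.create(baseUrl);
        URI target = base.resolve(m.group(1).trim()); // handles relative targets
        return target.equals(base) ? Optional.empty() : Optional.of(target.toString());
    }
}
```

For the page in this issue, `MetaRefresh.redirectTarget("0;URL=http://sports.sina.com.cn/j/2013-11-14/22386885017.shtml", baseUrl)` yields the article URL, while a content value with only a delay, such as `"30"`, yields an empty result.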

I think it'd have to be opt-in, something like Connection.followMetaRedirects(true).

jhy · Nov 15 '13 17:11
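
One way the opt-in behaviour might be shaped is sketched below. This is not jsoup code: `MetaRedirectFollower` is a hypothetical class, `followMetaRedirects` only borrows the name proposed in the comment above, and fetching and tag extraction are passed in as plain functions so the sketch stays self-contained. The flag defaults to off, and the loop stops on a repeated URL or after a fixed number of fetches, since meta refreshes can point back at each other.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of an opt-in meta-refresh follower; not jsoup API. */
final class MetaRedirectFollower {
    private final Function<String, String> fetch;                         // url -> HTML body
    private final BiFunction<String, String, Optional<String>> findNext;  // (html, url) -> refresh target
    private boolean followMetaRedirects = false;                          // off unless requested
    private final int maxFetches = 5;                                     // assumed limit

    MetaRedirectFollower(Function<String, String> fetch,
                         BiFunction<String, String, Optional<String>> findNext) {
        this.fetch = fetch;
        this.findNext = findNext;
    }

    MetaRedirectFollower followMetaRedirects(boolean follow) {
        this.followMetaRedirects = follow;
        return this;
    }

    /** Returns the last URL reached; the input URL when the flag is off. */
    String resolve(String url) {
        Set<String> seen = new HashSet<>();
        String current = url;
        // Stop when the flag is off, a URL repeats, or the fetch limit is hit.
        while (followMetaRedirects && seen.add(current) && seen.size() <= maxFetches) {
            Optional<String> next = findNext.apply(fetch.apply(current), current);
            if (!next.isPresent()) {
                break;
            }
            current = next.get();
        }
        return current;
    }
}
```

Keeping the flag off by default means existing callers see no change in behaviour unless they ask for it.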

It does not work for http://baidu.com either. The HTML I got is:

<html>
 <head>
  <meta http-equiv="refresh" content="0;url=http://www.baidu.com/"> 
 </head>
 <body> 
 </body>
</html>

I used Connection.followRedirects(true) in version 1.8.3.

zhuhw · Mar 23 '16 18:03