jsoup
Support HTML redirects (meta refresh tag)?
We just ran into this kind of redirection when parsing HTML files. Some websites don't use standard HTTP redirection; instead, they use a browser-side redirect:
<html>
<head>
<meta http-equiv="Refresh" content="0;URL=http://sports.sina.com.cn/j/2013-11-14/22386885017.shtml" />
</head>
<body></body>
</html>
We could check the Refresh meta tag: if the URL in its content attribute differs from the base URL, treat it as a redirect. This could be done within HttpConnection. I found that the Python command-line utility httpie supports this feature; it would be nice to have it in jsoup too.
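To make the proposed check concrete, here is a minimal sketch, assuming the value of the meta tag's content attribute has already been read from the page (for example via jsoup's doc.select("meta[http-equiv]") and attr("content")). The MetaRefresh class and its targetUrl/isRedirect methods are hypothetical, not part of jsoup. It parses the "delay;URL=target" form, strips optional quotes, resolves relative targets against the page URL, and applies the "differs from the base URL" rule from the proposal.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical helper (not a jsoup API): interprets the content attribute of
 * <meta http-equiv="Refresh" content="0;URL=http://...">.
 */
final class MetaRefresh {

    /** Returns the absolute target URL of a refresh content string, if it has one. */
    public static Optional<String> targetUrl(String content, String baseUrl) {
        if (content == null) return Optional.empty();
        // The delay comes first; a URL part, if present, follows a ';' or ','.
        int sep = indexOfSeparator(content);
        if (sep < 0) return Optional.empty(); // e.g. "5": a plain reload, no target
        String rest = content.substring(sep + 1).trim();
        // Strip an optional, case-insensitive "url=" prefix.
        if (rest.toLowerCase(Locale.ROOT).startsWith("url")) {
            String afterKey = rest.substring(3).trim();
            if (afterKey.startsWith("=")) rest = afterKey.substring(1).trim();
        }
        // Strip optional matching quotes around the URL.
        if (rest.length() >= 2 && (rest.charAt(0) == '\'' || rest.charAt(0) == '"')
                && rest.charAt(rest.length() - 1) == rest.charAt(0)) {
            rest = rest.substring(1, rest.length() - 1);
        }
        if (rest.isEmpty()) return Optional.empty();
        // Resolve relative targets against the page URL (throws on malformed URLs).
        return Optional.of(URI.create(baseUrl).resolve(rest).toString());
    }

    /** The proposed rule: only a target that differs from the page URL counts as a redirect. */
    public static boolean isRedirect(String content, String baseUrl) {
        return targetUrl(content, baseUrl)
                .map(target -> !target.equals(baseUrl))
                .orElse(false);
    }

    private static int indexOfSeparator(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ';' || c == ',') return i;
        }
        return -1;
    }
}
```

For the sina.com.cn page above, targetUrl("0;URL=http://sports.sina.com.cn/j/2013-11-14/22386885017.shtml", ...) yields that article URL, while a bare delay such as "5" yields no target at all.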
I think it would have to be opt-in, something like Connection.followMetaRedirects(true).
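An opt-in flag of that kind could gate a loop like the one below. This is a hypothetical sketch, not a jsoup API: resolveFinalUrl, its followMetaRedirects and maxHops parameters, and the fetch function (standing in for whatever actually downloads a page) are illustrative names. Tag detection uses simplified regular expressions so the example stays self-contained; a real implementation inside jsoup would inspect the parsed document instead. The hop limit and the visited set guard against refresh loops.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of opt-in meta-refresh following (not part of jsoup). */
final class MetaRedirectFollower {

    // Simplified patterns; attribute order inside the <meta> tag does not matter.
    private static final Pattern REFRESH_TAG = Pattern.compile(
            "<meta\\b[^>]*\\bhttp-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern CONTENT_ATTR = Pattern.compile(
            "\\bcontent\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')", Pattern.CASE_INSENSITIVE);
    private static final Pattern TARGET = Pattern.compile(
            "[;,]\\s*(?:url\\s*=\\s*)?['\"]?([^'\"\\s]+)", Pattern.CASE_INSENSITIVE);

    /** Follows at most maxHops meta refreshes (only if enabled) and returns the last URL reached. */
    public static String resolveFinalUrl(String url, boolean followMetaRedirects, int maxHops,
                                         Function<String, String> fetch) {
        String current = url;
        Set<String> seen = new HashSet<>();
        seen.add(current);
        for (int hop = 0; followMetaRedirects && hop < maxHops; hop++) {
            String next = refreshTarget(fetch.apply(current), current);
            // Stop if there is no refresh target, or it points to an already visited page.
            if (next == null || !seen.add(next)) break;
            current = next;
        }
        return current;
    }

    /** Returns the absolute refresh target found in the HTML, or null if there is none. */
    static String refreshTarget(String html, String baseUrl) {
        Matcher tag = REFRESH_TAG.matcher(html);
        if (!tag.find()) return null;
        Matcher content = CONTENT_ATTR.matcher(tag.group());
        if (!content.find()) return null;
        String value = content.group(1) != null ? content.group(1) : content.group(2);
        Matcher target = TARGET.matcher(value);
        if (!target.find()) return null; // e.g. content="5": reload only
        return URI.create(baseUrl).resolve(target.group(1)).toString();
    }
}
```

With the flag off, the original URL is returned unchanged, which keeps the current behaviour as the default.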
It does not work for http://baidu.com either. The HTML I got is:
<html>
<head>
<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">
</head>
<body>
</body>
</html>
I used Connection.followRedirects(true) in version 1.8.3.