xxl-crawler
A distributed web crawler framework (XXL-CRAWLER).
Bumps [jsoup](https://github.com/jhy/jsoup) from 1.11.2 to 1.14.2. Release notes, sourced from jsoup's releases: jsoup 1.14.2 ("Caught by the fuzz!") is out now, and includes a set of parser bug...
As stated in the title.
In the `JsoupUtil` utility class, the `loadPageSource()` method never calls `requestBody()` on the `Connection`. Some APIs only accept parameters passed via `Connection.requestBody()`, and in that case no data can be fetched.
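The jsoup `Connection` API does support this via `requestBody()`. Below is a minimal sketch of the kind of call the reporter is asking `loadPageSource()` to make, assuming a JSON API; the class name, URL, and body are placeholders, not part of xxl-crawler:

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RequestBodyLoad {
    // POST with parameters in the raw request body instead of form fields.
    public static Document loadWithBody(String url, String jsonBody) throws IOException {
        return Jsoup.connect(url)
                .header("Content-Type", "application/json")
                .requestBody(jsonBody)       // parameters passed via the request body
                .ignoreContentType(true)     // accept non-HTML responses such as JSON
                .post();
    }
}
```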
Bumps [junit](https://github.com/junit-team/junit4) from 4.11 to 4.13.1. Release notes, sourced from junit's releases: JUnit 4.13.1 (please refer to the release notes for details); JUnit 4.13 (please refer to the release notes...)
Currently `JsoupUtil.findLinks(html)` only collects URLs that start with `http`, and this behavior cannot be customized. A method could be added to `RunData` so that users can supply their own implementation of `findUrls()`; see the sketch below. For example, bilibili video URLs have no `http` prefix: ``
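A minimal sketch of what such an extension might look like, using jsoup's `abs:` attribute resolution to also pick up protocol-relative links (`//host/path`) such as the bilibili URLs mentioned above; `LinkFinder` and its `findLinks` signature are hypothetical, not part of xxl-crawler:

```java
import java.util.LinkedHashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkFinder {
    // Collect absolute links; abs:href resolves relative and
    // protocol-relative ("//...") hrefs against the supplied baseUri.
    public static Set<String> findLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Set<String> links = new LinkedHashSet<>();
        for (Element a : doc.select("a[href]")) {
            String href = a.attr("abs:href");
            if (!href.isEmpty()) {
                links.add(href);
            }
        }
        return links;
    }
}
```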
Bumps [htmlunit](https://github.com/HtmlUnit/htmlunit) from 2.24 to 2.37.0. Release notes, sourced from htmlunit's releases: HtmlUnit-2.37.0 (bugfixes; many js improvements done in Rhino; CHROME 79; FF52 removed; FF68 added); HtmlUnit-2.36.0 (bugfixes; many js...)
After using `SeleniumPhantomjsPageLoader`, the `baseUri` of the `Document` object produced by jsoup parsing is empty.
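One possible workaround at the jsoup level, as a minimal sketch: assuming the loader's page source and the original request URL are both available, passing the URL as `baseUri` restores `doc.baseUri()` and `abs:` attribute resolution. The class name is hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BaseUriWorkaround {
    // Supply the request URL as baseUri so the parsed Document is not rootless.
    public static Document parse(String pageSource, String requestUrl) {
        return Jsoup.parse(pageSource, requestUrl); // baseUri no longer empty
    }
}
```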
How can a connect timeout for a given URL be detected and handled, or the content re-queued for crawling?
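A minimal sketch of one way to handle this at the fetch level, assuming jsoup is used for loading; on timeout the caller can decide to re-queue the URL (for instance via `RunData.addUrl`, if that fits the crawler's flow). The class and method names are hypothetical:

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TimeoutHandling {
    // Returns null on timeout or I/O failure so the caller can decide
    // whether to retry immediately or put the URL back on the queue.
    public static Document loadOrNull(String url, int timeoutMillis) {
        try {
            return Jsoup.connect(url).timeout(timeoutMillis).get();
        } catch (SocketTimeoutException e) {
            return null; // connect/read timed out: candidate for re-queue
        } catch (IOException e) {
            return null; // other network failure
        }
    }
}
```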
In `com.xuxueli.crawler.thread.CrawlerThread#processPage`, shouldn't the following code return `false` instead?

```java
if (!crawler.getRunConf().validWhiteUrl(pageRequest.getUrl())) {
    // limit unvalid-page parse, only allow spread child, finish here
    return true;
}
```
Is there support for crawling content after logging in?
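A minimal sketch of one common approach, assuming a form-based login and using plain jsoup rather than any xxl-crawler API; the class name, login URL, and field names ("username", "password") are placeholders:

```java
import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoginThenCrawl {
    // Log in once, capture the session cookies, then fetch pages with them.
    public static Document fetchAfterLogin(String loginUrl, String targetUrl,
                                           String user, String pass) throws IOException {
        Connection.Response login = Jsoup.connect(loginUrl)
                .data("username", user)
                .data("password", pass)
                .method(Connection.Method.POST)
                .execute();
        Map<String, String> cookies = login.cookies(); // session cookies from login
        return Jsoup.connect(targetUrl).cookies(cookies).get(); // authenticated fetch
    }
}
```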