
[issue] In multi-threaded runs, tryFinish() occasionally misjudges the current running state

Open 1988tianyuan opened this issue 5 years ago • 1 comments

  • issue description

In multi-threaded runs, tryFinish() can misjudge the running state of the CrawlerThreads and stop the crawler prematurely. Below is the log from running XxlCrawlerTest with 3 threads: image

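The probability is fairly low, roughly once in 10 runs. A possible cause: thread-3 calls tryFinish() and reads isRunning from all 3 CrawlerThreads, getting false for each. At exactly that moment, thread-1 calls crawler.getRunData().getUrl() and sets running to true (which thread-3 can no longer observe). thread-3 then evaluates runData.getUrlNum()==0 as true, so isEnd becomes true and the state is misjudged: image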
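To make the interleaving concrete, here is a single-threaded replay of the three steps described above. It is only a sketch: the class name, the queue, and the flag array are hypothetical stand-ins, not xxl-crawler's actual RunData or CrawlerThread code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in: replays the suspected interleaving step by step
// on one thread, so the outcome is deterministic.
class TryFinishRaceReplay {

    static boolean replay() {
        Deque<String> urls = new ArrayDeque<>();
        urls.add("http://example.com/");           // one URL left in the queue
        boolean[] running = {false, false, false}; // flags of 3 crawler threads

        // Step 1 (thread-3): snapshot every running flag; all are false.
        boolean anyRunning = false;
        for (boolean r : running) {
            anyRunning |= r;
        }

        // Step 2 (thread-1): take the last URL and mark itself as running.
        urls.poll();
        running[0] = true;

        // Step 3 (thread-3): only now look at the queue, which is empty.
        boolean queueEmpty = urls.isEmpty();

        // thread-3 combines a stale flag snapshot with a fresh queue check.
        return queueEmpty && !anyRunning;
    }
}
```

replay() returns true, meaning "finished", even though running[0] is true at that point. The two checks are individually correct but are taken at different moments, which is what lets the misjudgment through.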

  • solution
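  1. Rewrite tryFinish() to check runData.getUrlNum()==0 first and only then read the state of each CrawlerThread, so that a concurrent call to crawler.getRunData().getUrl() cannot leave tryFinish() with a stale view of running: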
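public void tryFinish(){
    // Check the URL queue first ...
    boolean isEnd = runData.getUrlNum()==0;
    // ... and only then read the running flag of each thread.
    boolean isRunning = false;
    for (CrawlerThread crawlerThread: crawlerThreads) {
        if (crawlerThread.isRunning()) {
            isRunning = true;
            break;
        }
    }
    // Finished only if the queue is empty and no thread is running.
    isEnd = isEnd && !isRunning;
    if (isEnd) {
        logger.info(">>>>>>>>>>> xxl crawler is finished.");
        stop();
    }
}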
  2. Add the volatile keyword to the running field of CrawlerThread to guarantee visibility:
private volatile boolean running;

1988tianyuan • Jun 26 '19 09:06

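Further testing showed that even with the changes above, the problem still occurs with very low probability. My guess is that a CrawlerThread's time slice expires right after it calls crawler.getRunData().getUrl(); and before it reaches running = true;, so other threads still misjudge the state. If that is the case, the only real fix is to rework the overall logic of tryFinish().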
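One possible direction for such a rework, as a minimal sketch only: keep a single atomic counter of URLs that are either queued or still being processed, so that taking a URL never opens a window in which the work looks finished. PendingTracker and all of its methods are hypothetical names for illustration and are not part of xxl-crawler.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: "pending" counts URLs that are queued OR in progress.
class PendingTracker {
    private final ConcurrentLinkedQueue<String> urls = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pending = new AtomicInteger();

    // Count the URL before publishing it, so pending never under-reports.
    void addUrl(String url) {
        pending.incrementAndGet();
        urls.add(url);
    }

    // Taking a URL does NOT change pending: the URL is still unfinished work.
    // Returns null when the queue is currently empty.
    String takeUrl() {
        return urls.poll();
    }

    // Call exactly once per URL returned by takeUrl(), after the page has
    // been processed and any newly discovered links were passed to addUrl().
    void done() {
        pending.decrementAndGet();
    }

    // A single atomic read replaces the two separate checks.
    boolean isFinished() {
        return pending.get() == 0;
    }
}
```

With this shape, a thread that is descheduled right after takeUrl() is still counted, so isFinished() stays false until that thread calls done(). The sketch assumes done() is invoked in a finally block so that a failed page does not leave the counter stuck above zero.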

1988tianyuan • Jun 26 '19 09:06