
Bug in script.string handling in user.py causes the MySQL table user_relation to stay empty

myrainbowandsky opened this issue 4 years ago · 0 comments

Reported error:

[2020-03-25 19:07:12,388: ERROR/ForkPoolWorker-1] Task tasks.user.crawl_follower_fans[23f3c1fd-fc6e-4c5b-b0cc-5d5c6a9ad068] raised unexpected: TypeError('expected string or bytes-like object',)
Traceback (most recent call last):
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/wentao/programming/weibospider/tasks/user.py", line 19, in crawl_follower_fans
    rs = get_fans_or_followers_ids(uid, 1, 1)
  File "/home/wentao/programming/weibospider/page_get/user.py", line 159, in get_fans_or_followers_ids
    urls_length = public.get_max_crawl_pages(page)
  File "/home/wentao/programming/weibospider/page_parse/user/public.py", line 223, in get_max_crawl_pages
    m = re.search(pattern, script.string)
  File "/usr/lib/python3.6/re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

There is a bug with script.string in user.py: its type is inconsistent, sometimes NoneType and sometimes <class 'bs4.element.NavigableString'>, so the call into the re module fails.
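A minimal standalone reproduction (not code from the project) of the failure mode: BeautifulSoup's `.string` is `None` when a `<script>` tag has no single text child, and passing `None` to `re.search` raises exactly the TypeError shown in the traceback above.

```python
import re

# re.search requires a string or bytes-like object; script.string
# can be None, which triggers the TypeError from the traceback.
try:
    re.search(r'uid=(.*?)&', None)
except TypeError as e:
    print(e)  # expected string or bytes-like object
```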

    for script in scripts:
        print('type:', type(script.string))  # sometimes NoneType

        # BUG: raises TypeError when script.string is None
        m = re.search(pattern, script.string)

        if m and 'pl.content.followTab.index' in script.string:
            all_info = m.group(1)
            cont = json.loads(all_info).get('html', '')
            soup = BeautifulSoup(cont, 'html.parser')
            pattern = 'uid=(.*?)&'

            if 'pageList' in cont:
                urls2 = soup.find(attrs={'node-type': 'pageList'}).find_all(attrs={
                    'class': 'page S_txt1', 'bpfilter': 'page'})
                length += len(urls2)
    return length
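One possible fix, sketched here as a hypothetical helper (`safe_search` is not part of the project): guard against `None` before calling `re.search`, and coerce `NavigableString` to a plain `str`. Inside the loop above, a bare `if script.string is None: continue` before the `re.search` call would achieve the same thing.

```python
import re

def safe_search(pattern, script_string):
    """Skip scripts whose .string is None; otherwise coerce the
    bs4 NavigableString to str before matching."""
    if script_string is None:
        return None
    return re.search(pattern, str(script_string))

# With the guard, a None script.string no longer raises TypeError:
print(safe_search(r'uid=(.*?)&', None))                  # None, no crash
print(safe_search(r'uid=(.*?)&', 'uid=123&').group(1))   # 123
```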

[screenshot attachment]

myrainbowandsky · Mar 26 '20 12:03