pyspider

No result: detail_page is never run

Open · thimper opened this issue 6 years ago · 10 comments

The task runs normally, but it returns no results: [screenshots]

[screenshot] It looks like detail_page is never run.

```python
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2019-05-19 11:20:28
# Project: test

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
        'itag': '2019051901',
    }

    @every(minutes=0, seconds=30)
    def on_start(self):
        self.crawl('https://www.baidu.com', callback=self.index_page)

    @config(age=10)
    def index_page(self, response):
        # Re-crawls the URL that index_page itself was fetched from
        self.crawl(response.url, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.text,
        }
```

thimper, May 19 '19 03:05

```python
def on_start(self):
    self.crawl('https://www.baidu.com', callback=self.detail_page)
```

If I change `on_start` to the above, I can see the result.

thimper, May 19 '19 03:05

`www.baidu.com` was already crawled for `index_page`, so it will NOT be crawled again for `detail_page`.
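The reason is that pyspider identifies a task by its taskid, which by default is the MD5 of the URL (it can be changed by overriding `get_taskid`). Requesting the same URL again therefore produces the same taskid, regardless of the callback. The sketch below is a hypothetical toy model of that deduplication, not pyspider's scheduler: `ToyTaskQueue` and `default_taskid` are made-up names, and the model deliberately ignores the `age`/`itag` rules that can let a finished task be crawled again.

```python
import hashlib


def default_taskid(url: str) -> str:
    # Assumption: models pyspider's documented default taskid, md5(url).
    return hashlib.md5(url.encode("utf-8")).hexdigest()


class ToyTaskQueue:
    """Hypothetical, simplified model of taskid-based dedup (not pyspider code)."""

    def __init__(self) -> None:
        self.known = set()   # taskids already submitted
        self.accepted = []   # (callback name, url) pairs that would be fetched

    def crawl(self, url: str, callback: str) -> bool:
        taskid = default_taskid(url)
        if taskid in self.known:
            # Same URL -> same taskid -> treated as the same task and dropped,
            # no matter which callback was requested.
            return False
        self.known.add(taskid)
        self.accepted.append((callback, url))
        return True


queue = ToyTaskQueue()
queue.crawl("https://www.baidu.com", "index_page")   # on_start schedules index_page
queue.crawl("https://www.baidu.com", "detail_page")  # index_page re-crawls response.url
print(queue.accepted)  # [('index_page', 'https://www.baidu.com')]
```

This is also why scheduling `detail_page` directly from `on_start` works: the URL then reaches the queue only once, with `detail_page` as its callback.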

binux, May 19 '19 04:05

Oh, I see, thanks.

thimper, May 19 '19 05:05

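I have a use case: I request an address such as http://www.xx.com?p1=xx&p2=xxx directly, but that request may fail. When it does, I need to build a new set of headers, make a request to obtain a cookie, and then request the address again. Can I only re-request it inside detail_page? If I request the same address again from detail_page, will it take effect? I just tried it, and there still seems to be no result.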

thimper, May 19 '19 05:05

```python
@every(minutes=0, seconds=30)
def on_start(self):
    self.crawl('https://www.baidu.com', callback=self.index_page)

@config(age=10)
def index_page(self, response):
    self.crawl(response.url, callback=self.detail_page)

@config(priority=2)
def detail_page(self, response):
    if 'xx' in response.text:
        headers = {'xx': 'xx'}  # placeholder headers, as in the original
        # Retry the same address with new headers (fixed typo: was self.self.crawl)
        self.crawl("www.baidu.com", headers=headers, callback=self.detail_page)
    else:
        insertdb()  # the user's own database helper
        return {
            "url": response.url,
            "title": response.text,
        }
```

thimper, May 19 '19 05:05

Roughly like the code above.

thimper, May 19 '19 05:05

Should I use itag for this? Could you explain roughly how to use it?

thimper, May 19 '19 05:05

http://docs.pyspider.org/en/latest/About-Tasks/#basis
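The linked page describes when a task whose taskid has been seen before is accepted again. Below is a simplified model of those rules as summarized here, assuming: a task still in the queue is ignored unless `force_update` is set; a completed task is accepted again if its `age` has expired or its `itag` differs from the previous value; otherwise it is discarded. `OldTask` and `should_accept` are hypothetical names, the `exetime` handling is left out, and the linked documentation is the authority. In pyspider these knobs correspond to the `age`, `itag` and `force_update` parameters of `self.crawl` (the script in this issue already sets `itag` in `crawl_config` and `age` via `@config`).

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class OldTask:
    # Hypothetical record of a previously seen task with the same taskid.
    in_queue: bool           # still waiting in the queue (not yet completed)
    last_crawl_time: float   # seconds since the epoch
    itag: Optional[str] = None


def should_accept(old: Optional[OldTask], *, age: Optional[float] = None,
                  itag: Optional[str] = None, force_update: bool = False,
                  now: Optional[float] = None) -> bool:
    """Simplified reading of the About-Tasks rules; not pyspider's scheduler code."""
    now = time.time() if now is None else now
    if old is None:
        return True          # never seen before: accepted
    if old.in_queue:
        return force_update  # already queued: ignored unless force_update
    if age is not None and old.last_crawl_time + age < now:
        return True          # completed, and its age has expired
    if itag is not None and itag != old.itag:
        return True          # completed, and the itag changed
    return False             # otherwise discarded


done = OldTask(in_queue=False, last_crawl_time=1000.0, itag="2019051901")
print(should_accept(done, now=1005.0))                    # False: nothing changed
print(should_accept(done, age=10, now=1005.0))            # False: age not expired yet
print(should_accept(done, age=10, now=1020.0))            # True: age expired
print(should_accept(done, itag="2019051902", now=1005.0)) # True: itag changed
```

Under this model, re-requesting the same address from `detail_page` only goes through once one of those conditions holds, which is why simply calling `self.crawl` again with new headers can appear to do nothing.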

binux, May 19 '19 09:05

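Thanks. One more question: can the console where the task runs print the debug logs from my own script, so that I can see them in the pyspider console? How do I set that up? Thanks.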

thimper, May 19 '19 10:05

One more question: in the Handler class, do on_start and the callbacks index_page and detail_page each run on a different Handler instance?

thimper, May 19 '19 10:05