scrapy_doc_chs
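Small error in the "Following links" section
The documentation calls response.urljoin with two arguments, but the method's first parameter is self, the Response instance itself, which is bound automatically and must not be passed explicitly by the caller. After checking the Scrapy source, the correct call here is response.urljoin(href.extract()).
The following is quoted from the article: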
def parse(self, response):
    for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
        url = response.urljoin(response.url, href.extract())
        yield scrapy.Request(url, callback=self.parse_dir_contents)
class Response(object_ref):
    def urljoin(self, url):
        """Join this Response's url with a possible relative url to form an
        absolute interpretation of the latter."""
        return urljoin(self.url, url)
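To make the failure concrete, here is a minimal sketch that reproduces it without Scrapy installed. FakeResponse is a hypothetical stand-in that only mirrors the urljoin method quoted above; it is not Scrapy's actual Response class, and the example.com URL is an assumption chosen for illustration.

```python
from urllib.parse import urljoin


class FakeResponse:
    """Hypothetical stand-in mirroring the urljoin method quoted above."""

    def __init__(self, url):
        self.url = url

    def urljoin(self, url):
        # Join this response's URL with a possibly relative URL.
        return urljoin(self.url, url)


# Assumed base URL, for illustration only.
response = FakeResponse("http://www.example.com/Computers/Programming/")

# Correct: pass only the href; self is bound automatically.
joined = response.urljoin("Languages/Python/")

# Incorrect: passing response.url explicitly supplies one argument too many,
# so Python raises TypeError before urljoin's body ever runs.
try:
    response.urljoin(response.url, "Languages/Python/")
    two_arg_error = None
except TypeError as exc:
    two_arg_error = exc
```

Under these assumptions, the one-argument call resolves the relative href against the response's own URL, which is what the corrected line url = response.urljoin(href.extract()) relies on in the spider.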