pyspider puppeteer fetcher does not work

Open hubitor opened this issue 5 years ago • 1 comments

I'm trying to use the puppeteer fetcher with this script from the examples:

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    def on_start(self):
        self.crawl('http://www.twitch.tv/directory/game/Dota%202',
                   fetch_type='chrome', callback=self.index_page)

    def index_page(self, response):
        return {
            "url": response.url,
            "channels": [{
                "title": x('.title').text(),
                "viewers": x('.info').contents()[2],
                "name": x('.info a').text(),
            } for x in response.doc('.stream.item').items()]
        }

The result is this: {'channels': [], 'url': 'https://www.twitch.tv/directory/game/Dota%202'}

The puppeteer fetcher appears to be running, since I see this when I start pyspider: puppeteer fetcher running on port 22222

When I modify the content of the js_script and rerun the script, pyspider doesn't do anything. It doesn't even raise an error if I insert faulty code.

I've already found a related issue, https://github.com/binux/pyspider/issues/902, but it didn't help.

pyspider version: Latest commit ad3ae13
Operating system: Arch Linux
Start up command: ./pyspider

Expected behavior

Get results.

Actual behavior

No results.

How to reproduce

Use latest development version of pyspider.
Use the script above.
Start pyspider & run the script.

Nov 21 '19 10:11 hubitor

This is wrong: fetch_type='chrome'

Correct: fetch_type='puppeteer'

The reason is in https://github.com/binux/pyspider/blob/master/pyspider/fetcher/tornado_fetcher.py#L141, where the fetcher only matches the value 'puppeteer':

elif task.get('fetch', {}).get('fetch_type') in ('puppeteer', ):
    ...
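
To make the effect of that check concrete, here is a minimal standalone sketch of the dispatch. It is not pyspider's actual code: pick_fetcher is a hypothetical helper, and the fallback to a plain HTTP fetch for unrecognized values is an assumption, although it would explain why 'chrome' produced an empty result without any error.

```python
def pick_fetcher(task):
    """Hypothetical helper: name the fetcher a task would be routed to."""
    fetch_type = task.get('fetch', {}).get('fetch_type')
    # Same membership test as the line quoted from tornado_fetcher.py
    if fetch_type in ('puppeteer', ):
        return 'puppeteer'
    # Assumption: any unrecognized value falls back to a plain HTTP fetch,
    # so no JavaScript is executed and js_script is never used
    return 'http'


chrome_task = {'fetch': {'fetch_type': 'chrome'}}
puppeteer_task = {'fetch': {'fetch_type': 'puppeteer'}}

print(pick_fetcher(chrome_task))     # http
print(pick_fetcher(puppeteer_task))  # puppeteer
```

If that assumption holds, a task with fetch_type='chrome' never reaches the puppeteer fetcher at all, which would also be why editing js_script had no visible effect.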

Jan 09 '20 02:01 larrymeng
