puppeteer fetcher does not work
I'm trying to use the puppeteer fetcher with this script from the examples:
from pyspider.libs.base_handler import *

class Handler(BaseHandler):
    def on_start(self):
        self.crawl('http://www.twitch.tv/directory/game/Dota%202',
                   fetch_type='chrome', callback=self.index_page)

    def index_page(self, response):
        return {
            "url": response.url,
            "channels": [{
                "title": x('.title').text(),
                "viewers": x('.info').contents()[2],
                "name": x('.info a').text(),
            } for x in response.doc('.stream.item').items()]
        }
The result is this:
{'channels': [], 'url': 'https://www.twitch.tv/directory/game/Dota%202'}
The puppeteer fetcher appears to be running, since I see this when I start pyspider:
puppeteer fetcher running on port 22222
When I modify the content of the js_script and rerun the script, pyspider doesn't do anything. It doesn't even raise an error if I insert faulty code.
I've already found a related issue: https://github.com/binux/pyspider/issues/902
but it didn't help.
- pyspider version: Latest commit ad3ae13
- Operating system: Arch Linux
- Start up command: ./pyspider
Expected behavior
Get results.
Actual behavior
No results.
How to reproduce
- Use latest development version of pyspider.
- Use above script
- Start pyspider & run the script.
This is wrong:
fetch_type='chrome'
This is correct:
fetch_type='puppeteer'
You can see why in https://github.com/binux/pyspider/blob/master/pyspider/fetcher/tornado_fetcher.py#L141, where the puppeteer fetcher is only selected when fetch_type is 'puppeteer':
elif task.get('fetch', {}).get('fetch_type') in ('puppeteer', ):
...
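
To illustrate why 'chrome' produces an empty result without any error, here is a minimal standalone sketch of that check. `choose_fetcher` is a hypothetical helper, not part of pyspider, and the fallback to a plain HTTP fetch for unrecognized values is an assumption about how the branch chain in tornado_fetcher.py ends, so please verify it against the source.

```python
def choose_fetcher(task):
    """Return the name of the fetcher a task would be routed to.

    Hypothetical, simplified stand-in for the check in tornado_fetcher.py;
    it only models the 'puppeteer' branch quoted above.
    """
    fetch_type = task.get('fetch', {}).get('fetch_type')
    if fetch_type in ('puppeteer', ):
        return 'puppeteer'
    # Assumption: any value that matches no branch falls back to plain HTTP,
    # without raising an error.
    return 'http'


# A task as self.crawl(..., fetch_type='chrome') would describe it
print(choose_fetcher({'fetch': {'fetch_type': 'chrome'}}))
# The same task with the corrected value
print(choose_fetcher({'fetch': {'fetch_type': 'puppeteer'}}))
```

If that assumption holds, fetch_type='chrome' is served as an ordinary HTTP request, so the page's JavaScript never runs and js_script is never evaluated. That would explain both the empty channels list and the absence of errors when the script contains faulty code.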