
Scrapy job is still in the running list after the job process has exited.

Open newhandLiu opened this issue 7 years ago • 8 comments

When the job exits, it is still in the running list returned by listjobs.json. However, when I request cancel.json, the response says the process has already exited.

PS: I use Selenium webdriver in this Scrapy job. Versions: Python 2.7, Scrapy 1.1.2, Scrapyd 1.1.

newhandLiu, Feb 22 '17

Hi @newhandLiu

Don't you need to explicitly shut down the driver when the spider closes? If you define a closed(self, reason) method in your spider, you'll have a chance to close the webdriver there (unless the spider's __init__ crashes after initializing the webdriver).
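A minimal, stdlib-only sketch of that suggestion. It assumes Scrapy calls a spider's closed(reason) method on shutdown; FakeDriver is a hypothetical stand-in for a Selenium webdriver, and SpiderSketch is a plain class standing in for a scrapy.Spider subclass, so the example runs without either library. The try/except in __init__ covers the case mentioned above, where __init__ fails after the driver has started and closed() never gets a chance to run.

```python
class FakeDriver(object):
    """Hypothetical stand-in for a Selenium webdriver; it only records quit()."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


class SpiderSketch(object):
    """Plain-Python stand-in for a scrapy.Spider subclass (hypothetical)."""

    def __init__(self, driver_factory=FakeDriver, fail_after_driver=False):
        self.driver = driver_factory()
        try:
            # Any further setup that might raise belongs here.
            if fail_after_driver:
                raise RuntimeError("setup failed after the driver started")
        except Exception:
            # A spider whose __init__ raised is never closed normally,
            # so release the driver here before propagating the error.
            self.driver.quit()
            raise

    def closed(self, reason):
        # Scrapy passes the close reason here, e.g. 'finished'.
        self.driver.quit()


spider = SpiderSketch()
spider.closed("finished")
print(spider.driver.quit_called)  # True
```

Releasing the driver inside __init__ is the only option in the failure case, because no close hook is ever invoked for an object that was never fully constructed.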

If this doesn't solve the problem, please share as much of your code as you can here.

Also, I'm curious whether the PID shown on the web interface is still valid after the spider finishes.
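On a POSIX system, one way to check that is to send signal 0 to the PID: os.kill() then performs only the existence and permission checks without delivering anything. A minimal sketch; pid_is_alive is a hypothetical helper name, and it compares errno codes instead of catching ProcessLookupError so that it also runs on Python 2.7. Note that a zombie process (exited but not yet reaped by its parent) still counts as existing.

```python
import errno
import os


def pid_is_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single PID.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is sent
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # no such process
            return False
        if exc.errno == errno.EPERM:  # exists, but belongs to another user
            return True
        raise
    return True


print(pid_is_alive(os.getpid()))  # True: the current process exists
```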

Digenis, Feb 28 '17

Hi @Digenis

Thanks for your reply. I already close the webdriver in the spider's close handler, as follows:

    import scrapy
    from scrapy import signals
    from selenium import webdriver

    # PHANTOMJS (path to the phantomjs binary) is assumed to be defined elsewhere.

    class NewsDuowanSpider(scrapy.Spider):
        # name, start_urls, parse() etc. omitted

        def __init__(self, *args, **kwargs):
            super(NewsDuowanSpider, self).__init__(*args, **kwargs)
            # webkit driver
            self.driver = webdriver.PhantomJS(executable_path=PHANTOMJS, service_log_path='/tmp/ghostdriver.log')
            self.driver.implicitly_wait(1)
            self.driver.set_page_load_timeout(3)

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super(NewsDuowanSpider, cls).from_crawler(crawler, *args, **kwargs)
            crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
            return spider

        def spider_closed(self, spider):
            spider.logger.info('Spider closed: %s', spider.name)
            spider.driver.quit()

And the PID no longer exists on the server after the spider finishes.

newhandLiu, Mar 09 '17

Hi,

I edited your comment to correct the markdown syntax.

You don't need to use from_crawler() at all. Just define a spider method closed(self, reason) and it will be automatically called when the spider is about to close.
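Why closed() is enough: as we understand Scrapy 1.x, the base Spider connects its static close(spider, reason) method to the spider_closed signal, and close() simply calls the spider's closed(reason) if it defines one. The stdlib-only model below mimics that wiring; MiniSignals, SPIDER_CLOSED, BaseSpider and attach() are hypothetical stand-ins for Scrapy's SignalManager, scrapy.signals.spider_closed, scrapy.Spider and its internal crawler hookup, not Scrapy's actual code.

```python
class MiniSignals(object):
    """Hypothetical, minimal stand-in for Scrapy's SignalManager."""

    def __init__(self):
        self._receivers = {}

    def connect(self, receiver, signal):
        self._receivers.setdefault(signal, []).append(receiver)

    def send(self, signal, **kwargs):
        # Call every connected receiver with the same keyword arguments.
        return [receiver(**kwargs) for receiver in self._receivers.get(signal, [])]


SPIDER_CLOSED = object()  # stand-in for scrapy.signals.spider_closed


class BaseSpider(object):
    """Models only the part of scrapy.Spider that routes the signal."""

    def attach(self, signals):
        # The base class does this for you, so no from_crawler() is needed.
        signals.connect(self.close, SPIDER_CLOSED)

    @staticmethod
    def close(spider, reason):
        closed = getattr(spider, "closed", None)
        if callable(closed):
            return closed(reason)


class MySpider(BaseSpider):
    def __init__(self):
        self.close_reasons = []

    def closed(self, reason):
        # A real spider would call self.driver.quit() here.
        self.close_reasons.append(reason)


signals = MiniSignals()
spider = MySpider()
spider.attach(signals)
signals.send(SPIDER_CLOSED, spider=spider, reason="finished")
print(spider.close_reasons)  # ['finished']
```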

Is the code sample above what you tried originally, or what you tried after my comment? If it's after my comment, does it work? What does it print in the log? Please also share the output of cancel.json here.

This is probably not a scrapyd bug, see https://github.com/seleniumhq/selenium/issues/767, but we should investigate what we can do on the scrapyd side.

Digenis, Mar 09 '17

Hi @Digenis

Yes, the code sample above is what I tried originally.

The listjobs.json response:

    {"status": "ok", "running": [ {"start_time": "2017-03-16 01:01:53.237206", "id": "dc155b3e09a011e789ba8cdcd4b48645", "spider": "nga"},] }

The job is shown as running, although it has actually exited. When I then requested cancel.json, it returned:

    Traceback (most recent call last):
      File "/data/home/user00/python2.7/lib/python2.7/site-packages/scrapyd/webservice.py", line 17, in render
        return JsonResource.render(self, txrequest)
      File "/data/home/user00/python2.7/lib/python2.7/site-packages/scrapyd/utils.py", line 19, in render
        r = resource.Resource.render(self, txrequest)
      File "/data/home/user00/python2.7/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg/twisted/web/resource.py", line 250, in render
        return m(request)
      File "/data/home/user00/python2.7/lib/python2.7/site-packages/scrapyd/webservice.py", line 57, in render_POST
        s.transport.signalProcess(signal)
      File "/data/home/user00/python2.7/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg/twisted/internet/process.py", line 339, in signalProcess
        raise ProcessExitedAlready()
    ProcessExitedAlready
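One plausible explanation for a job that is still listed as running even though signalProcess() raises ProcessExitedAlready (an assumption, not confirmed in this thread): Twisted's ProcessProtocol distinguishes processExited(), called when the child exits, from processEnded(), called only once the child has exited and all of its file descriptors have been closed. If scrapyd removes a job from the running list only in processEnded(), then any helper the job started and left running (PhantomJS, an Xvfb virtual display) keeps the inherited stdout/stderr pipes open, and the job stays listed. The Python 3, stdlib-only sketch below reproduces the effect, using a sleeping Python process as a hypothetical stand-in for the leftover helper: the direct child exits at once, but the parent only sees end-of-file on its pipe about two seconds later.

```python
import subprocess
import sys
import time

# Hypothetical "job": it starts a helper that inherits its stdout and keeps
# running for about 2 seconds, then the job itself exits immediately.
JOB_CODE = (
    "import subprocess, sys\n"
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(2)'])\n"
    "print('job done')\n"
)


def run_job():
    job = subprocess.Popen([sys.executable, "-c", JOB_CODE], stdout=subprocess.PIPE)
    job.wait()                  # like processExited(): the job process is gone
    exited = time.monotonic()
    output = job.stdout.read()  # like processEnded(): waits for EOF on the pipe
    eof = time.monotonic()
    job.stdout.close()
    return job.returncode, output, eof - exited


returncode, output, delay = run_job()
print("exit code:", returncode, "| EOF %.1f s after exit" % delay)
```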

newhandLiu, Mar 17 '17

I have the same problem and still don't know how to solve it.

m358807551, Jan 10 '18

Does spider.driver.quit() close only the self.driver instance, or all Chrome driver instances? For example, if a spider has two driver instances, self.driver and self.browser, does spider.driver.quit() close both drivers or just self.driver?

ashshakya, Sep 01 '19

You may not be stopping the virtual display. Try stopping it!

runique, Apr 15 '21

I'm having the same issue and found that @runique had the right idea: when I cancel a job, the virtual display never gets closed.
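Following that advice, any helper process the spider starts should be stopped in closed() (with pyvirtualdisplay, that means calling the Display object's stop() method). A Python 3, stdlib-only sketch, where HelperSpider is hypothetical and a sleeping Python process stands in for Xvfb: closed() asks the helper to terminate, waits briefly, and kills it if it has not exited. Because the helper's stdout and stderr go to DEVNULL, it also does not hold on to the job's output pipes.

```python
import subprocess
import sys


class HelperSpider(object):
    """Hypothetical spider that owns a long-running helper process."""

    def __init__(self):
        # Stand-in for Xvfb: a process that would otherwise run for a minute.
        # DEVNULL keeps the helper off the job's own stdout/stderr.
        self.helper = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )

    def closed(self, reason):
        # Scrapy would call this when the spider closes; invoked directly below.
        if self.helper.poll() is None:
            self.helper.terminate()
            try:
                self.helper.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.helper.kill()
                self.helper.wait()


spider = HelperSpider()
spider.closed("finished")
print(spider.helper.poll() is not None)  # True: the helper has exited
```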

jpurquico, Jan 17 '23