Wooey and while True

Open toert opened this issue 7 years ago • 15 comments

If I start a script containing "while True" and later try to stop it with the Stop button, it does not stop and keeps running indefinitely. My current workaround is to kill all the Celery worker processes. Do you have any idea how I can stop a never-ending script from the web interface rather than the CLI?

toert avatar Dec 14 '17 10:12 toert

Does your script have a blind try/except clause? If so, it may be swallowing the exception celery sends to stop a task.
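To illustrate what a blind try/except does here, a minimal sketch. StopRequested is a hypothetical stand-in for whatever exception is raised to interrupt the script; it is not a Celery or Wooey name.

```python
class StopRequested(BaseException):
    """Hypothetical stand-in for an exception raised to interrupt a script."""


def blind_loop(iterations=3):
    """A bare except swallows the stop request, so the loop keeps going."""
    completed = 0
    for i in range(iterations):
        try:
            if i == 0:
                raise StopRequested()  # the stop request arrives on the first pass
        except:  # blind clause: catches everything, including StopRequested
            pass
        completed += 1
    return completed


def narrow_loop(iterations=3):
    """Catching only the expected error lets the stop request propagate."""
    completed = 0
    for i in range(iterations):
        try:
            if i == 0:
                raise StopRequested()
        except ValueError:  # narrow clause: StopRequested is not caught here
            pass
        completed += 1
    return completed
```

With these definitions, blind_loop() runs all of its iterations, while narrow_loop() is interrupted by StopRequested on the first pass.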

Chris7 avatar Dec 19 '17 21:12 Chris7

It actually doesn't have any blind try/except clauses

toert avatar Jan 01 '18 11:01 toert

Can you provide the script? I just tested and celery successfully terminated:

[2018-01-01 23:24:57,945: INFO/MainProcess] Terminating 90cbb3e6-7026-45c9-b800-8e2dad93f2f1 (Signals.SIGKILL)
[2018-01-01 23:24:57,976: ERROR/MainProcess] Task wooey.tasks.submit_script[90cbb3e6-7026-45c9-b800-8e2dad93f2f1] raised unexpected: Terminated(9,)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/billiard/pool.py", line 1678, in _set_terminated
    raise Terminated(-(signum or 0))
billiard.exceptions.Terminated: 9

This is the script I tested with:

import argparse
import sys

parser = argparse.ArgumentParser(description="run forever!")

def main():
    while True:
        import time
        time.sleep(5)

if __name__ == "__main__":
    parser.parse_args()
    sys.exit(main())

Chris7 avatar Jan 01 '18 23:01 Chris7

import sys
import time
import argparse
import logging

import requests  # pip install requests


logging.basicConfig(level=logging.INFO)

parser = argparse.ArgumentParser()
parser.add_argument('--m', type=int)

args = parser.parse_args()


def main():
    print('Hello')
    logging.info('start')

    while True:
        print('Hello True')
        logging.info('i am still running')
        requests.get('http://127.0.0.1:8000')
        time.sleep(10)


if __name__ == "__main__":
    sys.exit(main())

And it keeps sending requests even after I click the Stop button.

How I run celery:

#!/bin/sh
cd /opt
source venv/bin/activate
cd DO_wooey
python manage.py celery worker -c 5 --beat -l info

toert avatar Jan 02 '18 11:01 toert

Sorry it's taken me a bit to get to this. I just tried your script and the stop button worked.

celery_1  | [2018-01-22 15:29:53,680: INFO/MainProcess] Received task: wooey.tasks.submit_script[df427903-e036-44b9-84bd-192c4a670ed0]
celery_1  | [2018-01-22 15:30:07,053: INFO/MainProcess] Terminating df427903-e036-44b9-84bd-192c4a670ed0 (Signals.SIGKILL)
celery_1  | [2018-01-22 15:30:07,086: ERROR/MainProcess] Task wooey.tasks.submit_script[df427903-e036-44b9-84bd-192c4a670ed0] raised unexpected: Terminated(9,)
celery_1  | Traceback (most recent call last):
celery_1  |   File "/usr/local/lib/python3.6/site-packages/billiard/pool.py", line 1678, in _set_terminated
celery_1  |     raise Terminated(-(signum or 0))
celery_1  | billiard.exceptions.Terminated: 9

I would look at your installed dependencies.

Chris7 avatar Jan 22 '18 15:01 Chris7

What version of Python are you using, and can you provide the output of pip freeze?

Chris7 avatar Jan 22 '18 15:01 Chris7

Dependencies https://github.com/toert/currencies_bot/blob/master/requirements.txt

toert avatar Jan 27 '18 15:01 toert

I think I know the reason -- the default broker is SQL, which is useful for development/testing. However, the broadcast/control commands are not supported by this broker. When you run python manage.py celery inspect active, do you receive: Error: Broadcast not supported by SQL broker transport?

To fix this, you need to define a "real" broker like rabbit in your user_settings (BROKER_URL).
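As a sketch, the setting might look like this in user_settings.py, assuming RabbitMQ runs on the same machine with its default guest account. The host, port, credentials and vhost below are placeholders, not values from this thread.

```python
# user_settings.py (sketch): point Celery at RabbitMQ instead of the SQL broker.
# Host, port, credentials and vhost are placeholder assumptions.
BROKER_URL = "amqp://guest:guest@localhost:5672//"
```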

Chris7 avatar Feb 18 '18 16:02 Chris7

No, I use a real broker.

(venv) toerting@ubuntu-1gb-lon1-01:/opt/DO_wooey$ python manage.py celery inspect active
-> celery@ubuntu-1gb-lon1-01: OK
    - empty -

toert avatar Feb 26 '18 16:02 toert

Ok, to debug this I'll need a step by step to reproduce on my end from a clean setup.

Chris7 avatar Feb 26 '18 18:02 Chris7

I inspected Wooey's source code and found that job processes are killed with SIGKILL (signal 9), which a process cannot block or catch. Also after sending SIGKILL scripts stop immediately and then rerun. The cause is the messages queued in RabbitMQ: Wooey's stop job button does nothing with queued messages. However, if a process stops on its own (an exception, exit code 0, etc.) and the job status becomes 'Completed', its message disappears from the RabbitMQ queue. Setting the status to 'Completed' through the Django admin doesn't delete the message. Is there any way to purge the queue after clicking the stop job button?
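The point about SIGKILL can be checked from Python itself. This is a small POSIX-only sketch (signal.SIGKILL does not exist on Windows), and try_install_handler is a helper name made up for this example.

```python
import signal


def try_install_handler(signum):
    """Return True if a Python-level handler can be installed for signum."""
    try:
        previous = signal.signal(signum, lambda received, frame: None)
    except (OSError, ValueError):
        # The OS (or Python) refused to register a handler for this signal.
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the original handler back
    return True


can_catch_term = try_install_handler(signal.SIGTERM)
can_catch_kill = try_install_handler(signal.SIGKILL)
```

On Linux, can_catch_term should come out True and can_catch_kill False, which is why a script has no chance to react to SIGKILL.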

toert avatar May 16 '18 07:05 toert

This is tricky. The stop behavior in celery is this:

When a worker receives a revoke request it will skip executing the task, but it won’t terminate an already executing task unless the terminate option is set. If terminate is set the worker child process processing the task will be terminated. The default signal sent is TERM, but you can specify this using the signal argument. Signal can be the uppercase name of any signal defined in the signal module in the Python Standard Library.
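In code, the revoke call described above looks roughly like the following. This is a sketch, not Wooey's implementation: stop_task is a hypothetical helper, and app is assumed to be a configured celery.Celery instance.

```python
def stop_task(app, task_id, force=False):
    """Revoke a task and ask the worker to terminate it if it is already running.

    `app` is assumed to be a celery.Celery instance; `task_id` is the task's UUID.
    """
    # TERM is Celery's default signal; KILL cannot be caught or ignored.
    signal_name = "SIGKILL" if force else "SIGTERM"
    app.control.revoke(task_id, terminate=True, signal=signal_name)
    return signal_name
```

Calling stop_task(app, task_id) sends the default TERM signal, and stop_task(app, task_id, force=True) escalates to KILL.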

There seems to be no good way to stop a stuck process that doesn't either nuke the entire worker or risk not actually working (a process can ignore SIGHUPs). I think a better solution would be to have a STOPPING state after a SIGHUP is sent, and then if a task is in STOPPING, have the Stop button change to Kill which will terminate the task/process.
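A rough sketch of that two-step idea using only the standard library, outside of Celery and Wooey. It uses SIGTERM for the polite step rather than SIGHUP, and it escalates automatically after a grace period instead of waiting for a second button press; both are simplifications for the example. POSIX only.

```python
import subprocess
import sys

# Child process that ignores SIGTERM, standing in for a stuck script.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_then_kill(proc, grace_seconds=1.0):
    """Send SIGTERM first; if the process is still alive afterwards, send SIGKILL."""
    proc.terminate()  # polite request (the "Stop" step)
    try:
        proc.wait(timeout=grace_seconds)
        return "stopped"
    except subprocess.TimeoutExpired:
        proc.kill()  # cannot be ignored (the "Kill" step)
        proc.wait()
        return "killed"


proc = subprocess.Popen([sys.executable, "-c", STUBBORN_CHILD], stdout=subprocess.PIPE)
proc.stdout.readline()  # wait until the child has installed its SIGTERM handler
outcome = stop_then_kill(proc)
proc.stdout.close()
```

Because the child ignores SIGTERM, the first step times out and the process only goes away once SIGKILL is sent.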

Also after sending SIGKILL scripts stop immediately and then rerun.

I think it is rerunning because you have ACKS_LATE set to True. This means that a task message is acknowledged (and so removed from the queue) only after the task has run, rather than just before it starts. One option is to disable ACKS_LATE and use the rerun command instead to selectively requeue work.
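For reference, a sketch of what disabling it could look like in the settings file, assuming the old-style uppercase Celery setting names that go with python manage.py celery. Newer Celery versions spell this setting task_acks_late.

```python
# user_settings.py (sketch): acknowledge task messages before execution,
# so a killed task is not delivered again. Setting name assumes old-style
# (uppercase) Celery configuration.
CELERY_ACKS_LATE = False
```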

Is that any way to purge queue after clicking stop job button?

You can purge messages through celery (look at celery purge) or through rabbitmq's management page.
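The purge can also be triggered from Python, which is one way it might be hooked up to a button. This is only a sketch: discard_waiting_tasks is a hypothetical helper, app is assumed to be a configured celery.Celery instance, and purging discards every waiting task message, not just the ones belonging to the stopped job.

```python
def discard_waiting_tasks(app):
    """Discard all task messages still waiting in the app's queues.

    `app` is assumed to be a celery.Celery instance. Returns the number of
    messages that were discarded, as reported by Celery.
    """
    # Caution: this affects every queued task, not only one job.
    return app.control.purge()
```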

Chris7 avatar May 16 '18 12:05 Chris7

@toert I take it this means you are able to stop scripts now?

Chris7 avatar May 20 '18 12:05 Chris7

Actually, no. Take a look at https://github.com/toert/DO_wooey. As you can see, I don't define ACKS_LATE, and its default value is False. celery purge looks good, but I don't want to purge the queue manually every time 😀

toert avatar May 28 '18 07:05 toert

What OS are you using? I set up a Wooey server using that repository on Python 3.6.5 and halting scripts worked as expected.

Also, you might want to upgrade the version of Wooey you are using to at least the latest in 0.9.x (if not 0.10.x, though 0.10.x has a few changes wrt celery that will require updating some of your settings)

Chris7 avatar Jun 23 '18 14:06 Chris7