
Timer tasks not working with auth on

Open Tobeyforce opened this issue 3 years ago • 3 comments

When auth is enabled, my timer tasks stop working. The response visible in the task result is a 401 error (screenshot not reproduced here; the history log below shows the same message).

So a request is being sent to ScrapydWeb, but with auth enabled ScrapydWeb expects Basic Auth credentials, which the request does not carry in its header. Is there any way to fix this? It's worth mentioning that I have deployed ScrapydWeb with gunicorn & nginx.
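
For reference, the rejection is easy to reproduce outside the scheduler with the requests library; in this minimal sketch the address and credentials are assumptions, not the real deployment values:

    import requests

    # Assumed ScrapydWeb address and credentials; substitute your own.
    BASE_URL = 'http://127.0.0.1:5000'
    AUTH = ('admin', 'admin')

    # Without an Authorization header, the before_request hook returns 401.
    print(requests.get(BASE_URL).status_code)             # 401 when ENABLE_AUTH is on

    # The same request with HTTP Basic Auth credentials passes the check.
    print(requests.get(BASE_URL, auth=AUTH).status_code)  # 200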

Any advice would be helpful.

Tobeyforce avatar Apr 08 '21 13:04 Tobeyforce

  1. Click the history button on the timer tasks page, then post the related log.
  2. Run scrapydweb without gunicorn&nginx and try again.

my8100 avatar Apr 08 '21 16:04 my8100

History log:

    [2021-04-08 16:20:05,034] WARNING in apscheduler: Fail to execute task #1 (upplandsbrohus sthlm 10min - edit) on node 1, would retry later: Request got {'status_code': 401, 'status': 'error', 'message': "<script>alert('Fail to login: basic auth for ScrapydWeb has been enabled');</script>"}
    [2021-04-08 16:20:08,039] ERROR in apscheduler: Fail to execute task #1 (upplandsbrohus sthlm 10min - edit) on node 1, no more retries: Traceback (most recent call last):
      File "/var/www/html/scrapydweb/views/operations/execute_task.py", line 89, in schedule_task
        assert js['status_code'] == 200 and js['status'] == 'ok', "Request got %s" % js
    AssertionError: Request got {'status_code': 401, 'status': 'error', 'message': "<script>alert('Fail to login: basic auth for ScrapydWeb has been enabled');</script>"}

    [2021-04-08 16:20:40,519] WARNING in apscheduler: Shutting down the scheduler for timer tasks gracefully, wait until all currently executing tasks are finished
    [2021-04-08 16:20:40,521] WARNING in apscheduler: The main pid is 1267. Kill it manually if you don't want to wait
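
The traceback points at the internal request that schedule_task sends back to ScrapydWeb. If that request reused the app's own credentials, the auth check would pass; here is a hypothetical sketch of such a change (the helper name, url_schedule_task, and data are illustrative, not the file's actual code):

    import requests
    from requests.auth import HTTPBasicAuth

    def schedule_task_request(app, url_schedule_task, data):
        # Hypothetical helper: reuse ScrapydWeb's own USERNAME/PASSWORD for
        # the scheduler's internal call so the before_request hook accepts it.
        auth = None
        if app.config.get('ENABLE_AUTH', False):
            auth = HTTPBasicAuth(str(app.config.get('USERNAME', '')),
                                 str(app.config.get('PASSWORD', '')))
        js = requests.post(url_schedule_task, data=data, auth=auth).json()
        assert js['status_code'] == 200 and js['status'] == 'ok', "Request got %s" % js
        return js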

Unfortunately, running ScrapydWeb with gunicorn & nginx has created all kinds of problems for me. I hope you one day add an official way to deploy ScrapydWeb in production so that we don't have to create workarounds :( Without a production server I've never had issues, so I know it would work otherwise.

My understanding is that each request goes through this before_request middleware in run.py:

    @app.before_request
    def require_login():
        # Reject every request that lacks matching Basic Auth credentials.
        if app.config.get('ENABLE_AUTH', False):
            auth = request.authorization
            USERNAME = str(app.config.get('USERNAME', ''))  # May be 0 from config file
            PASSWORD = str(app.config.get('PASSWORD', ''))
            if not auth or not (auth.username == USERNAME and auth.password == PASSWORD):
                return authenticate()  # Responds with the 401 seen in the log

My only workaround so far is to change this check…
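
One possible shape for such a change, sketched as a hypothetical loopback exemption, with a caveat: behind nginx every proxied request arrives from the loopback, which would disable auth entirely, so it only makes sense when ScrapydWeb is reached directly:

    @app.before_request
    def require_login():
        # Hypothetical workaround: let the scheduler's own loopback requests
        # through without credentials. Behind nginx, request.remote_addr is
        # the proxy's address, so this would effectively turn auth off there.
        if request.remote_addr in ('127.0.0.1', '::1'):
            return
        if app.config.get('ENABLE_AUTH', False):
            auth = request.authorization
            USERNAME = str(app.config.get('USERNAME', ''))
            PASSWORD = str(app.config.get('PASSWORD', ''))
            if not auth or not (auth.username == USERNAME and auth.password == PASSWORD):
                return authenticate()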

Tobeyforce avatar Apr 08 '21 16:04 Tobeyforce

Could you debug with the following steps first?

  1. Run scrapydweb without gunicorn&nginx and try again.
  2. Run scrapydweb with gunicorn and try again.
  3. Run scrapydweb with nginx and try again.

my8100 avatar Apr 09 '21 00:04 my8100