
Scrapyd running on a remote machine causes UI links to be broken.

Open Rohithzr opened this issue 5 years ago • 13 comments

Describe the bug
If the Scrapyd server(s) are running on a remote host (on the same VPN) and ScrapydWeb is running on a separate node, then the links to Logs and Items are broken by design.

Example: a Scrapyd server is running at 189.09.09.90:6800 on AWS EC2 instance 01, and ScrapydWeb is running at 89.09.09.80:80. In the config I will provide 189.09.09.90:6800 as the Scrapyd server location, which causes the links to be rendered as 189.09.09.90:6800/logs/proj/job/2019-09-30T11_01_23.log, which the browser cannot reach. However, 189.09.09.90:6800 can be exposed via a reverse proxy as abc.domain.com, and then abc.domain.com/logs/proj/job/2019-09-30T11_01_23.log is accessible.

A possible solution would be to allow an alias for each server, to be used when generating the links.

Rohithzr avatar Sep 30 '19 20:09 Rohithzr

What about adding abc.domain.com:80 as the Scrapyd server?
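
For example, a minimal sketch of that change, assuming the usual scrapydweb_settings_v10.py file (the address is the example domain from this thread):

# Point ScrapydWeb at the publicly reachable, reverse-proxied address
# instead of the VPN-only one.
SCRAPYD_SERVERS = [
    'abc.domain.com:80',
]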

my8100 avatar Oct 02 '19 02:10 my8100

@my8100 sorry for the late reply

Adding abc.domain.com:80 would work, but then every request goes over the public internet when it could have stayed local.

Let me explain my setup:

  1. We have a cluster of Scrapyd servers.
  2. We have a ScrapydWeb instance connected to all the servers locally.
  3. Everything runs locally, and for security only ScrapydWeb is exposed to the internet.

Ideally, ScrapydWeb should fetch and serve all URLs from Scrapyd itself, regardless of the type of data.

So instead of opening scrapyd-server-001.local:6800/logs/proj/job/2019-09-30T11_01_23.log directly, it should open scrapydweb.domain.com/scrapyd-server-001.local/logs/proj/job/2019-09-30T11_01_23.log.

Basically, it would internally forward (proxy) the requests to the Scrapyd servers, along the lines of the sketch below.

https://imgur.com/a/2i7Jf37
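
A minimal sketch of the proposed behaviour, assuming Flask and requests (this is not ScrapydWeb's actual implementation; the route shape and the node mapping below are hypothetical):

import requests
from flask import Flask, Response, abort

app = Flask(__name__)

# Hypothetical mapping of node names to internal Scrapyd addresses.
SCRAPYD_NODES = {
    "scrapyd-server-001.local": "http://scrapyd-server-001.local:6800",
}

@app.route("/<node>/logs/<path:log_path>")
def proxy_log(node, log_path):
    base = SCRAPYD_NODES.get(node)
    if base is None:
        abort(404)
    # Fetch the log from the internal node and relay it to the browser,
    # so only ScrapydWeb itself needs to be reachable from the internet.
    upstream = requests.get(f"{base}/logs/{log_path}", timeout=30)
    return Response(
        upstream.content,
        status=upstream.status_code,
        content_type=upstream.headers.get("Content-Type", "text/plain"),
    )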

Rohithzr avatar Oct 09 '19 06:10 Rohithzr

OK, it may be supported in a future release.

But for now, you can view the first 100 and last 100 lines of the log on the Stats page.

my8100 avatar Oct 09 '19 12:10 my8100

Hi, I would like to vote for this feature as well. It would also help with redirecting links in cloud-based setups.

sergiigladchuk avatar Mar 10 '20 09:03 sergiigladchuk

@Rohithzr @sergiigladchuk The requested feature is now supported in PR #128. Please give it a try and share your feedback, thanks.

  1. Stop Scrapydweb.
  2. Execute pip install --upgrade git+https://github.com/my8100/scrapydweb.git to get the latest code.
  3. Add the content below to the existing file scrapydweb_settings_v10.py. https://github.com/my8100/scrapydweb/blob/12c48923dd64bbef3c8db2bd40c93d1854ebd46f/scrapydweb/default_settings.py#L106-L115
  4. Update the option SCRAPYD_SERVERS_PUBLIC_URLS accordingly (see the sketch after this list).
  5. Restart Scrapydweb.
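
For example, a hedged sketch of the resulting settings (the values below are illustrative, and the exact expected format should be checked against the linked default_settings.py):

# Internal addresses ScrapydWeb uses to talk to the Scrapyd nodes.
SCRAPYD_SERVERS = [
    'scrapyd-server-001.local:6800',
]
# Assumed format: one public URL per entry in SCRAPYD_SERVERS, in the
# same order, used when rendering the Logs and Items links in the UI.
SCRAPYD_SERVERS_PUBLIC_URLS = [
    'https://abc.domain.com',
]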

my8100 avatar Mar 15 '20 13:03 my8100

For anyone trying to make this work with nginx, a "subfolder" config (mydomain.com/scrapy) didn't work for me for some reason.

I had success with a subdomain config like this (scrapy.mydomain.com):

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    # Match the scrapy.* subdomain (e.g. scrapy.mydomain.com).
    server_name scrapy.*;

    include /config/nginx/ssl.conf;

    # Don't limit request body size.
    client_max_body_size 0;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        # Forward everything to the scrapydweb container on port 5000.
        set $upstream_app scrapydweb;
        set $upstream_port 5000;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}

ftruzzi avatar Sep 01 '23 23:09 ftruzzi