scrapydweb
Scrapyd running on a remote machine causes UI links to be broken.
Describe the bug
If the Scrapyd server(s) are running on a remote host (on the same VPN) and ScrapydWeb is running on a separate node, then the links to Logs and Items are broken by design.
Example:
The Scrapyd server is running at 189.09.09.90:6800 on AWS EC2 instance 01, and ScrapydWeb is running at 89.09.09.80:80. In the config I will provide 189.09.09.90:6800 as the Scrapyd server location, and this causes the links to be rendered as 189.09.09.90:6800/logs/proj/job/2019-09-30T11_01_23.log, which is inaccessible to the browser. However, 189.09.09.90:6800 can be exposed via a reverse proxy as abc.domain.com, and then abc.domain.com/logs/proj/job/2019-09-30T11_01_23.log will be accessible.
A possible solution would be to allow an alias for each server, which could be used when generating the links.
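For context, a minimal sketch of the relevant setting as described above (the file name varies by scrapydweb version, and the address is the placeholder IP from this example):

    # scrapydweb_settings_v10.py (sketch)
    # ScrapydWeb uses each entry both to reach Scrapyd and to render
    # the Logs/Items links, so a VPN-only address yields links that
    # the browser cannot open.
    SCRAPYD_SERVERS = [
        '189.09.09.90:6800',  # reachable over the VPN, not from the browser
    ]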
What about adding abc.domain.com:80 as the Scrapyd server?
@my8100 Sorry for the late reply. Adding abc.domain.com:80 works, but then each request goes out over the web when it could have been served locally.
Let me explain my setup:
- We have a cluster of Scrapyd servers.
- We have a ScrapydWeb instance connected to all the servers locally.
- Everything runs locally, and for security only ScrapydWeb is exposed to the internet.
Ideally, ScrapydWeb should fetch and display all the URLs from Scrapyd itself, regardless of the type of data. So instead of directly opening scrapyd-server-001.local:6800/logs/proj/job/2019-09-30T11_01_23.log, it should open scrapydweb.domain.com/scrapyd-server-001.local/logs/proj/job/2019-09-30T11_01_23.log and internally redirect the request to the Scrapyd server.
https://imgur.com/a/2i7Jf37
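A minimal sketch of the kind of internal pass-through being proposed here, written as a standalone Flask route for illustration (this is not scrapydweb's actual code; the SCRAPYD_NODES mapping and the route shape are assumptions):

    # Hypothetical sketch, NOT scrapydweb's implementation: ScrapydWeb
    # would serve /<server>/... itself and fetch the resource from the
    # matching Scrapyd node over the local network.
    import requests
    from flask import Flask, Response, abort

    app = Flask(__name__)

    # Assumed mapping from path prefix to internal Scrapyd address.
    SCRAPYD_NODES = {
        'scrapyd-server-001.local': 'http://scrapyd-server-001.local:6800',
    }

    @app.route('/<server>/<path:resource>')
    def proxy_to_scrapyd(server, resource):
        base_url = SCRAPYD_NODES.get(server)
        if base_url is None:
            abort(404)
        upstream = requests.get('%s/%s' % (base_url, resource), timeout=60)
        return Response(upstream.content,
                        status=upstream.status_code,
                        content_type=upstream.headers.get('Content-Type'))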
OK, it may be supported in a future release.
But for now, you can view the first 100 and last 100 lines of the log on the Stats page.
Hi, I would like to promote this feature as well. It would also help with redirecting links in cloud-based setups.
@Rohithzr @sergiigladchuk The requested feature is supported in PR #128. Please give it a try and offer your feedback, thanks.
- Stop Scrapydweb.
- Execute pip install --upgrade git+https://github.com/my8100/scrapydweb.git to get the latest code.
- Add the content below to the existing file scrapydweb_settings_v10.py:
  https://github.com/my8100/scrapydweb/blob/12c48923dd64bbef3c8db2bd40c93d1854ebd46f/scrapydweb/default_settings.py#L106-L115
- Update the option SCRAPYD_SERVERS_PUBLIC_URLS accordingly (see the sketch after this list).
- Restart Scrapydweb.
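If I read the linked default_settings.py correctly, the option is a list that parallels SCRAPYD_SERVERS entry by entry; here is a sketch using the addresses from this thread (check the linked lines for the authoritative format):

    # Sketch based on the linked default_settings.py; verify the exact
    # format there. One public URL per SCRAPYD_SERVERS entry, used only
    # for rendering the Logs/Items links in the UI.
    SCRAPYD_SERVERS = [
        '189.09.09.90:6800',      # internal address ScrapydWeb talks to
    ]
    SCRAPYD_SERVERS_PUBLIC_URLS = [
        'http://abc.domain.com',  # address the browser can actually reach
    ]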
For anyone trying to make this work with nginx, a "subfolder" config (mydomain.com/scrapy) didn't work for me for some reason. I had success with a subdomain config (scrapy.mydomain.com) like this:
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    server_name scrapy.*;

    include /config/nginx/ssl.conf;

    client_max_body_size 0;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app scrapydweb;
        set $upstream_port 5000;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}