scylla
no proxy crawled
Please provide the following information if applicable:
- Operating system and its version: Mac 10.10.5, Python 3.6.6
- Version number of Scylla: 1.1.5
No proxies are crawled.
2019-01-15 - 13:05:59 DEBUG: create new db connection
2019-01-15 - 13:05:59 INFO: Scheduler starts...
2019-01-15 - 13:05:59 DEBUG: feed 8 providers...
2019-01-15 - 13:05:59 INFO: Start python scheduler
2019-01-15 - 13:05:59 INFO: worker_process started
2019-01-15 - 13:05:59 INFO: validator_thread started
2019-01-15 - 13:05:59 DEBUG: fetch_ips...
2019-01-15 - 13:05:59 INFO: Start the web server
[2019-01-15 13:05:59 +0800] [98963] [INFO] Goin' Fast @ http://0.0.0.0:8899
2019-01-15 - 13:05:59 DEBUG: Get a provider from the provider queue: A2uProvider
[2019-01-15 13:05:59 +0800] [98963] [INFO] Starting worker [98963]
2019-01-15 - 13:05:59 INFO: Start forward proxy server on port 8081
2019-01-15 - 13:06:59 DEBUG: Feed 0 proxies from the database for a second time validation
Is your server located in mainland China?
Yes, in mainland China. Even with a VPN set up on my router, no proxies are crawled.
Could you please use an overseas server?
Thanks, I'll try.
If I want to use it in mainland China, do I just write a number of providers in scylla/providers
and rebuild? Is that right?
But when I provided one like the following:
class CNProxyComProvider(BaseProvider):

    def urls(self) -> [str]:
        return [
            'https://cn-proxy.com/',
            'https://cn-proxy.com/archives/218'
        ]

    def parse(self, html: HTML) -> [ProxyIP]:
        ip_list: [ProxyIP] = []
        for ip_row in html.find('table tbody tr'):
            # First column holds the IP, second column holds the port
            ip_element = ip_row.find('td:nth-child(1)', first=True)
            port_element = ip_row.find('td:nth-child(2)', first=True)
            try:
                if ip_element and port_element:
                    ip = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', ip_element.text).group(0)
                    port = re.search(r'\d{2,5}', port_element.text).group(0)
                    p = ProxyIP(ip=ip, port=port)
                    ip_list.append(p)
            except AttributeError:
                # re.search returned None: the row has no IP or port, skip it
                pass
        return ip_list

    @staticmethod
    def should_render_js() -> bool:
        return False
and added it to __init__.py:
from .cn_proxy_com_provider import CNProxyComProvider
all_providers = [
    CNProxyComProvider,
    A2uProvider,
    ....
]
It does not crawl any proxies, although I can parse some proxies with requests_html
on the command line.
log of scylla:
2019-01-18 - 18:46:54 DEBUG: create new db connection
2019-01-18 - 18:46:55 INFO: Scheduler starts...
2019-01-18 - 18:46:55 DEBUG: feed 9 providers...
2019-01-18 - 18:46:55 INFO: Start python scheduler
2019-01-18 - 18:46:55 INFO: worker_process started
2019-01-18 - 18:46:55 INFO: validator_thread started
2019-01-18 - 18:46:55 DEBUG: fetch_ips...
2019-01-18 - 18:46:55 DEBUG: Get a provider from the provider queue: CNProxyComProvider
2019-01-18 - 18:46:55 INFO: Start the web server
[2019-01-18 18:46:55 +0800] [97416] [INFO] Goin' Fast @ http://0.0.0.0:8899
2019-01-18 - 18:46:55 INFO: Start forward proxy server on port 8081
[2019-01-18 18:46:55 +0800] [97416] [INFO] Starting worker [97416]
2019-01-18 - 18:47:55 DEBUG: Feed 0 proxies from the database for a second time validation
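To separate a parsing problem from a fetching problem, the extraction step of the provider above can be checked on its own. The following is only a minimal sketch using the standard library: it reuses the two regular expressions from `parse`, but `extract_proxy`, `extract_all`, and the sample rows are hypothetical names and data made up for illustration, not part of Scylla. It returns plain `(ip, port)` tuples instead of `ProxyIP` objects so that it runs without Scylla or requests_html installed.

```python
import re
from typing import List, Optional, Tuple

# Same patterns as in the provider's parse() method
IP_RE = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
PORT_RE = re.compile(r'\d{2,5}')


def extract_proxy(ip_text: str, port_text: str) -> Optional[Tuple[str, str]]:
    """Return (ip, port) if both cell texts contain a match, else None."""
    ip_match = IP_RE.search(ip_text)
    port_match = PORT_RE.search(port_text)
    if ip_match is None or port_match is None:
        return None
    return ip_match.group(0), port_match.group(0)


def extract_all(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply extract_proxy to every (ip_cell, port_cell) pair, skipping misses."""
    results = []
    for ip_text, port_text in rows:
        proxy = extract_proxy(ip_text, port_text)
        if proxy is not None:
            results.append(proxy)
    return results


if __name__ == '__main__':
    # Hypothetical cell texts, as they might come out of a table row
    sample_rows = [
        ('IP address', 'Port'),           # header row, no match
        ('192.0.2.10', '8080'),
        (' 198.51.100.7 ', 'port 3128'),
    ]
    print(extract_all(sample_rows))
```

If this step behaves as expected on the real cell texts, the regular expressions are probably not the cause, and the remaining suspects would be whether the page is fetched at all from the server's network and whether the fetched HTML actually contains `table tbody tr` rows.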
I have the same problem on Windows Server 2016.