douban-to-imdb

Douban limits a single IP to at most 10 pages at a time

Open: gasolin opened this issue 2 years ago • 1 comment

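Testing today, I found that Douban limits a single IP to fetching at most 10 pages within a short period.

I made a small change and added a pagination = 1 parameter, as follows: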

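def export(user_id):
    urls = url_generator(user_id)
    info = []
    pagination = 1  # first page to fetch; raise it to resume after being blocked
    page_no = pagination
    for idx, url in enumerate(urls, start=1):
        if idx < pagination:  # skip pages already fetched in an earlier run
            continue
        if IS_OVER:  # or page_no == pagination + 5
            break
        print(f'开始处理第 {page_no} 页...')  # "Processing page {page_no}..."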
...

By adjusting the pagination value and switching between different VPN servers, you can scrape everything.
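The resume-by-page idea above can be sketched as a standalone helper. This is only an illustration, not the repository's code: `export_batch`, `fetch_page`, `start_page`, and `max_pages` are hypothetical names, and the 10-page batch size is taken from the limit observed above.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def export_batch(
    urls: Iterable[str],
    fetch_page: Callable[[str], List[str]],  # hypothetical per-page scraper
    start_page: int = 1,
    max_pages: int = 10,  # assumed per-IP limit, based on the observation above
) -> Tuple[List[str], Optional[int]]:
    """Fetch up to max_pages pages starting at start_page.

    Returns the collected items and the page number to resume from,
    or None when every page has been processed.
    """
    items: List[str] = []
    for page_no, url in enumerate(urls, start=1):
        if page_no < start_page:
            continue  # already fetched in an earlier run
        if page_no >= start_page + max_pages:
            return items, page_no  # stop before the limit; resume here next time
        items.extend(fetch_page(url))
    return items, None
```

After each batch, switch VPN servers and call the helper again with `start_page` set to the returned page number, until it returns `None`.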

gasolin, Dec 25 '21 08:12

My workaround is to automatically change IP whenever I get blocked. I'm using Windscribe VPN, following the instructions in the first part of this article. Once manually logged in through the terminal, I can change IP with a single Windscribe command without any input prompts, which makes it easy to integrate into Python code.

Some pseudo-code snippets:

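import os

os.system("windscribe connect US")  # connect to a US server before scraping
try:
    get_info(url)
except TypeError:  # get_info(url) returns None once the per-IP request limit is reached
    os.system("windscribe connect US")  # reconnect to switch to a new IP
    get_info(url)  # retry once from the new IP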
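A slightly more general version of the snippet above, as a rough sketch: it retries several times instead of once and rotates the IP between attempts. `fetch_with_rotation`, `rotate_ip`, `max_retries`, and `wait_seconds` are hypothetical names, and `fetch` stands in for `get_info`, which is assumed here to return `None` (or raise `TypeError`) when the IP is blocked. The only external command used is `windscribe connect US`, the one mentioned above.

```python
import subprocess
import time
from typing import Any, Callable


def rotate_ip(location: str = "US") -> None:
    """Reconnect Windscribe; assumes the CLI is installed and logged in."""
    subprocess.run(["windscribe", "connect", location], check=False)


def fetch_with_rotation(
    url: str,
    fetch: Callable[[str], Any],  # stand-in for get_info
    rotate: Callable[[], None] = rotate_ip,
    max_retries: int = 3,
    wait_seconds: float = 5.0,
) -> Any:
    """Call fetch(url); on a blocked response, rotate the IP and retry."""
    for attempt in range(max_retries + 1):
        try:
            result = fetch(url)
        except TypeError:
            result = None  # treated as "blocked", as in the snippet above
        if result is not None:
            return result
        if attempt < max_retries:
            rotate()
            time.sleep(wait_seconds)  # give the VPN time to reconnect
    raise RuntimeError(f"still blocked after {max_retries} retries: {url}")
```

Passing `rotate` as a parameter keeps the retry logic independent of any particular VPN client, so another provider's command can be swapped in.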

niauah, Apr 24 '22 13:04