[Suspected bug] pretend.py in version 0.1.2 is broken, causing scraping to fail

Open DeSireFire opened this issue 4 years ago • 0 comments

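I ran into a problem while deploying to a new server. After comparing versions, I traced the cause to GerapyPyppeteer/gerapy_pyppeteer/pretend.py.

With version 0.0.13 everything works. The relevant code there is:

SET_WEBDRIVER = '''() => {Object.defineProperty(navigator, 'webdriver', {get: () => undefined})}'''

With version 0.1.2, the SET_WEBDRIVER variable on line 73 is the problem. When requesting a site protected by a certain anti-bot product (written here as 某数, name withheld), the browser is detected and the server returns 400.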

Test code:

import json
import os
import asyncio
import time

from pyppeteer import launch, connection
from pyppeteer import chromium_downloader
from gerapy_pyppeteer.pretend import SCRIPTS as PRETEND_SCRIPTS
from pyppeteer.network_manager import Response



async def main():
    browser = await launch({'headless': False, 'timeout': 10000, 'args': ['--no-sandbox', ]},)
    page = await browser.newPage()
    for script in PRETEND_SCRIPTS:
        await page.evaluateOnNewDocument(script)

    print(len(await browser.pages()))
    await page.goto('http://www.某个网址.com.cn/old_house/old_house.html')  # placeholder URL, remember to change it

    await page.waitForNavigation()


    await page.waitFor(10 * 1000)

    print(await page.evaluate("document.cookie"))
    print('Finished waiting for the URL')

    # await page.waitFor(10 * 1000)
    print(await page.content())

    await browser.close()



asyncio.get_event_loop().run_until_complete(main())

The result is a blank page.
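
A possible workaround until this is fixed, sketched under assumptions: swap the webdriver-related entry in the pretend scripts for the 0.0.13 version quoted above before injecting them. The helper name `patch_webdriver_script` is hypothetical and not part of gerapy_pyppeteer. It also assumes that `SCRIPTS` is a list of JavaScript strings and that the 0.1.2 script can be recognised by the substring `webdriver`, which I have not verified against the 0.1.2 source.

```python
# Known-good script, copied from gerapy_pyppeteer 0.0.13 as quoted in this issue.
SET_WEBDRIVER_0_0_13 = (
    "() => {Object.defineProperty(navigator, 'webdriver', "
    "{get: () => undefined})}"
)


def patch_webdriver_script(scripts, replacement=SET_WEBDRIVER_0_0_13):
    """Return a new list with every webdriver-related script replaced.

    Assumption: a script is "webdriver-related" if it contains the
    substring 'webdriver'. Scripts that do not match are kept unchanged.
    """
    return [replacement if "webdriver" in script else script
            for script in scripts]
```

In the test code above, this would mean iterating over `patch_webdriver_script(PRETEND_SCRIPTS)` instead of `PRETEND_SCRIPTS` when calling `page.evaluateOnNewDocument`. If the page loads normally after that change, it would support the suspicion that SET_WEBDRIVER in 0.1.2 is what gets detected.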

DeSireFire, Jul 22 '21 10:07