amazoncaptcha More documentation for how to use with requests/lxml

Selenium is awesome, but I am trying to use this with requests and lxml. It seems to solve the captcha properly, but I am having trouble submitting the solution. Could you add some example usage to the README?

This is what I am doing right now using requests/lxml:

```python
import random
import requests
from lxml import html
from fake_useragent import UserAgent
import csv
import time
import os
from amazoncaptcha import AmazonCaptcha


amazon_captcha_xpath = '//h4[contains(text(), "Enter the characters you see below")]'
captcha_image_xpath = '//div[@class="a-row a-text-center"]/img/@src'


def get_link(url, session=None, user_agent=None, proxy=None):
    """
    Fetches the HTML content from the provided URL.
    Returns a parsed lxml HTML tree that can be used with XPath,
    along with the session that made the request.
    """
    ua = UserAgent()
    headers = {'User-Agent': ua.google if not user_agent else user_agent}
    proxies = {'http': proxy, 'https': proxy} if proxy else {}

    if session is None:
        session = requests.Session()

    response = session.get(url, headers=headers, proxies=proxies)
    tree = html.fromstring(response.content)

    return tree, session


# code that does stuff assuming there is no captcha. Leaving it out because it's long and probably not helpful.

if tree.xpath(amazon_captcha_xpath):
    bot_check = True
    print(html.tostring(tree).decode())
    print('[ Captcha Detected! ]')

    captcha_image_link = tree.xpath(captcha_image_xpath)[0]
    print(captcha_image_link)

    solution = AmazonCaptcha.fromlink(captcha_image_link).solve()
    print(f'Solution is: {solution}')

    print('Pausing to seem human...')
    time.sleep(random.randrange(3, 15))

    print('Submitting solution')

    # THIS IS THE PART TO SUBMIT IT THAT DOES NOT SEEM TO WORK

    # Hidden form fields from the captcha page
    amzn = tree.xpath('//input[@name="amzn"]/@value')[0]
    amzn_r = tree.xpath('//input[@name="amzn-r"]/@value')[0]

    data = {
        'amzn': amzn,
        'amzn-r': amzn_r,
        'field-keywords': solution
    }

    response = session.post('https://www.amazon.com/errors/validateCaptcha', data=data)

    # check response
    print(response.status_code)   # always comes back as 503
    #print(response.text)
    #input('PAUSED')
```

Jul 31 '23 07:07 lukeprofits

It is not in Python, but I will share my Node.js implementation of how to submit the Amazon captcha solution:

```typescript
import { gotScraping } from "got-scraping";

// `$`, `options` and `captcha` (the solved text) are defined elsewhere.
const amzn = $("form input[type=hidden]").val();
let amazonPass: string;
if (options.baseUrl.includes('?')) {
    // Keep the original query string at the end of the validation URL.
    const [base, query] = options.baseUrl.split('?');

    amazonPass = `${base}/errors/validateCaptcha?amzn=${amzn}&amzn-r=/&field-keywords=${captcha}&${query}`;
} else {
    amazonPass = `${options.baseUrl}/errors/validateCaptcha?amzn=${amzn}&amzn-r=/&field-keywords=${captcha}`;
}

const response = await gotScraping({
    url: amazonPass,
    cookieJar: options.cookieJar,
    followRedirect: true,
    headers: {
        "referer": options.baseUrl
    },
    // @ts-ignore
    proxyUrl: options.proxyUrl,
    sessionToken: options.sessionToken,
    throwHttpErrors: false,
});
```

gotScraping is a requests-like library. The key points are that it is a GET request, it requires a referer header, and it needs a follow-up URL to redirect to after the captcha is solved.
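
For the requests side, the same idea might look roughly like the sketch below. It is untested against Amazon and is not part of amazoncaptcha; `build_captcha_url` and `submit_captcha` are hypothetical helper names. It assumes the endpoint lives at the site root, that `amzn-r` can be sent as `/` as in the Node.js snippet, and, unlike that snippet, it does not re-append the original query string.

```python
from urllib.parse import urlencode, urlsplit


def build_captcha_url(page_url, amzn, solution, amzn_r="/"):
    """Build the validateCaptcha GET URL (hypothetical helper).

    Assumes the endpoint sits at the root of the site that served page_url.
    """
    parts = urlsplit(page_url)
    base = f"{parts.scheme}://{parts.netloc}"
    # urlencode percent-encodes the values, so "/" becomes "%2F".
    query = urlencode({
        "amzn": amzn,
        "amzn-r": amzn_r,
        "field-keywords": solution,
    })
    return f"{base}/errors/validateCaptcha?{query}"


def submit_captcha(session, page_url, amzn, solution, amzn_r="/", headers=None):
    """Send the solution as a GET request with a Referer header.

    `session` should be the same requests.Session that fetched the captcha
    page, so that its cookies are reused.
    """
    request_headers = dict(headers or {})
    request_headers["Referer"] = page_url
    url = build_captcha_url(page_url, amzn, solution, amzn_r)
    # allow_redirects=True mirrors followRedirect in the Node.js version.
    return session.get(url, headers=request_headers, allow_redirects=True)
```

With the variables from the original post, the call would be something like `submit_captcha(session, url, amzn, solution, headers={'User-Agent': user_agent})`, where `url` is the page that returned the captcha and `user_agent` is whatever the first request used. Whether Amazon accepts it still depends on the cookies and headers matching that first request.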

Aug 01 '23 07:08 3ldar

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sep 16 '23 23:09 stale[bot]
