puppeteer-extra
[Bug] hCaptcha detect fails due to new hCaptcha URL format
The plugin works great with reCAPTCHA; however, it throws an error on pages with hCaptcha.
After some investigation, I tracked it down to the following problem:
There is a block of code inside the plugin where the hCaptcha parameters are extracted:
```javascript
_extractInfoFromIframes(iframes) {
  return iframes
    .map(el => el.src.replace('.html#', '.html?'))
    .map(url => {
      const { searchParams } = new URL(url);
      const result = {
        _vendor: 'hcaptcha',
        url: document.location.href,
        id: searchParams.get('id'),
        sitekey: searchParams.get('sitekey'),
        display: {
          size: searchParams.get('size') || 'normal'
        }
      };
      return result;
    });
}
```
The hCaptcha iframe URL has the following format:
https://newassets.hcaptcha.com/captcha/v1/c44fc00/static/hcaptcha.html?_v=h8ew9h1l07#frame=challenge&id=0t7tnh8gx2un&host=mysite.com&sentry=undefined&reportapi=https%3A%2F%2Faccounts.hcaptcha.com&recaptchacompat=true&custom=false&tplinks=on&pstissuer=https%3A%2F%2Fpst-issuer.hcaptcha.com&sitekey=cf0b9a27-82e3-42fb-bfec-562f8045e495&size=invisible&theme=light&origin=https%3A%2F%2Fmysite.com
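Since the URL contains no `.html#` substring (`.html` is now followed by `?_v=…`), the `replace` call leaves it unmodified. Parameters like `id`, `sitekey` and `size` therefore stay in the fragment (after `#`), where `searchParams` does not look, so they can't be extracted from the query string.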
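To make the failure concrete, here is a small reproduction of just the parsing step, run in Node.js (assuming Node 10 or newer, where `URL` is a global). It uses a shortened copy of the iframe URL above with some parameters trimmed, and is not plugin code:

```javascript
// Reproduce the plugin's parsing step outside the browser.
// Shortened copy of the new-format hCaptcha iframe URL from this issue.
const iframeSrc =
  'https://newassets.hcaptcha.com/captcha/v1/c44fc00/static/hcaptcha.html?_v=h8ew9h1l07' +
  '#frame=challenge&id=0t7tnh8gx2un&host=mysite.com&sitekey=cf0b9a27-82e3-42fb-bfec-562f8045e495' +
  '&size=invisible&theme=light';

// Same transformation as the plugin: it only rewrites '.html#', which is absent here.
const rewritten = iframeSrc.replace('.html#', '.html?');
const url = new URL(rewritten);

console.log(rewritten === iframeSrc);         // true: nothing was replaced
console.log(url.searchParams.get('_v'));      // h8ew9h1l07
console.log(url.searchParams.get('sitekey')); // null: sitekey is still in the fragment
console.log(url.hash.startsWith('#frame='));  // true
```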
As a result, I get the following message in the logs:
```
PuppeteerExtraPluginRecaptcha: An error occured during "getRecaptchaSolutions": {
  _vendor: 'hcaptcha',
  provider: '2captcha',
  error: 'Error: Missing data in captcha'
}
```
I think a quick workaround could be something like this: if the iframe URL contains `.html?`, just replace `#` with `&`, so that `_v` becomes just one more ordinary GET parameter alongside `id`, `sitekey` and the rest; otherwise, replace `.html#` with `.html?` as before.
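The proposed workaround could look roughly like the sketch below. `normalizeHcaptchaSrc`, `extractHcaptchaInfo` and `readHcaptchaParams` are hypothetical helper names, not part of puppeteer-extra, and the page URL is passed in explicitly instead of reading `document.location.href` so that the sketch runs in Node.js (assuming Node 10 or newer):

```javascript
// Sketch of the proposed workaround (URL and URLSearchParams are Node.js globals).
// Hypothetical helper names; these are not part of puppeteer-extra.

// Move every hCaptcha parameter into the query string.
function normalizeHcaptchaSrc(src) {
  if (src.includes('.html?')) {
    // New format: '.html?_v=...#frame=...&id=...'. Replacing the first '#'
    // with '&' appends the fragment parameters to the existing query string.
    return src.replace('#', '&');
  }
  // Old format: parameters follow '.html#' directly.
  return src.replace('.html#', '.html?');
}

// Mirrors the fields built by _extractInfoFromIframes; pageUrl stands in
// for document.location.href so the sketch runs outside a browser.
function extractHcaptchaInfo(src, pageUrl) {
  const { searchParams } = new URL(normalizeHcaptchaSrc(src));
  return {
    _vendor: 'hcaptcha',
    url: pageUrl,
    id: searchParams.get('id'),
    sitekey: searchParams.get('sitekey'),
    display: {
      size: searchParams.get('size') || 'normal'
    }
  };
}

// Format-agnostic alternative: read the query string and the fragment
// separately and merge them, without any string replacement.
function readHcaptchaParams(src) {
  const url = new URL(src);
  const params = new URLSearchParams(url.search);
  for (const [key, value] of new URLSearchParams(url.hash.slice(1))) {
    params.set(key, value);
  }
  return params;
}

// Usage with a shortened copy of the new-format URL from this issue.
const newSrc =
  'https://newassets.hcaptcha.com/captcha/v1/c44fc00/static/hcaptcha.html?_v=h8ew9h1l07' +
  '#frame=challenge&id=0t7tnh8gx2un&sitekey=cf0b9a27-82e3-42fb-bfec-562f8045e495&size=invisible';
const info = extractHcaptchaInfo(newSrc, 'https://mysite.com/');
console.log(info.sitekey);                         // cf0b9a27-82e3-42fb-bfec-562f8045e495
console.log(info.display.size);                    // invisible
console.log(readHcaptchaParams(newSrc).get('id')); // 0t7tnh8gx2un
```

The string replacement keeps the change minimal and close to the existing code. `readHcaptchaParams` instead avoids depending on the exact `.html#` / `.html?` layout by parsing the query string and the fragment separately, so it may hold up better if the URL format changes again.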