Imperva Incapsula bot detection prevents fetching documents
I tried to add the Just Eat service with the following declaration:
```json
{
  "name": "Just Eat",
  "documents": {
    "Terms of Service": {
      "fetch": "https://www.just-eat.ie/info/terms-and-conditions",
      "select": {
        "startBefore": "#just-eat-website-terms-and-conditions",
        "endBefore": "#ii.just-eat-voucher-terms-conditions"
      }
    },
    "Privacy Policy": {
      "fetch": "https://www.just-eat.ie/info/privacy-policy",
      "select": [".main-text"]
    },
    "Trackers Policy": {
      "fetch": "https://www.just-eat.ie/info/cookies-policy",
      "select": [".main-text"]
    }
  }
}
```
I get this error message:

```
Content inacessible: Error: The document cannot be accessed or its content can not be selected: The provided selector ".main-text" has no match in the web page https://www.just-eat.ie/info/cookies-policy.
```
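The error means that no element matching `.main-text` was found in the HTML that was actually fetched. To confirm this offline on a saved page, a minimal sketch using Python's standard `html.parser` could look like the following. It only handles a single class selector such as `.main-text` (not full CSS), and `has_class` is a hypothetical helper for illustration, not the Open Terms Archive engine's own selection code:

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the given CSS class."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # The class attribute is a whitespace-separated list of class names;
            # value can be None for attributes written without a value.
            if name == "class" and value and self.class_name in value.split():
                self.found = True


def has_class(html: str, class_name: str) -> bool:
    """Return True if some element in `html` has `class_name` among its classes."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    finder.close()
    return finder.found


print(has_class('<div class="intro main-text">Cookies</div>', "main-text"))  # True
print(has_class("<html><body></body></html>", "main-text"))  # False
```

Running it against the saved snapshot shows whether the selector is wrong or whether the page itself never contained the expected content.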
The saved snapshot contains incorrect data:
```html
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">
</script>
<body>
</body></html>
```
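This snapshot is not the cookies policy but an Imperva Incapsula challenge page: a `noindex,nofollow` robots meta tag, a script loaded from `/_Incapsula_Resource`, and an empty body. A check like the one below could let the crawler report that it was blocked instead of the misleading "selector has no match" error. This is a heuristic sketch that assumes the `/_Incapsula_Resource` script path is a stable marker of such pages, an assumption based only on this one snapshot; `looks_like_incapsula_challenge` is a hypothetical name:

```python
import re

# Assumption: Incapsula challenge pages load a script from this path,
# as in the saved snapshot. Other bot-protection products will not match.
INCAPSULA_MARKER = re.compile(r'src="/_Incapsula_Resource\?', re.IGNORECASE)
BODY_CONTENT = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def looks_like_incapsula_challenge(html: str) -> bool:
    """Heuristic: the Incapsula script is present and the <body> is empty."""
    if not INCAPSULA_MARKER.search(html):
        return False
    body = BODY_CONTENT.search(html)
    # A missing or whitespace-only body means no real document content was served.
    return body is None or not body.group(1).strip()


SNAPSHOT = """<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">
</script>
<body>
</body></html>"""

print(looks_like_incapsula_challenge(SNAPSHOT))  # True
print(looks_like_incapsula_challenge('<html><body><div class="main-text">Cookies</div></body></html>'))  # False
```

Requiring both the marker and an empty body keeps the check from firing on real pages that merely embed an Incapsula script alongside their content.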
Some research leads me to believe that the blocking comes from Imperva's bot protection service (https://www.imperva.com/products/advanced-bot-protection-management/), whose client classification challenges seem to be well explained here: https://www.imperva.com/blog/how-incapsula-client-classification-challenges-bots/
I did three things about this:
- [X] tagged them on Twitter, as their service seems to be really down (from my computer and phone, on Wi-Fi and 4G, through Chrome, Brave and Tor): https://twitter.com/OpenTerms/status/1430498025630158849
- [X] contacted Imperva through https://www.imperva.com/contact-us/, but their contact form does not seem to work :-(
- [X] tagged @Imperva on Twitter: https://twitter.com/OpenTerms/status/1430502093870157826
Here is the message I sent them:
> Hi,
>
> My name is Martin Ratinaud, CTO at the French Embassy for Digital Affairs.
> We are running the open source project "Open Terms Archive", which aims at tracking ToS for every service in the world, in all languages and all countries.
> As such, we are implementing a crawler that regularly tracks changes to ToS.
> Could we get in touch so that we become a known and trusted bot?
>
> Thanks a lot
>
> Check our websites here:
> https://www.opentermsarchive.org/en
> https://disinfo.quaidorsay.fr/en
I had a chat with Imperva and finally sent an email to [email protected]:
> Hi,
>
> My name is Martin Ratinaud, CTO at the French Embassy for Digital Affairs; Henri Verdier, the ambassador, is in CC.
> We are running the open source project "Open Terms Archive", which aims at tracking ToS for every service in the world, in all languages and all countries.
> As such, we are implementing a crawler that regularly tracks changes to ToS.
> We know we are currently blocked by your services, and we would like our bot to be trusted by Imperva as a good bot (whitelisted) so that we are no longer blocked.
>
> Thanks a lot
>
> Check our websites here:
> https://www.opentermsarchive.org/en
> https://disinfo.quaidorsay.fr/en
We are not actively working on #166 at the moment. We will reopen it when we prioritise this work again. In the meantime, feel free to add any additional information relevant to Imperva Incapsula to this issue.