Immoscout crawler doesn't work
Hi there,
I can't get the Immoscout bot to work. WG-Gesucht works fine.
Here are the errors I get with Python 2 and with Python 3. I assume the crawler is broken, since the href file never gets any data. Any ideas how to fix it?
Janns-MBP:immobot jann$ python immo.py
Traceback (most recent call last):
File "immo.py", line 8, in
Janns-MBP:immobot jann$ python3 immo.py
There was a problem with reading a json formatted object
Traceback (most recent call last):
File "immo.py", line 17, in
Time: 2020-08-17 22:43:03.966169
^CTraceback (most recent call last):
File "immo.py", line 74, in
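With Python 2, you have to import the exception from simplejson instead, because the standard json module there has no JSONDecodeError: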
from simplejson import JSONDecodeError
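If you'd rather avoid the extra dependency: Python 2's json module raises a plain ValueError, and json.JSONDecodeError (added in Python 3.5) is a subclass of ValueError, so a getattr fallback covers both versions. The sketch below also shows that an empty href.json on its own is enough to raise this exception. `load_hrefs` is a hypothetical helper, not a function from this repo, and it assumes href.json holds a JSON list.

```python
import json
import os
import tempfile

# json.JSONDecodeError exists on Python 3.5+; Python 2's json raises plain ValueError.
JSONDecodeError = getattr(json, "JSONDecodeError", ValueError)


def load_hrefs(path):
    """Hypothetical helper: parse href.json, returning [] if it is empty or invalid."""
    try:
        with open(path) as f:
            return json.load(f)
    except JSONDecodeError:
        # An empty (0-byte) file is not valid JSON and ends up here.
        return []


if __name__ == "__main__":
    # Reproduce the reported situation: an empty href.json.
    tmp = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
    tmp.close()
    print(load_hrefs(tmp.name))
    os.remove(tmp.name)
```

Returning an empty list keeps the script running, but it also hides the real cause (nothing was scraped), so it is worth logging when this branch is hit.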
Let's focus on Python 3.
Could you paste the content of the file href.json here?
Hi @nickirk, I got the same error.
The content of href.json is empty:
cat href.json | wc -l
0
I also tried the example URL you provide (in case mine was broken), but it fails the same way.
It looks like immobilienscout24.de has put a restriction on spiders: when I use Scrapy to fetch the content, I get a 405 error ("Method Not Allowed"). I am looking for a way around this with Scrapy. If you have found one, please comment here.
Maybe change the user agent when doing the request?
-- Rodrigo Oliveira
I tried simply replacing the user agent with a value I found online, and it didn't work.
@nickirk Have you found a working solution for this issue yet? Thanks.
@krassle @nickirk, did you find a solution? Thanks.
Sorry guys, I have been busy with my thesis and haven't had time to look into this issue. I encourage you to follow the discussion here and try things yourselves. Personally, I recommend using the script for wg-gesucht.de (there is also a minor issue with applying the filters on wg-gesucht.de, but other than that, the script should work).
Has anyone found a solution for this issue? If not, I guess this entire bot is useless and a waste of time for anyone who is not a programmer.
I tried to fix this with proxy rotation. Still not working.
Hi, I have the same issue. Instead of automatically applying to a listing, I extended your scripts so they send me a message on Telegram, and I can then check manually whether the apartment is OK.
With your scripts, you just need to change submit.py to the following, so that it sends a Telegram message instead of submitting:
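import requests


def submit_app(bot_message):
    # Instead of submitting an application, send the listing link to yourself on Telegram.
    # bot_message is the listing path collected by the crawler.
    bot_token = '<bot token>'
    bot_chatID = '<chat ID>'
    # Link directly to the listing's contact form
    link = 'https://www.immobilienscout24.de' + \
        bot_message + '#/basicContact/email'
    # Pass the query as params so requests URL-encodes it
    # (a raw '#' in the query string would cut off the rest of the link)
    send_text = 'https://api.telegram.org/bot' + bot_token + '/sendMessage'
    response = requests.get(send_text, params={
        'chat_id': bot_chatID,
        'parse_mode': 'Markdown',
        'text': link,
    })
    return response.json()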
Unfortunately, the crawler doesn't work anymore :/
Immoscout uses some kind of bot protection and redirects to reCAPTCHA :/ I guess that's the end of automatic apartment finding :p
Is this still not working? I thought about giving it a try, but reading these comments it doesn't look too promising.
Nope, unfortunately there is currently no way around it. They heavily protect themselves against web scraping.
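Part of the confusion in this thread is that the crawler fails silently and just leaves href.json empty. A minimal sketch of a check that makes it fail loudly instead, assuming you can get the status code, final URL and page body from whatever HTTP client the crawler uses. `check_response` and `BlockedError` are hypothetical names, and treating the word "captcha" in the URL or HTML as the sign of a challenge page is an assumption, not something Immoscout documents.

```python
class BlockedError(RuntimeError):
    """Raised when the site refused the request instead of returning listings."""


def check_response(status_code, final_url, body):
    """Hypothetical check to run before parsing a search-results page.

    Raises BlockedError for the two failure modes reported in this thread:
    an HTTP 405 rejection and a redirect to a CAPTCHA challenge.
    """
    if status_code == 405:
        raise BlockedError("HTTP 405 Method Not Allowed: request rejected")
    if status_code >= 400:
        raise BlockedError("HTTP error %d" % status_code)
    # Assumption: a challenge page mentions "captcha" in its URL or HTML.
    if "captcha" in final_url.lower() or "captcha" in body.lower():
        raise BlockedError("Redirected to a CAPTCHA page: " + final_url)


if __name__ == "__main__":
    try:
        check_response(405, "https://www.immobilienscout24.de/", "")
    except BlockedError as exc:
        print("Blocked:", exc)
```

This only tells you why nothing was scraped; it does not get past the protection.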
Well that's unfortunate, thanks for the quick reply tho
They use a third-party CAPTCHA service against bots. All of its puzzles can be solved programmatically with some probability, but only with a lot of effort. The question, IMO, is whether the anti-CAPTCHA logic can be good enough and whether anybody wants to invest that time.