SnapchatBot
cssselect fails on Googlerbot
Hi, cool library! Really fun stuff.
I'm trying to get the Googlerbot to work, but I'm having trouble with this line:
href = root.cssselect(".bia")[0].attrib["href"]
The root object does not have any DOM nodes with class bia. Does this have to do with Google's HTML being loaded asynchronously? When I write the root object to a file, some of the HTML from the response is there, but not all of it.
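One way to make this failure easier to diagnose is to check whether the fetched markup contains any element with class bia before indexing into the result. This minimal sketch swaps lxml's cssselect for the standard library's html.parser so it runs without extra dependencies; ClassHrefFinder and first_href_with_class are hypothetical helper names, not part of SnapchatBot. Unlike root.cssselect(".bia")[0].attrib["href"], it skips matching elements that have no href and returns None instead of raising IndexError when nothing matches.

```python
from html.parser import HTMLParser
from typing import List, Optional


class ClassHrefFinder(HTMLParser):
    """Collects href values of elements whose class list contains target_class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # class="x bia" matches ".bia", like a CSS class selector would.
        classes = (attr_map.get("class") or "").split()
        if self.target_class in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def first_href_with_class(html: str, target_class: str = "bia") -> Optional[str]:
    """Return the first matching href, or None if the markup has no such node."""
    finder = ClassHrefFinder(target_class)
    finder.feed(html)
    finder.close()
    return finder.hrefs[0] if finder.hrefs else None


if __name__ == "__main__":
    print(first_href_with_class('<a class="x bia" href="/imgres?a=1">img</a>'))  # /imgres?a=1
    print(first_href_with_class("<div>no results</div>"))  # None
```

Getting None back for the real response confirms that the nodes are simply absent from the HTML you received, rather than that the selector is wrong.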
Any idea how I can fix this problem?
I've also looked into this. I've tried fetching the page with urlopen and with requests, with and without redirects, and can't seem to get HTML that looks like what you would get from a browser.
Perhaps changing the User-Agent header would fix the problem?
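The fetching experiments described above (with and without redirects, with or without a custom User-Agent) can be sketched with only the standard library's urllib.request. NoRedirect and fetch are hypothetical helper names used for illustration. Note that neither option executes JavaScript, so anything a browser loads after the initial response will still be missing from the returned HTML.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def fetch(url: str, follow_redirects: bool = True,
          user_agent: Optional[str] = None) -> Tuple[int, str]:
    """Fetch url and return (status, body), optionally without following redirects."""
    # Passing a HTTPRedirectHandler subclass replaces urllib's default one.
    handlers = [] if follow_redirects else [NoRedirect()]
    opener = urllib.request.build_opener(*handlers)
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # With redirects disabled, a 3xx surfaces here instead of being followed.
        return err.code, err.read().decode("utf-8", "replace")
```

Comparing fetch(url) with fetch(url, follow_redirects=False) and with different user_agent values shows exactly what the server sends back in each case, which makes it easy to check whether the missing nodes are ever present in the raw response.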
The User-Agent doesn't make a difference for me. I'm fairly sure Google's results are lazily loaded, so parsing the HTML synchronously won't work. I ended up using the (deprecated) Google Image Search API instead; it just returns JSON, so it's much easier to work with.
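The switch from scraping HTML to consuming JSON can be sketched as follows. The response layout assumed here, {"responseData": {"results": [{"url": ...}]}}, is an illustration only and not a confirmed description of the Google API mentioned above; check the documentation of whatever API you actually call. image_urls_from_json is a hypothetical helper name.

```python
import json
from typing import List


def image_urls_from_json(payload: str) -> List[str]:
    """Extract image URLs from a JSON search response.

    ASSUMPTION: the response looks like
    {"responseData": {"results": [{"url": "..."}, ...]}}.
    Adjust the keys to match the API you are really using.
    """
    data = json.loads(payload)
    # Tolerate a missing or null responseData / results instead of crashing.
    results = (data.get("responseData") or {}).get("results") or []
    return [item["url"] for item in results if "url" in item]
```

Compared with the cssselect approach, there is no dependence on how (or when) the page renders its markup: the data either is in the payload or it isn't.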
Could you close the issue and/or make a commit with your solution? Thanks :smile:
Sorry, I forgot about this. I do have a working solution and will open a pull request as soon as I can get around to it.
Awesome! Will this be invite-only or public on GitHub? Any idea on a time frame?
It will be by the end of the week! I have my own public project that uses this library; it's a working extension of the Googlerbot, so feel free to check it out. I'll clean it up, remove some of the extra stuff, and send a pull request to this project later this week.