Unofficial-Quora-API

How did you crawl Quora?

Open msdeep14 opened this issue 7 years ago • 7 comments

According to Quora's robots.txt (https://www.quora.com/robots.txt), Quora doesn't allow crawling of its website. I tried anyway: sometimes it returns the expected result, other times None.

How did you do it? Did you send an email to [email protected]?
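
The robots.txt check mentioned above can be done programmatically before any request is sent. The following is a minimal sketch using Python 3's standard `urllib.robotparser`; the sample robots.txt text, the `my-crawler` user agent, and the helper name `is_allowed` are illustrative assumptions, not Quora's actual rules or anything from this project.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It is NOT a copy of Quora's actual file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "my-crawler" and the URL below are placeholder values.
    print(is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler",
                     "https://www.quora.com/some-user/following"))
```

To check a live site instead, `RobotFileParser.set_url()` followed by `read()` downloads the file before `can_fetch()` is called.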

msdeep14 avatar Sep 16 '17 05:09 msdeep14

Actually, I'm not allowed to crawl Quora which is why this project shouldn't really be used.

I used a mix of selenium and bs4 for scraping Quora.

kalbhor avatar Sep 16 '17 05:09 kalbhor

I have an idea: we can crawl if we log in to the website with our own credentials. I'm trying to do that, but without success; I am always redirected back to the login page whenever I try to access it.

    data = {'email': email,'password':getpass()}
    s.post('https://www.quora.com/{}/following'.format(username), data=data)
    r = s.get('https://www.quora.com/{}/following'.format(username))
    soup = BeautifulSoup(r.content)
    print soup
    div = soup.find('div',{'class':'UserConnectionsFollowingList PagedList'})
    print div 

In my script, I want to get the people I'm following and the answers written by them.

Do you have any ideas why I'm failing?

msdeep14 avatar Sep 16 '17 05:09 msdeep14

What is the response code you get on line 2? s.post('https://www.quora.com/{}/following'.format(username), data=data)

kalbhor avatar Sep 16 '17 06:09 kalbhor

<Response [200]>
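
A 200 on its own may not show that the login worked: `requests` follows redirects by default, so the status printed is that of the final page, which could be the login page itself. A rough way to tell the two cases apart is to inspect `response.url` and `response.history`. The sketch below (Python 3) assumes the login page can be recognised by the word `login` in its URL, which is a guess and not something confirmed for Quora; the helper names and the `example.com` URLs are made up for illustration.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def summarize(response):
    """Collect the fields that reveal whether a redirect happened."""
    return {
        "status": response.status_code,         # status of the FINAL page
        "redirected": bool(response.history),   # non-empty if any redirect occurred
        "final_path": urlparse(response.url).path,
    }


def looks_like_login_page(response, marker="login"):
    """Heuristic: does the final URL mention the (assumed) login marker?"""
    parsed = urlparse(response.url)
    return marker in (parsed.path + "?" + parsed.query).lower()


if __name__ == "__main__":
    # Stand-in for a requests.Response so the sketch runs without network access.
    fake = SimpleNamespace(
        status_code=200,
        url="https://example.com/login?next=/someone/following",
        history=[SimpleNamespace(status_code=302,
                                 url="https://example.com/someone/following")],
    )
    print(summarize(fake))
    print(looks_like_login_page(fake))
```

With a real session, the same helpers can be called on the object returned by `s.post(...)` or `s.get(...)`.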

msdeep14 avatar Sep 16 '17 06:09 msdeep14

Will this work for users who have logged in using Facebook?

kalbhor avatar Sep 16 '17 06:09 kalbhor

Quora doesn't really have an API. I doubt they'll provide data in this manner. I used selenium to get past this problem.

kalbhor avatar Sep 16 '17 06:09 kalbhor

So far, what I want from my project is for a person to get the recent answers posted by the people they follow. I'm assuming they will log in using email and password (I will update the code later for Facebook and Google login).

This is what I have written so far:

    email = str(raw_input("email: "))
    s = requests.Session()
    data = {'email': email,'password':getpass()}
    res = s.post('https://www.quora.com/{}/following'.format(username), data=data)
    print res
    r = s.get('https://www.quora.com/{}/following'.format(username))
    soup = BeautifulSoup(r.content)
    print soup
    div = soup.find('div',{'class':'UserConnectionsFollowingList PagedList'})
    print div
    user_list = []
    if div is not None:
        for user in div.find_all('a'):
            user_list.append("https://www.quora.com{}".format(user.get('href')))
    print user_list

If I get user_list, then I will fetch latest answers written by them.
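
The link-extraction step can be tried on a saved HTML snippet without logging in, which separates parsing problems from login problems. The sketch below is Python 3 and swaps in the standard library's `html.parser` for bs4, so it is not the code from this thread. The class name is the one used above; the sample HTML, the profile paths, and the names `FollowingLinkParser` and `extract_following` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

CONTAINER_CLASS = "UserConnectionsFollowingList PagedList"  # class name from the thread


class FollowingLinkParser(HTMLParser):
    """Collect href values of <a> tags inside the container <div>."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.depth = 0   # > 0 while inside the container div (tracks nested divs)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "div":
                self.depth += 1
            elif tag == "a" and attrs.get("href"):
                self.links.append(attrs["href"])
        elif tag == "div" and attrs.get("class") == self.container_class:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1


def extract_following(html, base="https://www.quora.com"):
    """Return absolute URLs for every link found inside the container div."""
    parser = FollowingLinkParser(CONTAINER_CLASS)
    parser.feed(html)
    return [urljoin(base, href) for href in parser.links]


# Made-up sample markup; real pages may be structured differently.
SAMPLE_HTML = """
<a href="/about">About</a>
<div class="UserConnectionsFollowingList PagedList">
  <div><a href="/Alice-Example">Alice</a></div>
  <a href="/Bob-Example">Bob</a>
</div>
<a href="/contact">Contact</a>
"""

if __name__ == "__main__":
    print(extract_following(SAMPLE_HTML))
```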

msdeep14 avatar Sep 16 '17 06:09 msdeep14