quora-backup
quora-backup copied to clipboard
Don't insist on answer component of URL
crawler.py can be used to retrieve blogs from Quora, not just answers. But if it is, the constraint that the URL fetched needs to match quora.com/answer/... needs to be relaxed:
# Get the part of the URL indicating the question title; we will save under this name
m1 = re.search('quora\.com/([^/]+)/answer', url)
# if there's a context topic
m2 = re.search('quora\.com/[^/]+/([^/]+)/answer', url)
filename = added_time + ' '
if not m1 is None:
filename += m1.group(1)
elif not m2 is None:
filename += m2.group(1)
else:
print('[ERROR] Could not find question part of URL %s; skipping' % url, file=sys.stderr)
continue
I change the last two lines to:
# blog post
m3 = re.search('quora\.com/([^/]+)', url)
filename += m3.group(1)