facebook-scraper
Limit scrapes before certain date
Hello,
It's my first time using your scraper, and first I would like to thank you so much for creating this; it is super helpful! I was going through the syntax and was wondering how I can scrape posts from Jan 2020 to the present. It seems that the scraper works from the most recent posts back towards the past. Is this syntax correct?
df = []
for post in get_posts(account='mgchoi86', credentials=(),
                      options={"allow_extra_requests": False, "comments": False, "reactors": False,
                               "progress": True, "posts_per_page": 200}):
    df.append(post)
if post['time'] < datetime.datetime(2020,1,1):
break
Well, I got an error: '<' not supported between instances of 'NoneType' and 'datetime.datetime', so I don't think this is the right approach.
That syntax looks mostly fine to me, but I think you probably intended to have the if post['time'] < datetime.datetime(2020,1,1): statement indented within the for loop.
I'm unable to reproduce this bug; your code works fine for me:
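import datetime
from facebook_scraper import get_posts

for post in get_posts(account='mgchoi86',
                      options={"allow_extra_requests": False, "comments": False, "reactors": False,
                               "progress": True, "posts_per_page": 200}):
    print(post["time"])
    # Posts arrive newest first, so stop at the first one older than the cutoff
    if post['time'] < datetime.datetime(2020,1,1):
        break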
outputs:
2022-06-26 18:35:00
2022-03-07 17:10:00
2022-03-01 11:22:00
2022-03-01 10:25:00
2022-02-02 11:57:00
2022-01-29 20:55:00
2021-08-29 23:33:00
2021-08-18 20:56:00
2021-07-12 15:37:00
2021-04-26 20:59:00
2021-03-26 20:59:00
2021-02-03 23:46:00
2020-12-31 14:16:00
2020-12-27 13:33:00
2020-12-19 16:44:00
2020-12-16 11:11:00
2020-12-02 11:20:00
2020-11-17 06:50:00
2020-11-10 09:28:00
2020-11-05 17:59:00
2020-11-05 10:45:00
2020-09-30 10:57:00
2020-07-29 21:22:00
2020-07-25 12:15:00
2020-06-19 22:58:00
2020-06-19 19:04:00
2020-06-15 10:32:00
2020-06-14 17:44:00
2020-06-11 10:33:00
2020-06-04 19:53:00
2020-05-29 11:28:00
2020-03-16 09:16:00
2020-03-15 17:06:00
2020-03-02 15:15:00
2020-03-01 12:46:00
2020-02-24 14:14:00
2020-02-11 15:56:00
2019-10-15 00:59:00
Perhaps check if post["time"] is not None before trying to compare it to a datetime, like so:
if post["time"] is not None and post['time'] < datetime.datetime(2020,1,1):
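For completeness, here is a rough sketch of how that guard could fit into the whole loop. The helper name posts_since and the sample data are made up for illustration and are not part of facebook-scraper. It assumes posts arrive newest first, as in the output above, and that a post whose time could not be extracted has "time" set to None.

```python
import datetime
from typing import Any, Dict, Iterable, List


def posts_since(posts: Iterable[Dict[str, Any]],
                cutoff: datetime.datetime) -> List[Dict[str, Any]]:
    """Collect posts dated on or after `cutoff` (hypothetical helper).

    Assumes `posts` yields dicts with a "time" key, newest first.
    """
    kept = []
    for post in posts:
        when = post.get("time")
        if when is None:
            # No timestamp available: skip instead of comparing None to a datetime
            continue
        if when < cutoff:
            # Older than the cutoff; with newest-first ordering we can stop here
            break
        kept.append(post)
    return kept


# Stand-in data so the sketch runs without network access
sample = [
    {"post_id": "a", "time": datetime.datetime(2022, 6, 26, 18, 35)},
    {"post_id": "b", "time": None},
    {"post_id": "c", "time": datetime.datetime(2020, 2, 11, 15, 56)},
    {"post_id": "d", "time": datetime.datetime(2019, 10, 15, 0, 59)},
    {"post_id": "e", "time": datetime.datetime(2021, 1, 1)},
]
recent = posts_since(sample, datetime.datetime(2020, 1, 1))
print([p["post_id"] for p in recent])  # ['a', 'c']
```

With the real scraper you would pass the generator returned by get_posts(account=..., options=...) as the posts argument instead of the sample list. Unlike the loop in the question, this checks the date before appending, so the first post older than the cutoff is not kept.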