facebook-scraper
Date and time in comments incorrect ('comment_time')
Hey, I'm getting incorrect comment_time values when extracting comments from a single post. This is probably due to Facebook's relative date formatting, such as "1m." (1 month ago) or "2w" (2 weeks ago), with the hour:minute:second part taken from the time the data was extracted. It's the same issue as #624, but I can't reproduce the result from this answer: https://github.com/kevinzg/facebook-scraper/issues/624#issuecomment-1006921693
Code below:
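from facebook_scraper import get_posts, set_cookies, set_noscript

set_noscript(True)
set_cookies("cookies.txt")
# Fetch a single post and iterate over its comments lazily
post = next(get_posts(post_urls=[4876511549099926], options={"comments": "generator"}))
for comment in post["comments_full"]:
    print(comment["comment_time"])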
Logs: Time of data extraction 2022-05-18 11:52:38
None
2021-12-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2022-01-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2022-01-18 11:52:38
2022-01-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2022-01-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2022-03-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
Logging:
Requesting page from: https://m.facebook.com/4876511549099926
Requesting page from: https://m.facebook.com/4876511549099926
Requesting page from: https://m.facebook.com/4876511549099926
Requesting page from: https://m.facebook.com/4876511549099926
Fetching story.php?story_fbid=1775680969304791&id=1775680969304791&m_entstream_source=video_home&player_suborigin=entry_point&player_format=permalink
Fetching story.php?story_fbid=1775680969304791&id=1775680969304791&m_entstream_source=video_home&player_suborigin=entry_point&player_format=permalink
Fetching story.php?story_fbid=1775680969304791&id=1775680969304791&m_entstream_source=video_home&player_suborigin=entry_point&player_format=permalink
Fetching story.php?story_fbid=1775680969304791&id=1775680969304791&m_entstream_source=video_home&player_suborigin=entry_point&player_format=permalink
Got exact timestamp from publish_time: 2022-01-02 16:00:13
Got exact timestamp from publish_time: 2022-01-02 16:00:13
Got exact timestamp from publish_time: 2022-01-02 16:00:13
Got exact timestamp from publish_time: 2022-01-02 16:00:13
[1775680969304791] Extract method extract_video didn't return anything
[1775680969304791] Extract method extract_video didn't return anything
[1775680969304791] Extract method extract_video didn't return anything
[1775680969304791] Extract method extract_video didn't return anything
[1775680969304791] Extract method extract_video_thumbnail didn't return anything
[1775680969304791] Extract method extract_video_thumbnail didn't return anything
[1775680969304791] Extract method extract_video_thumbnail didn't return anything
[1775680969304791] Extract method extract_video_thumbnail didn't return anything
[1775680969304791] Extract method extract_video_meta didn't return anything
[1775680969304791] Extract method extract_video_meta didn't return anything
[1775680969304791] Extract method extract_video_meta didn't return anything
[1775680969304791] Extract method extract_video_meta didn't return anything
[1775680969304791] Extract method extract_factcheck didn't return anything
[1775680969304791] Extract method extract_factcheck didn't return anything
[1775680969304791] Extract method extract_factcheck didn't return anything
[1775680969304791] Extract method extract_factcheck didn't return anything
[1775680969304791] Extract method extract_share_information didn't return anything
[1775680969304791] Extract method extract_share_information didn't return anything
[1775680969304791] Extract method extract_share_information didn't return anything
[1775680969304791] Extract method extract_share_information didn't return anything
[1775680969304791] Extract method extract_listing didn't return anything
[1775680969304791] Extract method extract_listing didn't return anything
[1775680969304791] Extract method extract_listing didn't return anything
[1775680969304791] Extract method extract_listing didn't return anything
[1775680969304791] Extract method extract_with didn't return anything
[1775680969304791] Extract method extract_with didn't return anything
[1775680969304791] Extract method extract_with didn't return anything
[1775680969304791] Extract method extract_with didn't return anything
Unable to parse comment <Element 'div' class=('_55wr', '_10pt', '_3eqx', 'post_placeholder') id='post_placeholder-4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('_55wr', '_10pt', '_3eqx', 'post_placeholder') id='post_placeholder-4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('_55wr', '_10pt', '_3eqx', 'post_placeholder') id='post_placeholder-4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('_55wr', '_10pt', '_3eqx', 'post_placeholder') id='post_placeholder-4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('async_elem',) id='see_next_4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('async_elem',) id='see_next_4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('async_elem',) id='see_next_4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('async_elem',) id='see_next_4876511549099926'>: 'NoneType' object has no attribute 'text'
Fetching up to 1000000000.0 comments
Fetching up to 1000000000.0 comments
Fetching up to 1000000000.0 comments
Fetching up to 1000000000.0 comments
I see - try https://github.com/kevinzg/facebook-scraper/commit/6acbb2f5a83d2636da01075732019d688ede6a88
Doesn't seem to solve the problem
set_noscript(True)
set_cookies("cookies_PL.txt")
post = next(get_posts(post_urls=[4876511549099926], options={"comments": "generator"}))
for comment in post["comments_full"]:
    print(comment["comment_time"])
None
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-03-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
We can only work with what Facebook gives us. If all they say is "5 months ago", that's the level of time accuracy the scraper can extract.
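To illustrate why the clock time in the first output matches the extraction time, here is a minimal sketch of what resolving a relative date like "5 months ago" amounts to. It is not the scraper's actual code: `subtract_months` and `approximate_comment_time` are hypothetical helpers, and the `truncate` flag only mimics the 00:00:00 values in the second output above (whether the linked commit does exactly this is an assumption).

```python
import calendar
from datetime import datetime


def subtract_months(moment, months):
    """Move `moment` back by whole months, clamping the day to the month's length."""
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def approximate_comment_time(months_ago, extracted_at, truncate=False):
    """Hypothetical helper: estimate a timestamp from 'N months ago'.

    Only the month offset is known, so the day and clock time are
    inherited from the extraction time unless `truncate` zeroes the clock.
    """
    estimate = subtract_months(extracted_at, months_ago)
    if truncate:
        estimate = estimate.replace(hour=0, minute=0, second=0, microsecond=0)
    return estimate


extracted_at = datetime(2022, 5, 18, 11, 52, 38)
print(approximate_comment_time(5, extracted_at))  # 2021-12-18 11:52:38
print(approximate_comment_time(4, extracted_at))  # 2022-01-18 11:52:38
```

Every comment that Facebook labels with the same relative date ends up with the same estimate, which is consistent with the repeated values in the output.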
I used the same code as you did (https://github.com/kevinzg/facebook-scraper/issues/624#issuecomment-1006921693) and you got correct datetimes, and I did not.
At the time I ran that, the comments weren't very old, so Facebook showed more time information. Try:
set_noscript(True)
post = next(get_posts(post_urls=[5284284578322619], options={"comments": "generator"}))
for comment in post["comments_full"]:
    print(comment["comment_time"])
For me, this outputs:
2022-05-16 14:18:00
2022-05-19 11:18:00
2022-05-14 05:40:00
2022-05-14 07:47:00
2022-05-14 06:16:00
2022-05-18 00:00:00
2022-05-17 00:00:00
2022-05-14 05:42:00
2022-05-15 03:02:00
2022-05-16 08:05:00
2022-05-14 09:17:00
2022-05-14 06:11:00
2022-05-19 22:00:00
2022-05-15 18:25:00
2022-05-14 05:49:00
2022-05-14 17:02:00
2022-05-14 07:45:00
2022-05-14 15:49:00
2022-05-14 06:01:00
2022-05-15 19:41:00
2022-05-14 17:32:00
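If you need to tell precise values apart from coarse ones, a rough heuristic is to treat a missing value or an exact-midnight timestamp as day-level (or worse) precision. This is only a sketch: `looks_coarse` is a hypothetical helper, not part of facebook-scraper, and a comment genuinely posted at 00:00 would be flagged as well.

```python
from datetime import datetime, time


def looks_coarse(comment_time):
    """Heuristic: True when the timestamp is missing or sits exactly at midnight."""
    return comment_time is None or comment_time.time() == time(0, 0)


# Sample values taken from the outputs in this thread
samples = [None, datetime(2022, 5, 16, 14, 18), datetime(2022, 5, 18)]
for value in samples:
    print(value, looks_coarse(value))
```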