
Date and time in comments incorrect ('comment_time')

Open staszeks opened this issue 2 years ago • 5 comments

Hey, I'm getting incorrect comment_time values when extracting comments from a single post. This is probably due to Facebook's relative date formatting (e.g. "1m." for 1 month ago, "2w" for 2 weeks ago): the date is derived from that label, while the hour:minute:second part is taken from the time the data was extracted. It's the same issue as #624, but I can't reproduce the result from this answer: https://github.com/kevinzg/facebook-scraper/issues/624#issuecomment-1006921693

Code below:

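from facebook_scraper import get_posts, set_cookies, set_noscript

set_noscript(True)
set_cookies("cookies.txt")

# Fetch a single post and iterate over its comments lazily
post = next(get_posts(post_urls=[4876511549099926], options={"comments": "generator"}))
for comment in post["comments_full"]:
    print(comment["comment_time"])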

Output (data extracted at 2022-05-18 11:52:38):

None
2021-12-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2022-01-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2022-01-18 11:52:38
2022-01-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2022-01-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2022-03-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2022-01-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38
2021-12-18 11:52:38

Logging:

Requesting page from: https://m.facebook.com/4876511549099926
Requesting page from: https://m.facebook.com/4876511549099926
Requesting page from: https://m.facebook.com/4876511549099926
Requesting page from: https://m.facebook.com/4876511549099926
Fetching story.php?story_fbid=1775680969304791&id=1775680969304791&m_entstream_source=video_home&player_suborigin=entry_point&player_format=permalink
Fetching story.php?story_fbid=1775680969304791&id=1775680969304791&m_entstream_source=video_home&player_suborigin=entry_point&player_format=permalink
Fetching story.php?story_fbid=1775680969304791&id=1775680969304791&m_entstream_source=video_home&player_suborigin=entry_point&player_format=permalink
Fetching story.php?story_fbid=1775680969304791&id=1775680969304791&m_entstream_source=video_home&player_suborigin=entry_point&player_format=permalink
Got exact timestamp from publish_time: 2022-01-02 16:00:13
Got exact timestamp from publish_time: 2022-01-02 16:00:13
Got exact timestamp from publish_time: 2022-01-02 16:00:13
Got exact timestamp from publish_time: 2022-01-02 16:00:13
[1775680969304791] Extract method extract_video didn't return anything
[1775680969304791] Extract method extract_video didn't return anything
[1775680969304791] Extract method extract_video didn't return anything
[1775680969304791] Extract method extract_video didn't return anything
[1775680969304791] Extract method extract_video_thumbnail didn't return anything
[1775680969304791] Extract method extract_video_thumbnail didn't return anything
[1775680969304791] Extract method extract_video_thumbnail didn't return anything
[1775680969304791] Extract method extract_video_thumbnail didn't return anything
[1775680969304791] Extract method extract_video_meta didn't return anything
[1775680969304791] Extract method extract_video_meta didn't return anything
[1775680969304791] Extract method extract_video_meta didn't return anything
[1775680969304791] Extract method extract_video_meta didn't return anything
[1775680969304791] Extract method extract_factcheck didn't return anything
[1775680969304791] Extract method extract_factcheck didn't return anything
[1775680969304791] Extract method extract_factcheck didn't return anything
[1775680969304791] Extract method extract_factcheck didn't return anything
[1775680969304791] Extract method extract_share_information didn't return anything
[1775680969304791] Extract method extract_share_information didn't return anything
[1775680969304791] Extract method extract_share_information didn't return anything
[1775680969304791] Extract method extract_share_information didn't return anything
[1775680969304791] Extract method extract_listing didn't return anything
[1775680969304791] Extract method extract_listing didn't return anything
[1775680969304791] Extract method extract_listing didn't return anything
[1775680969304791] Extract method extract_listing didn't return anything
[1775680969304791] Extract method extract_with didn't return anything
[1775680969304791] Extract method extract_with didn't return anything
[1775680969304791] Extract method extract_with didn't return anything
[1775680969304791] Extract method extract_with didn't return anything
Unable to parse comment <Element 'div' class=('_55wr', '_10pt', '_3eqx', 'post_placeholder') id='post_placeholder-4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('_55wr', '_10pt', '_3eqx', 'post_placeholder') id='post_placeholder-4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('_55wr', '_10pt', '_3eqx', 'post_placeholder') id='post_placeholder-4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('_55wr', '_10pt', '_3eqx', 'post_placeholder') id='post_placeholder-4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('async_elem',) id='see_next_4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('async_elem',) id='see_next_4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('async_elem',) id='see_next_4876511549099926'>: 'NoneType' object has no attribute 'text'
Unable to parse comment <Element 'div' class=('async_elem',) id='see_next_4876511549099926'>: 'NoneType' object has no attribute 'text'
Fetching up to 1000000000.0 comments
Fetching up to 1000000000.0 comments
Fetching up to 1000000000.0 comments
Fetching up to 1000000000.0 comments

staszeks, May 18 '22 10:05

I see - try https://github.com/kevinzg/facebook-scraper/commit/6acbb2f5a83d2636da01075732019d688ede6a88

neon-ninja, May 18 '22 22:05

Doesn't seem to solve the problem

from facebook_scraper import get_posts, set_cookies, set_noscript

set_noscript(True)
set_cookies("cookies_PL.txt")

post = next(get_posts(post_urls=[4876511549099926], options={"comments": "generator"}))
for comment in post["comments_full"]:
    print(comment["comment_time"])

Output:

None
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-03-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2022-01-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00
2021-12-19 00:00:00

staszeks, May 19 '22 08:05

We can only work with what Facebook gives us. If all they show is "5 months ago", that is the most precise timestamp the scraper can extract.
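To illustrate why the first run printed the extraction time on every comment, here is a minimal sketch of resolving a relative label against "now". It is not the scraper's actual code: `months_ago` is a hypothetical helper, and the offsets of 5, 4 and 2 months are assumed from the dates in the output above, not read from the page.

```python
import calendar
from datetime import datetime


def months_ago(now: datetime, months: int) -> datetime:
    """Hypothetical helper: subtract whole months, keeping the time of day of `now`."""
    # Work in "months since year 0" so year boundaries are handled by divmod.
    total = now.year * 12 + (now.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    # Clamp the day for shorter target months (e.g. the 31st minus one month).
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day)


# Extraction time reported in the issue.
extracted_at = datetime(2022, 5, 18, 11, 52, 38)

# Assumed relative labels: "5 months ago", "4 months ago", "2 months ago".
for n in (5, 4, 2):
    print(months_ago(extracted_at, n))
```

Only the month (and possibly the year) comes from Facebook here; the day and the 11:52:38 are inherited from the moment of extraction. Zeroing the time of day with `.replace(hour=0, minute=0, second=0)` would give values of the shape seen in the second run, but the day and time would still not be real information about the comment.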

neon-ninja, May 19 '22 09:05

I used the same code as you did (https://github.com/kevinzg/facebook-scraper/issues/624#issuecomment-1006921693). You got correct datetimes, but I did not.

staszeks, May 19 '22 11:05

At the time I ran that, the comments were recent, so Facebook showed more precise time information. Try:

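from facebook_scraper import get_posts, set_noscript

set_noscript(True)

# A more recent post, whose comments still carry finer-grained timestamps
post = next(get_posts(post_urls=[5284284578322619], options={"comments": "generator"}))
for comment in post["comments_full"]:
    print(comment["comment_time"])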

For me, this outputs:

2022-05-16 14:18:00
2022-05-19 11:18:00
2022-05-14 05:40:00
2022-05-14 07:47:00
2022-05-14 06:16:00
2022-05-18 00:00:00
2022-05-17 00:00:00
2022-05-14 05:42:00
2022-05-15 03:02:00
2022-05-16 08:05:00
2022-05-14 09:17:00
2022-05-14 06:11:00
2022-05-19 22:00:00
2022-05-15 18:25:00
2022-05-14 05:49:00
2022-05-14 17:02:00
2022-05-14 07:45:00
2022-05-14 15:49:00
2022-05-14 06:01:00
2022-05-15 19:41:00
2022-05-14 17:32:00
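If you need to tell coarse timestamps from precise ones downstream, one rough option is to inspect which time fields are zero. The sketch below is only a heuristic and not part of facebook-scraper: `guess_precision` is a hypothetical helper, and it will misclassify a comment that really was posted exactly on the hour or at midnight.

```python
from datetime import datetime
from typing import Optional


def guess_precision(ts: Optional[datetime]) -> str:
    """Hypothetical heuristic: infer how precise a comment_time value probably is."""
    if ts is None:
        return "missing"
    if (ts.hour, ts.minute, ts.second) == (0, 0, 0):
        # Midnight usually means only a date (or something coarser) was available.
        return "day or coarser"
    if (ts.minute, ts.second) == (0, 0):
        # An exact hour suggests a label such as "N hours ago".
        return "hour"
    return "minute or finer"


# A few values taken from the outputs in this thread.
samples = [
    None,
    datetime(2022, 5, 18, 0, 0, 0),
    datetime(2022, 5, 19, 22, 0, 0),
    datetime(2022, 5, 16, 14, 18, 0),
]
for ts in samples:
    print(ts, "->", guess_precision(ts))
```

This cannot recover information Facebook did not provide; it only helps avoid treating a month-level estimate as an exact time.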

neon-ninja, May 19 '22 19:05