facebook-post-scraper
Added new data collection capabilities and fixed bugs in scraper.py.
Additional data elements that are now collected per post:
- Post creator
- Post creation datetime
- Post like count
- Post share count -- Previously collected inconsistently as a string. Now collected reliably as an integer.
- Post comment count
- Complete post text -- Previously, when a FB user shared a post and added their own text while sharing, that added text was lost. It is now captured.
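
As an illustration of the string-to-integer change for counts, here is a minimal sketch of how a displayed count could be normalized. The helper name `parse_count` and the input formats (`"1.2K shares"`, `"1,234 shares"`) are assumptions made for this example; they are not taken from scraper.py.

```python
import re

# Hypothetical suffix table; assumes counts may be abbreviated with K or M.
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def parse_count(text):
    """Convert a display string such as '1.2K shares' to an int.

    Returns 0 when the string is empty, None, or contains no number.
    """
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*([kKmM])?\b", text or "")
    if not match:
        return 0
    # Drop thousands separators before converting.
    number = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").lower()
    return int(round(number * _MULTIPLIERS.get(suffix, 1)))
```

Returning 0 for a missing count is a design choice for this sketch; returning `None` instead would keep "no shares shown" distinct from "zero shares".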
Fixed a bug in the collection of comment threads. In the previous implementation, comment text was saved in dictionaries keyed by comment author. Because a dictionary holds only one value per key, content was dropped whenever the same FB user posted more than once in a comment thread.
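
The failure mode can be reproduced in a few lines. This is a sketch with made-up sample data, not the code from scraper.py, and the record layout (`author` and `text` keys) is an assumption.

```python
# Made-up comment thread in which one author posts twice.
raw_comments = [
    ("Alice", "First!"),
    ("Bob", "Nice post."),
    ("Alice", "Following up on my earlier comment."),
]

# Keyed by author: Alice's second comment overwrites her first.
by_author = {}
for author, text in raw_comments:
    by_author[author] = text

# One record per comment: nothing is lost and thread order is preserved.
as_list = [{"author": author, "text": text} for author, text in raw_comments]

print(len(by_author))  # 2 entries, one comment dropped
print(len(as_list))    # 3 entries, all comments kept
```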
The code has also been refactored so that scraped content can be read from disk and parsed. The scraped content is saved to disk before parsing, so if an error occurs downstream the raw content is still available for debugging.
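
The save-then-parse flow could look roughly like the sketch below. All function names here (`save_raw_html`, `load_raw_html`, `scrape_and_parse`, `reparse`) are hypothetical, and `fetch` and `parse` stand in for whatever scraper.py actually uses to download and parse a page.

```python
from pathlib import Path


def save_raw_html(html, path):
    """Write the scraped page to disk, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path


def load_raw_html(path):
    """Read a previously saved page back from disk."""
    return Path(path).read_text(encoding="utf-8")


def scrape_and_parse(fetch, parse, path):
    """Fetch a page, persist it, then parse the copy read back from disk."""
    html = fetch()
    save_raw_html(html, path)  # persisted before any parsing happens
    return parse(load_raw_html(path))


def reparse(parse, path):
    """Re-run parsing on a saved page without scraping again."""
    return parse(load_raw_html(path))
```

If `parse` raises, the file at `path` already exists, so `reparse` can be called repeatedly while the parser is being fixed, without hitting the site again.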