fbcrawl
Cannot crawl more than 60 pages of comments
I tried to crawl a post with 2700 comments, but the crawl only gets as far as page 60.
The post link:
m.facebook.com/story.php?story_fbid=2226458920929531&id=2226454927596597&p=60&av=100036884506828&eav=AfaGbbgANEKjTi_nwspovtij7sx25oDoiBkQDA3hr_fqX5KDrdqrLBeclI6ydoKspu8&refid=52
The command:
scrapy crawl comments -a email="MAIL" -a password="PASSWORD" -a post="m.facebook.com/story.php?story_fbid=2226458920929531&id=2226454927596597" -o comment_post.csv -a lang="en" -a date="2019-05-05"
Because of that, I could only get about 126 comments from this post. Is there a way to improve this, or an alternative approach? Any suggestions would be welcome.
I'm facing a similar problem, but on a bigger scale: I'm getting just 1.8k out of 165k comments!
Did you find any solution?
Try adding this line to the settings.py file of the fbcrawl project:
DUPEFILTER_CLASS = 'scrapy.dupefilters.BaseDupeFilter'
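According to the Scrapy documentation, the default duplicate filter drops any request it has already seen (it compares request fingerprints), and duplicate links are exactly what usually turns up when paging through the comments of one post. Once the only pending next-page request is dropped, the spider has nothing left to crawl and finishes early.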
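To make that mechanism concrete, here is a toy model in plain Python (not Scrapy's actual code). It assumes, as the reply above suggests, that a later comments page can link to a URL the spider has already requested; the URLs and the `crawl` helper are hypothetical. With duplicate filtering on, the crawl ends at the first repeated URL; with it off, it keeps following the links.

```python
from typing import List, Optional


def crawl(start: str, next_links: List[Optional[str]], filter_duplicates: bool) -> List[str]:
    """Toy model of following a post's 'next comments page' links.

    next_links[i] is the next-page URL found on the i-th fetched page
    (None means there is no further page). Hypothetical, not Scrapy code.
    """
    seen = set()   # URLs already requested, like a fingerprint-based dupefilter
    fetched = []   # pages actually downloaded, in order
    url = start
    while url is not None:
        if filter_duplicates and url in seen:
            # The dupefilter drops the request. It was the only pending
            # request, so the crawl simply ends here.
            break
        seen.add(url)
        fetched.append(url)
        i = len(fetched) - 1
        url = next_links[i] if i < len(next_links) else None
    return fetched


start = "story.php?p=0"
# Hypothetical sequence: the third page's "next" link repeats an earlier URL.
next_links = ["story.php?p=10", "story.php?p=20", "story.php?p=10", "story.php?p=30", None]

print(crawl(start, next_links, filter_duplicates=True))   # stops after 3 pages
print(crawl(start, next_links, filter_duplicates=False))  # fetches all 5 pages
```

In a real site, a repeated URL usually returns the same page, so turning the filter off can also make the spider go round in circles; that is the trade-off behind the warning that follows.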
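Setting DUPEFILTER_CLASS to BaseDupeFilter turns duplicate filtering off entirely, which should prevent that. But please be careful: using this setting in spiders for other targets may lead to crawl traps, where the spider keeps revisiting the same pages.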
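If you do disable duplicate filtering, one way to limit the damage from a crawl trap is to also put an upper bound on the crawl. The fragment below is a sketch for the project's settings.py. `CLOSESPIDER_PAGECOUNT` is a standard Scrapy setting (handled by the CloseSpider extension) that closes the spider after about that many responses; the value 2000 is an arbitrary placeholder, not a recommendation.

```python
# settings.py (fbcrawl project): sketch, the numeric value is an assumption

# Never treat a request as a duplicate, so repeated "more comments" URLs are followed.
DUPEFILTER_CLASS = 'scrapy.dupefilters.BaseDupeFilter'

# Safety net against crawl traps: close the spider after about this many responses.
# 2000 is an arbitrary placeholder; size it to the post you are crawling (0 disables the limit).
CLOSESPIDER_PAGECOUNT = 2000
```

Because settings.py applies to every spider in the project, the same two keys could instead go in the comments spider's `custom_settings` class attribute, which Scrapy applies to that spider only.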