fbcrawl

Cannot crawl more than 60 pages of comments

Open mememoto opened this issue 5 years ago • 2 comments

I tried to crawl a post with 2700 comments, but the crawl only gets as far as page 60.

The post link: m.facebook.com/story.php?story_fbid=2226458920929531&id=2226454927596597&p=60&av=100036884506828&eav=AfaGbbgANEKjTi_nwspovtij7sx25oDoiBkQDA3hr_fqX5KDrdqrLBeclI6ydoKspu8&refid=52

The command: scrapy crawl comments -a email="MAIL" -a password="PASSWORD" -a post="m.facebook.com/story.php?story_fbid=2226458920929531&id=2226454927596597" -o comment_post.csv -a lang="en" -a date="2019-05-05"

Because of that, I could only get about 126 comments from this post. Is there a way to improve this, or an alternative approach? Any suggestions would be welcome.

mememoto • May 19 '19 08:05

I'm facing a similar problem, and not just at 60 pages: I'm getting only 1.8k out of 165k comments!

Did you find a solution?

Raki22 • Jan 24 '20 21:01

DUPEFILTER_CLASS = 'scrapy.dupefilters.BaseDupeFilter'

Try adding this line to the settings.py file of the fbcrawl project.

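According to the documentation, Scrapy's default duplicate filter drops any request it has already seen. That is exactly what usually happens with the comment pages of a single post, so once the "next page" request is dropped the spider has nothing left to crawl and stops early.

The DUPEFILTER_CLASS line above should prevent that, because BaseDupeFilter does not filter out any requests. But please be careful: using this option in spiders for other targets may lead to crawl traps.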
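To make the effect easier to see, here is a toy model in plain Python. It is not Scrapy's code and it does not talk to Facebook: `FakeServer`, `crawl`, and the URLs are made up for illustration, and the assumption that every "more comments" link uses the same URL is mine, not something confirmed for this post.

```python
from typing import List, Optional, Set, Tuple


class FakeServer:
    """Hypothetical stateful server: the "more" link is the same URL on every page."""

    def __init__(self, total_pages: int) -> None:
        self.total_pages = total_pages
        self.served = 0

    def fetch(self, url: str) -> Tuple[str, Optional[str]]:
        # Each fetch returns the next batch of comments plus a "next" link,
        # or None once the last page has been served.
        self.served += 1
        body = f"comments batch {self.served}"
        next_url = "/post?more" if self.served < self.total_pages else None
        return body, next_url


def crawl(server: FakeServer, start_url: str, dedupe: bool) -> List[str]:
    """Follow "next" links; with dedupe=True, drop URLs that were already requested."""
    seen: Set[str] = set()
    batches: List[str] = []
    url: Optional[str] = start_url
    while url is not None:
        if dedupe and url in seen:
            break  # duplicate request dropped, so nothing is left to schedule
        seen.add(url)
        body, url = server.fetch(url)
        batches.append(body)
    return batches


if __name__ == "__main__":
    print(len(crawl(FakeServer(5), "/post", dedupe=True)))
    print(len(crawl(FakeServer(5), "/post", dedupe=False)))
```

With the filter on, the toy crawl stops as soon as a link repeats; with it off, it reaches the last page. The same toy also hints at the crawl-trap risk: if the server never returned None, the unfiltered loop would never end. If other spiders live in the same project, Scrapy's per-spider `custom_settings` attribute or `dont_filter=True` on individual requests may be a narrower way to get the same effect.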

Raki22 • Jan 24 '20 22:01