infoqscraper
Scraping fails due to metadata changes
Found in version 0.1.5
As of March 2019, scraping presentations no longer works due to format changes in the presentation HTML page.
Traceback (most recent call last):
  File "/usr/local/bin/infoqscraper", line 33, in <module>
    sys.exit(main.main())
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/main.py", line 374, in main
    return module.main(infoq_client, args.module_args)
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/main.py", line 194, in main
    return command.main(infoq_client, args.command_args)
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/main.py", line 314, in main
    builder.create_presentation()
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/convert.py", line 82, in create_presentation
    video = self.download_video()
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/convert.py", line 103, in download_video
    rvideo_path = self.presentation.metadata['video_path']
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/scrap.py", line 171, in metadata
    'title': get_title(pres_div),
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/scrap.py", line 91, in get_title
    return pres_div.find('h1', class_="general").div.get_text().strip()
AttributeError: 'NoneType' object has no attribute 'find'
In fact, the fields that scrap.py is looking for here are purely descriptive metadata and are not used by the main application. Removing those lookups allows presentations to be grabbed correctly.
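As an alternative to removing the fields outright, the lookups could be made tolerant of missing markup. The following is only a minimal sketch, not a patch against the actual scrap.py: optional_field is a hypothetical helper that does not exist in infoqscraper, and get_title below merely repeats the expression shown in the traceback. Passing None stands in for the container div that the page no longer provides.

```python
def optional_field(getter, pres_div, default=None):
    """Hypothetical helper: run getter(pres_div), fall back to default
    when the expected markup is missing (for example, pres_div is None)."""
    try:
        return getter(pres_div)
    except (AttributeError, TypeError):
        return default


def get_title(pres_div):
    # Same expression as the failing line in the traceback.
    return pres_div.find('h1', class_="general").div.get_text().strip()


# None simulates the container div that can no longer be found on the page.
metadata = {
    'title': optional_field(get_title, None),
}
```

With this approach the descriptive fields simply come back as None when the page layout changes, while the fields the application actually needs can still be looked up strictly, so a real breakage in those is not silently hidden.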