infoqscraper

Scraping fails due to metadata changes

Open andreweacott opened this issue 4 years ago • 3 comments

Found in version 0.1.5

As of March 2019, scraping presentations no longer works due to format changes in the presentation HTML page.

Traceback (most recent call last):
  File "/usr/local/bin/infoqscraper", line 33, in <module>
    sys.exit(main.main())
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/main.py", line 374, in main
    return module.main(infoq_client, args.module_args)
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/main.py", line 194, in main
    return command.main(infoq_client, args.command_args)
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/main.py", line 314, in main
    builder.create_presentation()
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/convert.py", line 82, in create_presentation
    video = self.download_video()
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/convert.py", line 103, in download_video
    rvideo_path = self.presentation.metadata['video_path']
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/scrap.py", line 171, in metadata
    'title': get_title(pres_div),
  File "/usr/local/lib/python2.7/dist-packages/infoqscraper/scrap.py", line 91, in get_title
    return pres_div.find('h1', class_="general").div.get_text().strip()
AttributeError: 'NoneType' object has no attribute 'find'

In fact, the fields that scrap.py is looking for here (such as the title) are metadata only and are not used by the main application. Removing those lookups allows the presentation to be grabbed correctly.
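As an alternative to deleting the fields outright, the lookups could be made tolerant of missing markup. The following is only a rough sketch, not the project's actual code: `optional_field` is a hypothetical helper, and `get_title` is copied from the line shown in the traceback. It assumes that a missing metadata field can safely be reported as `None`.

```python
def optional_field(extractor, node, default=None):
    """Run a metadata extractor; return `default` if the expected markup is missing.

    Hypothetical helper, not part of infoqscraper.
    """
    if node is None:
        # The container element was not found on the page at all.
        return default
    try:
        return extractor(node)
    except AttributeError:
        # A chained lookup (e.g. node.find(...).div) hit None part-way through.
        return default


def get_title(pres_div):
    # Same lookup as the failing line in the traceback.
    return pres_div.find('h1', class_="general").div.get_text().strip()


# With the container div missing, the title becomes None instead of raising.
title = optional_field(get_title, None)
```

Wrapping each non-essential entry of the metadata dict this way would let the download proceed when the page layout changes, at the cost of silently missing titles or similar fields.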

andreweacott, Jul 09 '19 20:07