snscrape

vkontakte-user function crashes immediately on use

Open • AccentuSoft opened this issue 2 years ago • 1 comment

Describe the bug

Running the scraper normally results in an error:

$ snscrape vkontakte-user durov
2023-02-24 14:31:55.752  WARNING  snscrape.modules.vkontakte  Skipping post without link: '<div class="_post post page_block all own post--withPostBottomAction post--with-likes closed_comments deep_active Post--redesign" data-post-id="1_2442097" data-replies-limit="0" id="post1_2442097" onc'
2023-02-24 14:31:55.808  CRITICAL  snscrape._cli  Dumped stack and locals to /tmp/snscrape_locals__x7ru5_r
Traceback (most recent call last):
  File "[PATH]/venv2/bin/snscrape", line 8, in <module>
    sys.exit(main())
  File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/_cli.py", line 318, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/modules/vkontakte.py", line 278, in get_items
    yield from _process_soup(soup)
  File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/modules/vkontakte.py", line 273, in _process_soup
    postID = int(item.url.rsplit('_', 1)[1])
AttributeError: 'NoneType' object has no attribute 'url'
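The warning logged just before the traceback suggests how the crash happens. The code that parses a single post logs "Skipping post without link" and produces None for that post. _process_soup then reads item.url on that None. Below is a minimal sketch of a guard. It assumes each item is either an object with a url attribute or None. Post, post_id_from_url and post_ids are hypothetical names, not snscrape's actual API.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Post(NamedTuple):
    """Hypothetical stand-in for a scraped post; only .url matters here."""
    url: str


def post_id_from_url(url: str) -> int:
    # Same parsing as the failing line in the traceback: take everything
    # after the last '_' in a URL like https://vk.com/wall1_2442097.
    return int(url.rsplit('_', 1)[1])


def post_ids(items: Iterable[Optional[Post]]) -> Iterator[int]:
    """Yield post IDs, skipping None entries instead of raising AttributeError."""
    for item in items:
        if item is None:
            # Nothing was parsed for this post (e.g. after the
            # "Skipping post without link" warning), so there is no URL.
            continue
        yield post_id_from_url(item.url)
```

A guard like this only turns the crash into skipped posts. The redesigned markup no longer uses post_link at all, so every post would be skipped until the selectors are updated.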

In vkontakte.py:

Instead of the post_link class, post links now use the PostHeaderSubtitle__link class. For dates, instead of post.find('div', class_ = 'post_date').find('span', class_ = 'rel_date'), we found that postLink.find('time', class_ = 'PostHeaderSubtitle__item') works.

With those replacements, the scraper (mostly) works again. We're not sure how many other selectors also need to be updated.
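The selector change described above can be sketched as follows. The code prefers the new PostHeaderSubtitle__link class, falls back to the old post_link class, and reads the date from a <time class="PostHeaderSubtitle__item"> inside the new link, mirroring postLink.find('time', ...). It is not the actual patch to vkontakte.py. snscrape uses BeautifulSoup, while this sketch uses the standard-library html.parser to stay self-contained. PostHeaderParser and extract_link_and_date are hypothetical names, and only the class names come from the issue.

```python
from html.parser import HTMLParser

# Class names reported in the issue: the redesigned markup uses
# "PostHeaderSubtitle__link" where the old markup used "post_link".
NEW_LINK_CLASS = 'PostHeaderSubtitle__link'
OLD_LINK_CLASS = 'post_link'
DATE_CLASS = 'PostHeaderSubtitle__item'


class PostHeaderParser(HTMLParser):
    """Collect the permalink href and the <time> text from one post's HTML."""

    def __init__(self):
        super().__init__()
        self.links = {}         # link class -> href of the first <a> with that class
        self.date_text = None   # text of the first matching <time> inside the new link
        self._in_new_link = False
        self._in_time = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a':
            for cls in (NEW_LINK_CLASS, OLD_LINK_CLASS):
                if cls in classes and cls not in self.links:
                    self.links[cls] = attrs.get('href')
            self._in_new_link = NEW_LINK_CLASS in classes
        elif tag == 'time' and self._in_new_link and DATE_CLASS in classes and self.date_text is None:
            # Like postLink.find('time', class_ = 'PostHeaderSubtitle__item'):
            # only look for the date inside the new-style link.
            self._in_time = True
            self.date_text = ''

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_new_link = False
        elif tag == 'time':
            self._in_time = False

    def handle_data(self, data):
        if self._in_time:
            self.date_text += data


def extract_link_and_date(post_html):
    """Hypothetical helper: return (href, date_text), with None for missing parts."""
    parser = PostHeaderParser()
    parser.feed(post_html)
    parser.close()
    # Prefer the new class name and fall back to the old one for older markup.
    href = parser.links.get(NEW_LINK_CLASS, parser.links.get(OLD_LINK_CLASS))
    date = parser.date_text.strip() if parser.date_text is not None else None
    return href, date
```

For a post in the new markup, extract_link_and_date returns the link's href together with the text of the nested <time> element. For old-style markup it still finds the post_link href, but no date.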

How to reproduce

Run the command: snscrape vkontakte-user durov

Expected behaviour

After making the replacements described above, we get results like these:

$ snscrape vkontakte-user durov
https://vk.com/wall1_2442097
https://vk.com/wall1_2431591
https://vk.com/wall1_2422169
https://vk.com/wall1_2418560
https://vk.com/wall1_2412029
https://vk.com/wall1_2407925
https://vk.com/wall1_2405336
https://vk.com/wall1_2401719
https://vk.com/wall1_2401089
...

Screenshots and recordings

No response

Operating system

Ubuntu 22.04

Python version: output of python3 --version

Python 3.10.6

snscrape version: output of snscrape --version

snscrape 0.5.0.20230113 & snscrape 0.5.0.20230114.dev31+gf329b69

Scraper

vkontakte-user

Backtrace

No response

Dump of locals

No response

How are you using snscrape?

CLI (snscrape ... as a command, e.g. in a terminal)

Additional context

No response

AccentuSoft • Feb 24 '23 12:02

Indeed, VK restructured their HTML sometime recently, as I discovered a few days ago. Thanks for filing an issue about it.

JustAnotherArchivist • Feb 24 '23 22:02