snscrape
snscrape copied to clipboard
vkontakte-user function crashes immediately on use
Describe the bug
Running the software in a regular fashion results in errors:
$ snscrape vkontakte-user durov
2023-02-24 14:31:55.752 WARNING snscrape.modules.vkontakte Skipping post without link: '<div class="_post post page_block all own post--withPostBottomAction post--with-likes closed_comments deep_active Post--redesign" data-post-id="1_2442097" data-replies-limit="0" id="post1_2442097" onc'
2023-02-24 14:31:55.808 CRITICAL snscrape._cli Dumped stack and locals to /tmp/snscrape_locals__x7ru5_r
Traceback (most recent call last):
File "[PATH]/venv2/bin/snscrape", line 8, in <module>
sys.exit(main())
File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/_cli.py", line 318, in main
for i, item in enumerate(scraper.get_items(), start = 1):
File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/modules/vkontakte.py", line 278, in get_items
yield from _process_soup(soup)
File "[PATH]/venv2/lib/python3.10/site-packages/snscrape/modules/vkontakte.py", line 273, in _process_soup
postID = int(item.url.rsplit('_', 1)[1])
AttributeError: 'NoneType' object has no attribute 'url'
In vkontakte.py
:
Instead of post_link
class, we see PostHeaderSubtitle__link
.
For dates, instead of this:
post.find('div', class_ = 'post_date').find('span', class_ = 'rel_date')
we found this to work:
postLink.find('time', class_ = 'PostHeaderSubtitle__item')
By doing those replacements, we find that the function starts (mostly) working again. We're not sure what the full extent of the replacements needs to be.
How to reproduce
Run the command:
snscrape vkontakte-user durov
Expected behaviour
After doing the aforementioned replacements, we start getting results like so:
$ snscrape vkontakte-user durov https://vk.com/wall1_2442097 https://vk.com/wall1_2431591 https://vk.com/wall1_2422169 https://vk.com/wall1_2418560 https://vk.com/wall1_2412029 https://vk.com/wall1_2407925 https://vk.com/wall1_2405336 https://vk.com/wall1_2401719 https://vk.com/wall1_2401089 ...
Screenshots and recordings
No response
Operating system
Ubuntu 22.04
Python version: output of python3 --version
Python 3.10.6
snscrape version: output of snscrape --version
snscrape 0.5.0.20230113 & snscrape 0.5.0.20230114.dev31+gf329b69
Scraper
vkontakte-user
Backtrace
No response
Dump of locals
No response
How are you using snscrape?
CLI (snscrape ...
as a command, e.g. in a terminal)
Additional context
No response
Indeed, VK restructured their HTML sometime recently, as I discovered a few days ago. Thanks for filing an issue about it.