pornhub-api

_scrapLiVideos() in getVideos() raises AttributeError: on line 80, soup_data.find("div", class_="sectionWrapper") returns None

Open AcGW-RErCotWd opened this issue 3 years ago • 7 comments

When I call getVideos() to search for a keyword, the _scrapLiVideos() method raises an AttributeError on line 80.

soup_data.find("div", class_="sectionWrapper") returns None, and None has no attribute 'find_all'.

AcGW-RErCotWd avatar Apr 29 '22 00:04 AcGW-RErCotWd

    Traceback (most recent call last):
      File "***site-packages\pornhub\videos.py", line 247, in getVideos
        for possible_video in self._scrapLiVideos(self._loadPage(page_num=page, sort_by=sort_by)):
      File "***site-packages\pornhub\videos.py", line 80, in _scrapLiVideos
        return soup_data.find("div", class_="sectionWrapper").find_all("li", { "class" : re.compile(".*videoblock videoBox.*") } )
    AttributeError: 'NoneType' object has no attribute 'find_all'
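The failure mode in this traceback can be reproduced and guarded against in isolation. The following is only a sketch, assuming Beautiful Soup 4 with the built-in "html.parser" backend; `scrape_items` and both sample pages are hypothetical and are not part of the library.

```python
import re
from bs4 import BeautifulSoup

# Same pattern as in the traceback above.
VIDEO_CLASS = re.compile(".*videoblock videoBox.*")


def scrape_items(soup):
    """Return the matching <li> items, or an empty list if the wrapper is missing."""
    wrapper = soup.find("div", class_="sectionWrapper")
    if wrapper is None:
        # find() returns None when nothing matches, so chaining
        # .find_all() directly would raise AttributeError.
        return []
    return wrapper.find_all("li", {"class": VIDEO_CLASS})


# Hypothetical pages, invented for illustration only.
home_page = BeautifulSoup("<html><body><p>home</p></body></html>", "html.parser")
search_page = BeautifulSoup(
    '<div class="sectionWrapper"><ul>'
    '<li class="videoblock videoBox">clip</li>'
    "</ul></div>",
    "html.parser",
)

no_items = scrape_items(home_page)
some_items = scrape_items(search_page)
```

Returning an empty list here only hides the AttributeError; the caller still has to decide what an empty page means.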

AcGW-RErCotWd avatar Apr 29 '22 00:04 AcGW-RErCotWd

I believe the error originates in the _loadPage() method: the page it loaded was actually the home page instead of the requested search page, which causes _scrapLiVideos() to fail. This is pretty weird, lol

AcGW-RErCotWd avatar Apr 29 '22 03:04 AcGW-RErCotWd

Okay, there is a 404 request error on line 66 of _loadPage(); that is probably the cause of the trouble.

The trouble occurs because I passed a keyword string that contains punctuation.
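The thread does not say why punctuation leads to a 404. One possible cause, and this is only an assumption, is that the keyword reaches the URL without percent-encoding. A minimal sketch of encoding a keyword with the standard library follows; the base URL and the `search` parameter name are placeholders, not the library's real values.

```python
from urllib.parse import quote_plus, urlencode

keyword = "rock & roll!"

# quote_plus() escapes reserved characters and turns spaces into "+".
encoded = quote_plus(keyword)

# urlencode() does the same for a whole query dictionary.
# "https://example.com/search" and "search" are hypothetical placeholders.
url = "https://example.com/search?" + urlencode({"search": keyword})
```

Whether the site accepts encoded punctuation in a search query is not confirmed anywhere in this thread.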

AcGW-RErCotWd avatar Apr 29 '22 04:04 AcGW-RErCotWd

But the error continues: _scrapLiVideos(self, soup_data) returns an empty list, which causes an infinite loop.
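A loop that keeps requesting pages until it has enough results never ends if every page comes back empty. The sketch below shows one way to bound such a loop. It is not the library's getVideos() code, which is not shown in this thread; `collect`, `fetch_page`, `quantity`, and `max_pages` are hypothetical names.

```python
from typing import Callable, Iterator, List


def collect(
    fetch_page: Callable[[int], List[str]],
    quantity: int,
    max_pages: int = 50,
) -> Iterator[str]:
    """Yield up to `quantity` items, stopping early on an empty page."""
    found = 0
    for page in range(1, max_pages + 1):  # hard upper bound on requests
        items = fetch_page(page)
        if not items:
            # An empty page means there is nothing more to read.
            return
        for item in items:
            yield item
            found += 1
            if found >= quantity:
                return


# Fake page source standing in for the real scraper.
pages = {1: ["a", "b"], 2: ["c"]}
everything = list(collect(lambda p: pages.get(p, []), quantity=10))
first_two = list(collect(lambda p: pages.get(p, []), quantity=2))
```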

The find("div", class_="sectionWrapper") call seems to return the 'sectionWrapper PremiumSuggestion' node, which is not the one wanted.
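This matches how Beautiful Soup treats `class_`: it matches any single CSS class on the element, so a div with two classes is returned if it comes first. A small sketch with invented markup (not the real page), assuming Beautiful Soup 4 and the "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration only.
HTML = """
<div class="sectionWrapper PremiumSuggestion"><ul></ul></div>
<div class="sectionWrapper">
  <ul><li class="videoblock videoBox">clip</li></ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first div carrying the class, here the promo block.
first = soup.find("div", class_="sectionWrapper")
first_classes = first["class"]
first_items = first.find_all("li")

# Scanning every wrapper reaches the one that actually holds list items.
wrappers = soup.find_all("div", class_="sectionWrapper")
items = [li for wrapper in wrappers for li in wrapper.find_all("li")]
```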

AcGW-RErCotWd avatar Apr 29 '22 04:04 AcGW-RErCotWd

> Okay, there is a 404 request error on line 66 of _loadPage(); that is probably the cause of the trouble.
>
> The trouble occurs because I passed a keyword string that contains punctuation.

ok i'll add code that removes any punctuation:

for item in self.keywords:
    item = re.sub(r"[^\w\s]", "", item).replace("_", " ")

That seems to work
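One detail worth noting: rebinding the loop variable inside a `for` loop does not change the list being iterated, so the cleaned values have to be stored somewhere. A minimal standalone sketch using the same expression; `strip_punctuation` and `clean_keywords` are hypothetical helper names, not part of the library.

```python
import re


def strip_punctuation(keyword: str) -> str:
    """Drop everything except word characters and whitespace, then
    turn underscores into spaces (same expression as the snippet above)."""
    return re.sub(r"[^\w\s]", "", keyword).replace("_", " ")


def clean_keywords(keywords):
    """Return a new list of cleaned keywords; the input is left untouched."""
    return [strip_punctuation(keyword) for keyword in keywords]


cleaned = clean_keywords(["hello, world!", "snake_case"])
```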

SashaSZ avatar Apr 29 '22 05:04 SashaSZ

Cool, I also found a way to fix the infinite loop caused by not finding the wanted "sectionWrapper". I rewrote the _scrapLiVideos() method a bit, and so far it seems to work.

    def _scrapLiVideos(self, soup_data) -> list:
        # Check every "sectionWrapper" div, not just the first match
        section_wrappers = soup_data.find_all("div", class_="sectionWrapper")
        for wrapper in section_wrappers:
            li_videos = wrapper.find_all("li", {"class": re.compile(".*videoblock videoBox.*")})
            if li_videos:
                # Return the first wrapper that actually contains video items
                return li_videos
        raise Exception('LiVideos Not Found')

AcGW-RErCotWd avatar Apr 29 '22 05:04 AcGW-RErCotWd

> Cool, I also found a way to fix the infinite loop caused by not finding the wanted "sectionWrapper". I rewrote the _scrapLiVideos() method a bit, and so far it seems to work.
>
>     def _scrapLiVideos(self, soup_data) -> list:
>         # Check every "sectionWrapper" div, not just the first match
>         section_wrappers = soup_data.find_all("div", class_="sectionWrapper")
>         for wrapper in section_wrappers:
>             li_videos = wrapper.find_all("li", {"class": re.compile(".*videoblock videoBox.*")})
>             if li_videos:
>                 # Return the first wrapper that actually contains video items
>                 return li_videos
>         raise Exception('LiVideos Not Found')

Nice, also I added something similar for future gifs just in case. Nothing seems to be broken :D

SashaSZ avatar Apr 29 '22 06:04 SashaSZ