Horoscope plugin's HTML scraping no longer matches the page we scrape

Open nasonfish opened this issue 10 years ago • 4 comments

We scrape HTML from a site, and that site changed its markup so that the fields we relied on no longer match the classes we look for in horoscope.py. As a result, the plugin can never find a sign and returns an error no matter what.
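
To illustrate the failure mode, here is a minimal sketch of how a class-based lookup silently stops matching once the markup changes. It is not the plugin's code: it uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra dependencies, and the helper `find_text`, the sample pages, and the class names `sign-title` and `headline` are all made up for the example.

```python
from html.parser import HTMLParser


class ClassTextFinder(HTMLParser):
    """Collect the text of the first <tag> element carrying a given class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while we are inside the matched element
        self.done = False   # True once the matched element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Track nested elements of the same tag so we close at the right one.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def find_text(html, tag, css_class):
    """Return the matched element's text, or None if nothing usable was found."""
    finder = ClassTextFinder(tag, css_class)
    finder.feed(html)
    finder.close()
    return "".join(finder.chunks).strip() or None


# Hypothetical markup before and after a site redesign.
PAGE_BEFORE = '<h1 class="sign-title">Aries</h1><div class="sign-text">A good day.</div>'
PAGE_AFTER = '<h1 class="headline">Aries</h1><div class="body">A good day.</div>'

print(find_text(PAGE_BEFORE, "h1", "sign-title"))  # the old selector matches
print(find_text(PAGE_AFTER, "h1", "sign-title"))   # same selector, new markup: no match
```

The second lookup comes back empty even though the page still contains the sign, which is the situation the plugin is in now: every request ends in the "could not find" error path.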

nasonfish avatar Oct 15 '15 02:10 nasonfish

I think we should switch away from HTML scraping if possible. It looks like there are a few APIs available; this was the top Google hit, and it seems like it could work: https://github.com/tapasweni-pathak/Horoscope-API

edwardslabs avatar Oct 15 '15 12:10 edwardslabs

That's a web frontend for https://testpypi.python.org/pypi/horoscope, which rips from Ganeshaspeaks... sooo, not really better.

dmptrluke avatar Oct 15 '15 12:10 dmptrluke

If there is no free API, maybe horoscope gets dropped, since maintaining an HTML scraping plugin can be pretty burdensome.

edwardslabs avatar Oct 15 '15 14:10 edwardslabs

For what it's worth, here's an updated version of the plugin. Where to go from here is debatable: should we keep supporting this site or not?

# Plugin by Infinity - <https://github.com/infinitylabs/UguuBot>

import requests
from bs4 import BeautifulSoup

from cloudbot import hook
from cloudbot.util import formatting


@hook.on_start()
def init(db):
    db.execute("create table if not exists horoscope(nick primary key, sign)")
    db.commit()


@hook.command(autohelp=False)
def horoscope(text, db, bot, notice, nick):
    """<sign> - get your horoscope"""

    headers = {'User-Agent': bot.user_agent}

    # check if the user asked us not to save their details
    dontsave = text.endswith(" dontsave")
    if dontsave:
        sign = text[:-9].strip().lower()
    else:
        # normalise the sign the same way in both branches, since it ends up in the URL
        sign = text.strip().lower()

    db.execute("create table if not exists horoscope(nick primary key, sign)")

    if not sign:
        sign = db.execute("select sign from horoscope where "
                          "nick=lower(:nick)", {'nick': nick}).fetchone()
        if not sign:
            notice("horoscope <sign> -- Get your horoscope")
            return
        sign = sign[0]

    url = "http://my.horoscope.com/astrology/free-daily-horoscope-{}.html".format(sign)

    try:
        request = requests.get(url, headers=headers)
        request.raise_for_status()
    except (requests.exceptions.HTTPError, requests.exceptions.ConnectionError) as e:
        return "Could not get horoscope: {}.".format(e)

    # name the parser explicitly so bs4 does not guess one (and warn about it)
    soup = BeautifulSoup(request.text, "html.parser")

    title = soup.find_all('h1', {'class': 'f40'})
    if not title:
        return "Could not get the horoscope for {}.".format(text)

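    title = title[0].text.strip()
    # the text block can go missing if the site changes its markup again
    horoscope_block = soup.find('div', {'class': 'block-horoscope-text'})
    if not horoscope_block:
        return "Could not get the horoscope for {}.".format(text)
    horoscope_text = horoscope_block.text.strip()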
    result = "\x02{}\x02 {}".format(title, horoscope_text)
    result = formatting.strip_html(result)

    if text and not dontsave:
        db.execute("insert or replace into horoscope(nick, sign) values (:nick, :sign)",
                   {'nick': nick.lower(), 'sign': sign})
        db.commit()

    return result

nasonfish avatar Nov 02 '15 01:11 nasonfish