
Banyule Victoria AU not working

Boabee opened this issue 2 years ago

I may be using the integration incorrectly, but I think there may be an issue: the local council has recently introduced another bin, so the calendar may no longer pull the data the same way.

My waste collection calendar is as follows:

waste_collection_schedule:
  sources:
    - name: banyule_vic_gov_au
      args:
        street_address: an address in, IVANHOE
      customize:
        - type: recycling
          alias: Fogo
          show: true
          icon: mdi:recycle
          picture: false
      calendar_title: Recycling

Boabee · Jul 17 '22

I authored that source. I'm not sure whether the council has changed the interface for the new bins; it's somewhat moot at the moment, since OpenCities (Banyule and a number of other councils have outsourced their websites to them) has recently implemented anti-scraping features on that API.

The anti-scraping protection is a little nasty. In my testing, without some magic cookies the response from the API redirects to a heavily obfuscated JavaScript file. PR #250 was reverted in #256 for this reason. In my original PR (#160) we discussed sharing common code for the OpenCities-sourced APIs, since they seem identical, but now it means a single change on their end has probably broken a whole range of sources.
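
For illustration, a rough sketch of how that redirect can be spotted when probing the endpoint. The URL and parameters are taken from the script later in this thread, and treating the block as a plain HTTP redirect is an assumption based on the behaviour described above:

import requests

URL = "https://www.banyule.vic.gov.au/ocapi/Public/myarea/wasteservices"
PARAMS = {
    "geolocationid": "4f7ebfca-1526-4363-8b87-df3103a10a87",  # example id from this thread
    "ocsvclang": "en-AU",
}

# Disable redirect following so the anti-scraping redirect (if any) is visible directly.
r = requests.get(URL, params=PARAMS, allow_redirects=False)

if 300 <= r.status_code < 400:
    print("Blocked: redirected to", r.headers.get("Location"))
elif "application/json" in r.headers.get("Content-Type", ""):
    print("Got JSON with keys:", list(r.json()))
else:
    print("Unexpected response:", r.status_code)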

ravngr · Jul 28 '22

It looks like this can be made to work if you just make the final API call using the geolocationid.

from datetime import datetime

import requests
from bs4 import BeautifulSoup

URL = "https://www.banyule.vic.gov.au/ocapi/Public/myarea/wasteservices"
HEADERS = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/112.0",
    "referer": "https://www.banyule.vic.gov.au/Waste-environment/Bin-collection",
}
GEOLOC = "4f7ebfca-1526-4363-8b87-df3103a10a87"  # borrowed from banyule_vic_gov_au.py
PARAMS = {
    "geolocationid": GEOLOC,
    "ocsvclang": "en-AU",
}

s = requests.Session()
r = s.get(URL, headers=HEADERS, params=PARAMS)

# The endpoint returns JSON whose "responseContent" field is an HTML fragment.
schedule = r.json()
soup = BeautifulSoup(schedule["responseContent"], "html.parser")

# Each service is rendered as a "note" div (waste type) paired with a
# "next-service" div (collection date).
waste_types = [item.text.strip() for item in soup.find_all("div", {"class": "note"})]
dates = [
    datetime.strptime(item.text.strip(), "%a %d/%m/%Y").date()
    for item in soup.find_all("div", {"class": "next-service"})
]

print(list(zip(waste_types, dates)))
# [('Food organics and garden organics', datetime.date(2023, 4, 17)), ('Recycling', datetime.date(2023, 4, 17)), ('Rubbish', datetime.date(2023, 4, 24))]

Implementing this change probably means anyone who was previously using it will have to change their config, and spend a few minutes extracting the geolocationid the website is using for their address. I'd assume that's acceptable if it's currently not working?
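
For anyone who needs to extract their geolocationid, here is a hedged sketch of the lookup. It assumes Banyule exposes the common OpenCities address-search endpoint and response shape (an Items list with Id and AddressSingleLine fields), which isn't confirmed in this thread:

import requests

SEARCH_URL = "https://www.banyule.vic.gov.au/api/v1/myarea/search"

# Query the address search with a street address; each matching item should
# carry the geolocation id in its "Id" field (assumed field names).
r = requests.get(SEARCH_URL, params={"keywords": "1 Example St, IVANHOE"})
for item in r.json().get("Items", []):
    print(item.get("Id"), item.get("AddressSingleLine"))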

dt215git · Apr 16 '23

OK, this sounds very interesting. I think changing the config is not a big issue, because no one can use this source at the moment anyway.

mampfes · Apr 16 '23

The source supports providing geolocation_id manually, thus skipping the address lookup step; an example is included in the source documentation. Otherwise, I think the only difference in the code from @dt215git is the headers? I experimented with copying all the headers from the browser at one stage, excluding the anti-scraping magic cookie, but couldn't bypass the anti-scraping once triggered. Maybe it will help avoid triggering it in the first place, though 🤷.
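
For reference, providing the id directly would look something like this; the geolocation_id argument name follows the comment above, and the UUID is the example value used earlier in the thread:

waste_collection_schedule:
  sources:
    - name: banyule_vic_gov_au
      args:
        geolocation_id: 4f7ebfca-1526-4363-8b87-df3103a10a87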

From memory the issue is somewhat transient, coming and going at the whim of the back-end. I used the source for a few weeks before getting redirects almost 100% of the time. When it broke I assumed there had been an update on their end; maybe it has since been backed off, or I (and others) got unlucky?

ravngr · Apr 17 '23

The same script now generates errors, so maybe I got lucky when looking at this last week.

dt215git · Apr 25 '23