hacs_waste_collection_schedule
Banyule Victoria AU not working
I may be using the integration incorrectly, but I think there may be an issue: the local council has recently introduced another bin, so the calendar may not pull the data the same way?
My waste collection calendar is as follows:
waste_collection_schedule:
  sources:
    - name: banyule_vic_gov_au
      args:
        street_address: an address in, IVANHOE
      customize:
        - type: recycling
          alias: Fogo
          show: true
          icon: mdi:recycle
          picture: false
      calendar_title: Recycling
I authored that source. I'm not sure if the council has changed the interface for the new bins; it's a bit redundant at the moment anyway, since it seems OpenCities (Banyule and a number of other councils have outsourced their websites to them) has recently implemented anti-scraping features on that API.
The anti-scraping protection is a little nasty. In my testing, the response I get from the API without some magic cookies redirects to a heavily obfuscated JavaScript file. PR #250 was reverted in #256 for this reason. In my original PR (#160) we discussed sharing some common code for the OpenCities-sourced APIs, since they seem identical, but it now means there are probably a range of sources broken by one feature change on their end.
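For anyone debugging this, here is a rough sketch of how the two cases can be told apart, assuming (as in my testing) that a blocked request comes back as the obfuscated JavaScript/HTML rather than JSON. It only reuses the wasteservices endpoint and the example geolocationid that already appear in banyule_vic_gov_au.py; it is not part of the source itself.

import requests

URL = "https://www.banyule.vic.gov.au/ocapi/Public/myarea/wasteservices"
PARAMS = {
    "geolocationid": "4f7ebfca-1526-4363-8b87-df3103a10a87",  # example id from banyule_vic_gov_au.py
    "ocsvclang": "en-AU",
}

r = requests.get(URL, params=PARAMS, timeout=30)

try:
    # A normal response is JSON with the schedule HTML in "responseContent".
    data = r.json()
    print("JSON response, anti-scraping not triggered:", "responseContent" in data)
except ValueError:
    # A non-JSON body is (in my testing) the obfuscated JavaScript challenge.
    print("Non-JSON response, anti-scraping probably triggered")
    print(r.status_code, r.headers.get("content-type"))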
It looks like this can be made to work if you just make the final API call using the geolocationid.
import json
from datetime import datetime

import requests
from bs4 import BeautifulSoup

URL = "https://www.banyule.vic.gov.au/ocapi/Public/myarea/wasteservices"
HEADERS = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/112.0",
    "referer": "https://www.banyule.vic.gov.au/Waste-environment/Bin-collection",
}
GEOLOC = "4f7ebfca-1526-4363-8b87-df3103a10a87"  # borrowed from banyule_vic_gov_au.py
PARAMS = {
    "geolocationid": GEOLOC,
    "ocsvclang": "en-AU",
}

# Call the waste services endpoint directly with the geolocationid,
# skipping the address lookup step entirely.
s = requests.Session()
r = s.get(URL, headers=HEADERS, params=PARAMS)

# The JSON response wraps an HTML fragment in "responseContent".
schedule = json.loads(r.text)
soup = BeautifulSoup(schedule["responseContent"], "html.parser")

# Each service has a "note" div (waste type) and a "next-service" div (date).
notes = soup.find_all("div", {"class": "note"})
services = soup.find_all("div", {"class": "next-service"})

waste_types = [item.text.strip() for item in notes]
dates = [datetime.strptime(item.text.strip(), "%a %d/%m/%Y").date() for item in services]

print(list(zip(waste_types, dates)))
[('Food organics and garden organics', datetime.date(2023, 4, 17)), ('Recycling', datetime.date(2023, 4, 17)), ('Rubbish', datetime.date(2023, 4, 24))]
Implementing this change probably means anyone who was previously using it will have to change their config, and spend a few minutes extracting the geolocationid the website is using for their address. I'd assume that's acceptable if it's currently not working?
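For what it's worth, here is a rough sketch of how the geolocationid lookup could be scripted rather than done by hand. The /api/v1/myarea/search path and the response shape (an Items list with Id fields) are assumptions borrowed from other OpenCities council sites, so if it doesn't match, check the request the Banyule page makes in the browser's network tab.

import requests

# Assumed OpenCities address-search endpoint; the path and response shape
# are borrowed from other council sites and may differ for Banyule.
SEARCH_URL = "https://www.banyule.vic.gov.au/api/v1/myarea/search"
HEADERS = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/112.0",
    "referer": "https://www.banyule.vic.gov.au/Waste-environment/Bin-collection",
}

r = requests.get(
    SEARCH_URL,
    headers=HEADERS,
    params={"keywords": "1 Example Street, IVANHOE"},  # placeholder address
    timeout=30,
)
r.raise_for_status()

# Assumed shape: {"Items": [{"Id": "<geolocationid>", "AddressSingleLine": "..."}, ...]}
for item in r.json().get("Items", []):
    print(item.get("Id"), item.get("AddressSingleLine"))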
Ok, this sounds very interesting. I think changing the config is not a big issue, because no one uses this source.
The source supports providing geolocation_id manually, thus skipping the address lookup step; an example is included in the source documentation. Otherwise, I think the only difference in the code from @dt215git is the headers? I experimented with copying all the headers from the browser at one stage, excluding the anti-scraping magic cookie, but couldn't bypass the anti-scraping once triggered. Maybe it will contribute to not triggering it in the first place 🤷.
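Roughly, a config using that option would look something like the example below; the exact argument name and format should be checked against the source documentation, and the id here is just the example one from banyule_vic_gov_au.py.

waste_collection_schedule:
  sources:
    - name: banyule_vic_gov_au
      args:
        geolocation_id: 4f7ebfca-1526-4363-8b87-df3103a10a87  # replace with your own id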
From memory, the issue is somewhat transient, coming and going at the whim of the back-end. I used the source for a few weeks before getting redirects almost 100% of the time. When it broke I assumed an update on their end; maybe it's been backed off since, or I (and others) got unlucky?
The same script now generates errors, so maybe I just got lucky when looking at this last week.