xmltv
Multiple failures scraping UK EPG data
Copied from Issue #766 in https://github.com/Catch-up-TV-and-More/plugin.video.catchuptvandmore/
Describe the bug
Nearly all EPG data is missing for UK Live channels. I started investigating in the hope of fixing it myself, but I have discovered that the root of the problem is in the xmltv data the addon pulls from https://github.com/Catch-up-TV-and-More/xmltv/. That data is updated automatically by a timed script, so it's not something I can fix myself by editing local .py files.
PLEASE NOTE: I'm aware that technically, I should raise the issue in the xmltv repo section but I'm not sure if there's anyone actively monitoring it now - other than the automatic script, there doesn't appear to have been any activity for over 12 months. I will also copy / paste this entire post as an issue under the xmltv repo, just in case someone is still watching in there.
As this problem is server-based, platform info and Kodi log are not relevant - the plugin IS grabbing what little data is available so the log won't show any errors. I will delete those sections from this template and instead, I will post all the info I think is relevant to the issue and urls of the related files.
To Reproduce
Steps to reproduce the behaviour:
- Configure UK Live channels with IPTV Manager as per the instructions
- In Kodi settings, select PVR >> Guide >> Clear Data (forces refresh)
- Select "TV" page from Kodi home screen
- Look for programme listings for the relevant UK channels
Expected behaviour
Programme schedule data should tell you what's on the channel.
Actual behaviour
Nearly all channels show "No information available". However, scrolling several days ahead sometimes reveals a random programme listing.
My Debug Notes
My first observation is that Catch-up-TV-and-More/xmltv/scripts/ contains two folders relevant to the UK: tv_grab_uk_bleb and tv_grab_uk_tvguide. The Bleb website is broken and no longer contains any data. The TVguide website IS working and it does contain all the relevant EPG data we need. Its channel mapping and URL structure do not appear to have changed compared with the URLs the script attempts to scrape. Given that, and the fact that the script does occasionally manage to scrape a tiny bit of data, I'm inclined to think that the script is timing out while trying to scrape the data.
Looking at the files in the script's raw output folder, the scraping for some days fails completely and no xml file exists. The files that do exist contain very little data, and the log files reveal the failures:
Example:
Take one specific channel: "Yesterday", which has the EPG ID "320.tvguide.co.uk". After the latest update (12th April 2022), the Kodi EPG shows nothing at all until 00:00 on the 17th April, where one single show ("Scouting for toys") is listed. The XML file for the 17th April does indeed contain that show. It also contains a couple more, but for some reason they're not showing in the EPG. If we check the log file for the same day, we see multiple complete failures until it gets to channel 320, where it fails on its first try before managing to scrape 2% of the channel's data on a second attempt.
[Tue Apr 12 12:23:40 2022] Fetching listings: 0%
Unable to retrieve web page for 140
Unable to retrieve web page for 140
Unable to retrieve web page for 178
Unable to retrieve web page for 178
.....
Unable to retrieve web page for 279
Unable to retrieve web page for 320
[Tue Apr 12 12:24:54 2022] Fetching listings: 2%
Unable to retrieve web page for 422
Unable to retrieve web page for 422
etc....
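For anyone who wants to tally the failures in a full log file rather than by eye, here is a rough Python sketch. It only assumes the "Unable to retrieve web page for N" line format seen in the excerpt above. The function name `count_failures` is made up for illustration, and the sample text is just the lines quoted in this post with the elided parts left out, so the counts are for the excerpt only.

```python
import re
from collections import Counter

# Only the lines quoted in this post; the elided parts ("....." and "etc....")
# are left out, so these counts describe the excerpt, not the full log.
LOG_EXCERPT = """\
[Tue Apr 12 12:23:40 2022] Fetching listings: 0%
Unable to retrieve web page for 140
Unable to retrieve web page for 140
Unable to retrieve web page for 178
Unable to retrieve web page for 178
Unable to retrieve web page for 279
Unable to retrieve web page for 320
[Tue Apr 12 12:24:54 2022] Fetching listings: 2%
Unable to retrieve web page for 422
Unable to retrieve web page for 422
"""

# One failure line per failed attempt; the number is the channel ID.
FAILURE_RE = re.compile(r"^Unable to retrieve web page for (\d+)$", re.MULTILINE)


def count_failures(log_text: str) -> Counter:
    """Return a Counter mapping channel ID (as a string) to failed attempts."""
    return Counter(FAILURE_RE.findall(log_text))


failures = count_failures(LOG_EXCERPT)
for channel_id, n in sorted(failures.items(), key=lambda kv: int(kv[0])):
    print(f"channel {channel_id}: {n} failed attempt(s)")
```

Pointing `count_failures` at the text of a whole log file should show at a glance which channels never succeed and which ones recover on a retry.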
There are 15 attempts to scrape between the two timestamps shown, a period of 1 minute 14 seconds (74 seconds). That works out at 74 / 15 ≈ 4.93 seconds per attempt on average, which suggests that each URL attempt is only given about 5 seconds to respond before the script calls it a timeout failure and moves on.
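The arithmetic behind that figure can be checked with a few lines of Python. The two timestamps are copied from the log; the count of 15 attempts is my own count from the full log, not something the script reports, and the result is an average rather than a measured per-request timeout.

```python
from datetime import datetime

# Timestamp layout used in the log lines, e.g. "Tue Apr 12 12:23:40 2022".
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"

start = datetime.strptime("Tue Apr 12 12:23:40 2022", LOG_TIME_FORMAT)
end = datetime.strptime("Tue Apr 12 12:24:54 2022", LOG_TIME_FORMAT)

# Assumption: 15 attempts were counted by hand between the two timestamps.
attempts = 15

elapsed_seconds = (end - start).total_seconds()
seconds_per_attempt = elapsed_seconds / attempts

print(f"{elapsed_seconds:.0f} s / {attempts} attempts = {seconds_per_attempt:.2f} s per attempt")
```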
I'm guessing that the one partial success failed at 2% due to equally unrealistic timeout limits.
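To illustrate the kind of change I have in mind, here is a minimal Python sketch of fetching a page with a more generous timeout and a few spaced-out retries. This is NOT the grabber's actual code, and I haven't confirmed how the grabber handles timeouts internally. The function name `fetch_with_retries` and the default values (30 seconds, 3 attempts, 2 second backoff step) are all assumptions picked for illustration.

```python
import time
import urllib.request


def fetch_with_retries(url, timeout=30.0, attempts=3, backoff=2.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the page body as bytes, retrying on network errors and timeouts.

    All defaults are illustrative guesses, not values taken from the grabber.
    `opener` and `sleep` are parameters only so the function can be tested
    without touching the network.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as exc:
            # urllib.error.URLError and socket timeouts are both OSError subclasses.
            last_error = exc
            if attempt < attempts:
                # Wait a little longer before each subsequent retry.
                sleep(backoff * attempt)
    raise RuntimeError(
        f"Unable to retrieve {url} after {attempts} attempts"
    ) from last_error
```

With something like this, a slow response from the listings site would only be recorded as a failure after three tries of up to 30 seconds each, rather than after roughly 5 seconds.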