tvtv
Site
tvtv.us
Description
Starts getting error 429 after several hundred scrapes. I tried setting the delay to 1000 ms, 1500 ms, and 2000 ms but keep seeing the issue.
It almost appears to happen after a sequence of scrapes that return 0 programs. I wonder if the delay is not applied when nothing is downloaded, which causes tvtv.us to throttle, which in turn results in 0 programs downloaded (i.e., a feedback loop).
I believe the delay argument is being ignored.
I edited the config file locally and changed the delay to 30 seconds (30000 ms), but I am still seeing multiple channels being grabbed within a 10-15 second period.
I increased the delay in the config to 1500ms (https://github.com/iptv-org/epg/commit/e3a4aa4328f7801a39bc7f63a14a2e1f6f55af50) and the error seems to be gone.
Try to update and maybe this time it will help.
The delay in the config does not seem to do anything for me. Setting it to 15000, I still see channels being fetched immediately one after the other. I personally tried the following barbaric way to introduce a sleep: I opened tvtv.us.config.js and added a new function at the bottom:
// Block the current thread for roughly n milliseconds
function msleep(n) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, n);
}
I then edited the url: function ({ date, channel }) to this
url: function ({ date, channel }) {
  msleep(15000)
  return `https://www.tvtv.us/api/v1/lineup/USA-NY71652-DEFAULT/grid/${date.toJSON()}/${date
    .add(1, 'd')
    .toJSON()}/${channel.site_id}`
},
You can control the delay by changing the value.
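Since the open question in this thread is whether any delay is applied at all, here is a minimal standalone sketch for Node.js that checks the helper really blocks. timeCall is a hypothetical helper added only for this check and is not part of tvtv.us.config.js; the 200 ms value is arbitrary.

```javascript
// Same helper as in the config edit above: blocks the current thread for roughly n ms.
function msleep(n) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, n)
}

// Hypothetical helper (not part of the config): measure how long one call to fn takes.
function timeCall(fn) {
  const start = Date.now()
  fn()
  return Date.now() - start
}

const elapsed = timeCall(() => msleep(200))
console.log(`msleep(200) blocked for ${elapsed} ms`)
```

Note that Atomics.wait blocks the whole thread, so nothing else in the process runs while it sleeps. That is why the delay takes effect here regardless of how the grabber schedules its requests.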
I started at 5000 then doubled it to 10000 (10 sec delay) but I was still seeing 429s.
Even with a 15-second delay between requests I am getting 429s.
They seem to be behind Cloudflare. So at this point I am not sure what type of detection they are doing, but it seems to be based on more than just the number of requests per unit of time.
I also tried waiting a few hours and then pulling the data via curl, but I still get a 429 for certain channels.
Running into this issue as well. Knowing that tvtv.us.channels.xml is insanely large, I first remove all the local US stations (Kxxxxx(x).us and Wxxxxx(x).us) from tvtv.us.channels.xml with a sed -i command. But even with 978fd2a / e3a4aa4 in place at 1500 ms, I'm still running into occasional 429s for one random channel at a time. The whole grab takes 40-50 minutes and is cronned for once every 4 hours (55 */4 * * *). Are we still pushing it too hard?
I could be wrong, but wouldn't 55 */4 * * * mean it runs at minute 55 of every 4th hour? So in a day you are running it 6 times?
If you believe Cloudflare is detecting the large number of requests, one possible thing to try is to mess around with the headers, or possibly even the JA3 fingerprint.
https://developers.cloudflare.com/ruleset-engine/rules-language/fields/ https://github.com/salesforce/ja3
But at this point this is just speculation. What I have noticed in the past is that the 429 seems to happen even if you wait 15 minutes for the same channel. I tried with curl and was getting 429s even after waiting for long periods of time.
I could be wrong, but wouldn't 55 */4 * * * mean it runs at minute 55 of every 4th hour? So in a day you are running it 6 times?
Correct, so at: 00:55, 04:55, 08:55, 12:55, 16:55, 20:55
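To double-check the schedule, here is a small sketch that expands the hour field of the expression. dailyRunTimes is a hypothetical helper written for this thread; it only handles the "minute */step * * *" shape and is not a general cron parser.

```javascript
// Hypothetical helper: list the daily run times for a cron expression of the
// form "<minute> */<step> * * *". Anything more complex is out of scope here.
function dailyRunTimes(expr) {
  const [minute, hour] = expr.trim().split(/\s+/)
  const step = Number(hour.replace('*/', ''))
  const times = []
  // "*/step" in the hour field matches 0, step, 2*step, ... below 24.
  for (let h = 0; h < 24; h += step) {
    times.push(`${String(h).padStart(2, '0')}:${minute.padStart(2, '0')}`)
  }
  return times
}

console.log(dailyRunTimes('55 */4 * * *'))
// prints [ '00:55', '04:55', '08:55', '12:55', '16:55', '20:55' ]
```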
once in 4 hours (55 */4 * * *)
Do you find 6 times a day high or low?
But at this point this is just speculation. What I have noticed in the past is that the 429 seems to happen even if you wait 15 minutes for the same channel. I tried with curl and was getting 429s even after waiting for long periods of time.
From my point of view it seems more likely that the (temporary) blocking happens based on total number of requests in a period for the whole host and not per endpoint(/channel). When grabbing multiple days a 429 can occur on grab day 1, and a sequential grab of day 2 of the same channel could result in a 200 with valid content.
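If the block really is temporary and host-wide, one thing that could be tried on the client side is retrying a 429 with an increasing wait. This is only a sketch of generic exponential backoff, not something the grabber does today, and it is untested against tvtv.us. fetchWithBackoff, doRequest, and the option names are hypothetical; whether tvtv.us sends a Retry-After header at all is an assumption.

```javascript
const defaultSleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical helper: call doRequest() and retry while the answer is 429.
// doRequest is any async function resolving to { status, headers }, where
// headers is a plain object with lower-case keys (an assumption of this sketch).
async function fetchWithBackoff(doRequest, { retries = 4, baseMs = 1000, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest()
    // Return on any non-429 answer, or give up after the last allowed retry.
    if (res.status !== 429 || attempt >= retries) return res
    // Use Retry-After when it is a positive number of seconds;
    // otherwise (missing or an HTTP date) double the wait on each attempt.
    const retryAfter = Number(res.headers && res.headers['retry-after'])
    const waitMs = retryAfter > 0 ? retryAfter * 1000 : baseMs * 2 ** attempt
    await sleep(waitMs)
  }
}
```

The request function and the sleep are passed in so the retry logic can be exercised without hitting the real site.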
Edit 4 months later: still the same spotty behavior, even after I've lowered the pulling rate to once every 12 hours. I can't seem to be bothered to create a solution, which would likely involve something with proxies.