Copyright-struck videos keep getting retried forever
I tried adding an old playlist to TubeSync and forgot that a couple of its videos had apparently been copyright-struck. Now I'm pretty much in perpetual agony with this task that keeps popping up:
2023-02-11 22:28:16,902 [tubesync/ERROR] ERROR: [youtube] lm0cTOQpOf8: Video unavailable. This video is no longer available due to a copyright claim by Sony Music Entertainment(Japan) Inc.
Rescheduling Downloading metadata for "783a861b-d8e4-46af-bb5a-92728218916b"
Traceback (most recent call last):
File "/home/kuhnchris/.local/share/virtualenvs/tubesync-l4jy0Bsy/lib/python3.10/site-packages/background_task/tasks.py", line 43, in bg_runner
func(*args, **kwargs)
File "/home/kuhnchris/tubesync/tubesync/sync/tasks.py", line 227, in download_media_metadata
metadata = media.index_metadata()
File "/home/kuhnchris/tubesync/tubesync/sync/models.py", line 1285, in index_metadata
return indexer(self.url)
File "/home/kuhnchris/tubesync/tubesync/sync/youtube.py", line 60, in get_media_info
raise YouTubeError(f'Failed to extract_info for "{url}": No metadata was '
sync.youtube.YouTubeError: Failed to extract_info for "https://www.youtube.com/watch?v=lm0cTOQpOf8": No metadata was returned by youtube-dl, check for error messages in the logs above. This task will be retried later with an exponential backoff.
Rescheduling task Downloading metadata for "783a861b-d8e4-46af-bb5a-92728218916b" for 1:49:26 later at 2023-02-12 00:17:42.908426+00:00
A secondary issue is that yt-dlp doesn't really provide a dedicated exit code for this, or even a way to get the error from the API JSON. There is only the error message shown above. I'm also not sure whether that text is always the same or whether it gets translated when your IP is in a country with a localized version (from my Austrian IP it doesn't show up in German, so I have my hopes up).
If the string is consistently like this, we could check in sync/youtube.py that the message is an "ERROR", that its reason starts with "Video unavailable.", and that it contains the term "copyright claim". With those three checks we should be able to classify this as a permanent error, so the task finishes processing and the error message is stored on the media object.
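The three checks proposed above could look roughly like this. This is a minimal sketch, not TubeSync code: `classify_ytdlp_error` and the `PERMANENT`/`RETRY` labels are hypothetical, and it assumes the yt-dlp message (without TubeSync's timestamp prefix) keeps the `ERROR: [extractor] video_id: reason` shape seen in the log and is never localized.

```python
import re

# Hypothetical labels for the two outcomes a task could take.
PERMANENT = "permanent"
RETRY = "retry"

# Assumed message shape, taken from the log above:
#   ERROR: [youtube] lm0cTOQpOf8: Video unavailable. This video is ...
_ERROR_RE = re.compile(
    r"^ERROR: \[(?P<extractor>[^\]]+)\] (?P<video_id>[^:]+): (?P<reason>.*)$",
    re.DOTALL,
)


def classify_ytdlp_error(message: str) -> str:
    """Return PERMANENT only when all three proposed checks pass.

    1. The message is a yt-dlp ERROR line.
    2. The reason starts with "Video unavailable.".
    3. The reason mentions "copyright claim".
    Anything else (network errors, unknown or translated wording) stays
    RETRY, so unexpected messages fall back to the existing back-off.
    """
    match = _ERROR_RE.match(message.strip())
    if match is None:
        return RETRY
    reason = match.group("reason")
    if reason.startswith("Video unavailable.") and "copyright claim" in reason:
        return PERMANENT
    return RETRY
```

Note that the region-blocked message reported later in this thread ("...who has blocked it in your country on copyright grounds") says "copyright grounds" rather than "copyright claim", so with these exact strings it would still be classified as RETRY. A conservative check like this trades such false negatives for fewer false positives.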
If @meeb agrees with this approach, I'd implement it as a PR. Please let me know if you have any other ideas for how we can properly vet this check so we don't get false positives.
Are you sure this is an endless loop? Internally, TubeSync just calls yt-dlp and, if the download fails, retries it with an exponential back-off. The task will be attempted multiple times, but it should eventually fail with a permanent error. The retry with back-off is even stated in the error you pasted in your log. If this really is an endless loop I'll look into it, but I'm fairly sure it will just be retried several times before permanently failing.
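To make "retry with exponential back-off, then fail permanently" concrete, here is a simplified in-process model of that behaviour, extended with the permanent-error short-circuit proposed earlier in the thread. It is not how TubeSync is structured (TubeSync reschedules background tasks rather than sleeping in a loop); `run_with_backoff`, `PermanentError`, `GaveUp` and the `attempt ** 4 + 5` delay are illustrative assumptions, and `sleep` is injectable so the sketch can run without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PermanentError(Exception):
    """Hypothetical error type for failures that retrying cannot fix."""


class GaveUp(Exception):
    """Raised once the maximum number of attempts has been used up."""


def run_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, waiting longer after each failure.

    The delay after the n-th failed attempt is n**4 + 5 seconds (an
    assumed formula). A PermanentError is re-raised immediately, and any
    other error on the last attempt becomes GaveUp.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except PermanentError:
            raise  # retrying cannot help, e.g. a copyright-claimed video
        except Exception as exc:
            if attempt == max_attempts:
                raise GaveUp(f"failed after {attempt} attempts") from exc
            sleep(attempt ** 4 + 5)
    # Only reachable when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")
```

Because every path through the loop either returns, re-raises, or sleeps and tries again with a bounded counter, the model cannot loop forever; the only question is how long the waits add up to.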
Well, yes, it keeps trying and the task never disappears. I have no idea whether there is some kind of rate limit on the API, so we'd need to be careful about repeatedly poking it with "dead" videos. It has been retrying since yesterday, though (with, as you mentioned, an increasing back-off).
2023-02-12 12:43:48,686 [tubesync/ERROR] ERROR: [youtube] lm0cTOQpOf8: Video unavailable. This video is no longer available due to a copyright claim by Sony Music Entertainment(Japan) Inc.
Rescheduling Downloading metadata for "783a861b-d8e4-46af-bb5a-92728218916b"
Traceback (most recent call last):
File "/home/kuhnchris/.local/share/virtualenvs/tubesync-l4jy0Bsy/lib/python3.10/site-packages/background_task/tasks.py", line 43, in bg_runner
func(*args, **kwargs)
File "/home/kuhnchris/tubesync/tubesync/sync/tasks.py", line 227, in download_media_metadata
metadata = media.index_metadata()
File "/home/kuhnchris/tubesync/tubesync/sync/models.py", line 1285, in index_metadata
return indexer(self.url)
File "/home/kuhnchris/tubesync/tubesync/sync/youtube.py", line 60, in get_media_info
raise YouTubeError(f'Failed to extract_info for "{url}": No metadata was '
sync.youtube.YouTubeError: Failed to extract_info for "https://www.youtube.com/watch?v=lm0cTOQpOf8": No metadata was returned by youtube-dl, check for error messages in the logs above. This task will be retried later with an exponential backoff.
Rescheduling task Downloading metadata for "783a861b-d8e4-46af-bb5a-92728218916b" for 0:21:41 later at 2023-02-12 13:05:29.693296+00:00
The retries use exponentially increasing delays, so they are spaced irregularly over up to a day or two. Originally this was to account for network errors (you might have bad internet for a day) and also for things like livestreams, which might take 24 hours or more before a VOD matching your source requirements is available to download. By design, you might get 20 or so of these errors, then a hard failure. If you still see these errors after 2-3 days, then it probably needs an adjustment. yt-dlp doesn't use any official APIs, so there's not much risk of incurring the wrath of YouTube. If you've only had this issue since yesterday, ignore it for another 24 hours and see whether the task transitions into a "permanent failure" state, which is what should happen.
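The reschedule delays in the two logs above fit a back-off of `attempts ** 4 + 5` seconds: 1:49:26 is 6,566 s (9**4 + 5) and 0:21:41 is 1,301 s (6**4 + 5). As far as I know this is the formula django-background-tasks uses when rescheduling a failed task, but treat that as an assumption. The sketch below reproduces those two delays and adds up the total wait before a task gives up; `retry_delay`, `total_wait` and the cap of 15 attempts are hypothetical, not confirmed TubeSync settings.

```python
from datetime import timedelta


def retry_delay(attempts: int) -> timedelta:
    """Delay before the next try after `attempts` failures (assumed formula)."""
    return timedelta(seconds=attempts ** 4 + 5)


def total_wait(max_attempts: int) -> timedelta:
    """Sum of all delays before a task is marked as permanently failed.

    Assumes the task fails for good on attempt number `max_attempts`, so
    only attempts 1 .. max_attempts - 1 are followed by a delay.
    """
    return sum((retry_delay(n) for n in range(1, max_attempts)), timedelta())


# The two reschedule delays seen in the logs of this issue.
print(retry_delay(9))  # 1:49:26
print(retry_delay(6))  # 0:21:41
# Total time spent retrying with a hypothetical cap of 15 attempts.
print(total_wait(15))  # 1 day, 11:29:17
```

Because of the fourth-power term, the total grows quickly with the cap: with 15 attempts the 14 retries span about a day and a half, which is in line with the "day or two" described above.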
I have this problem with 6 videos too. I think most of these were in my Watch Later list. I've edited my list and removed the unavailable videos, but TubeSync is still trying to get the metadata. These don't show in the scheduled tasks. I also have one which appears to be region-blocked: "sQbENNKjpsc: Video unavailable. This video contains content from NBC Universal, who has blocked it in your country on copyright grounds"
Is there any way I can cancel these? They seem to have gone on for a few days. I'm using a Postgres database.
@ShadowfootNZ, the tasks should eventually stop retrying by themselves; there's no way for the code to loop endlessly. There are quite a lot of retries over a number of days, so you can either just ignore it or manually remove the media items (one row per media item) from the media table in your database and then restart the container.
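For the manual route, a parameterized DELETE keyed on the video IDs is the safest way to remove only the affected rows. This is a hedged sketch: the table name `sync_media` (Django's default `<app>_<model>` naming for a `Media` model in the `sync` app) and the `key` column holding the video ID are assumptions, so check the real schema first (for example with `\d sync_media` in psql) and back up the database before deleting anything. It uses `sqlite3` with a made-up minimal table only so it runs standalone; with psycopg on Postgres the placeholder is `%s` instead of `?`.

```python
import sqlite3
from typing import Iterable


def delete_media_by_key(conn: sqlite3.Connection, keys: Iterable[str]) -> int:
    """Delete media rows whose video ID is in `keys`; return rows removed.

    Assumes a table named sync_media with a "key" column holding the
    YouTube video ID (hypothetical; verify against your real schema).
    """
    keys = list(keys)
    if not keys:
        return 0
    placeholders = ", ".join("?" for _ in keys)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            f'DELETE FROM sync_media WHERE "key" IN ({placeholders})', keys
        )
    return cur.rowcount


# Stand-alone demo against an in-memory database with a minimal,
# made-up version of the table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sync_media (uuid TEXT PRIMARY KEY, "key" TEXT)')
conn.executemany(
    'INSERT INTO sync_media (uuid, "key") VALUES (?, ?)',
    [
        ("783a861b-d8e4-46af-bb5a-92728218916b", "lm0cTOQpOf8"),
        ("demo-uuid-1", "sQbENNKjpsc"),
        ("demo-uuid-2", "keepThisOne"),
    ],
)
removed = delete_media_by_key(conn, ["lm0cTOQpOf8", "sQbENNKjpsc"])
print(removed)  # 2
```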
This should have been resolved for a while, so I'll close this issue for now. Please open a new issue if you still experience problems.