tubesync icon indicating copy to clipboard operation
tubesync copied to clipboard

Weird behavior adding multiple playlists with shared video(s)

Open micahmo opened this issue 4 years ago • 6 comments

Hi @meeb, I'm back again. :-) I think I discovered a very interesting issue related to adding multiple playlists that contain some of the same videos. It just so happened that I was adding two playlists from the same channel which shared a video, and there were some interesting consequences. (Actually, I wanted to add three playlists from the same channel where A and B shared a video, and B and C shared a different video. But two is enough to repro the issue.)

Here's how you can reproduce. The settings for the source don't seem to matter.

  • Add this playlist as a source: https://www.youtube.com/playlist?list=PLaDrN74SfdT6nO7cnR0zcfM6Bc5UQuKXG
    • This playlist has 4 videos. TubeSync correctly identifies 4 media items and downloads them all. So far so good.
  • Add this playlist as a source: https://www.youtube.com/playlist?list=PLaDrN74SfdT5jEs3RCI53nBUkuURSWrhq
    • This playlist has 25 videos and shares one video (oCp_2hqguUg) with the first playlist.
    • TubeSync correctly identifies the 25 media items.

Now there are two interesting problems with the system.

  • TubeSync only downloads 24 videos from the second playlist. It skips the shared video, presumably because it thinks it is already downloaded.
  • TubeSync now lists only 3 media items for the first playlist (this happens immediately after adding the second).

So the shared video is downloaded for one playlist but not listed in its media items, and it's not downloaded for the other playlist where it is listed in the media items.

Let me know if there's anything else I can do to test for you. And sorry to put more work on your plate, but I just wanted to report the issue. I'm only finding these interesting scenarios because I am using TubeSync a lot. :-)

micahmo avatar Mar 05 '21 21:03 micahmo

Thanks for the detailed description, I'll have a go at replicating this. It's quite possible there's a bug in handling of media items as unique per (source, video_id) and just using the video_id in some places which would cause this. I'll log this as a bug for now.

meeb avatar Mar 06 '21 03:03 meeb

I've got some channels I monitor both audio and video from. I've added basically duplicate sources and changed the KEY value to include '-music' on the duplicate entry. I hadn't thought about checking this scenario until reading the issue above, but another scenario that might be affected. I'll have a go at adding them and see what comes of it.

PhuriousGeorge avatar Mar 06 '21 04:03 PhuriousGeorge

The "key" is the actual string used in the URL template so I would guess that won't work, if you added a channel somechannel and changed the key to somechannel-music it would try and index https://youtube.com/c/somechannel-music/videos at the moment. I don't think there's a functional way to add a channel as a source twice but with different settings at the moment without removing the (source, key) unique validation check.

meeb avatar Mar 06 '21 04:03 meeb

Just came back to say I noticed exactly what you mentioned. You're too fast ;) Opened #91

PhuriousGeorge avatar Mar 06 '21 04:03 PhuriousGeorge

Hey @meeb,

I decided to take a quick peek at this myself.

It looks like the problem originates around here. The index task looks for a media object by key (the video's ID), which will match if the video has already been indexed for another source. Then it will (a) change the media object's source to the current one being indexed and (b) not download it, since downloaded is already true.

for video in videos:
    # Create or update each video as a Media object
    key = video.get(source.key_field, None)
    if not key:
        # Video has no unique key (ID), it can't be indexed
        continue
    try:
        media = Media.objects.get(key=key)
    except Media.DoesNotExist:
        media = Media(key=key)
    media.source = source
    try:
        media.save()
        log.info(f'Indexed media: {source} / {media}')
    except IntegrityError as e:
        log.error(f'Index media failed: {source} / {media} with "{e}"')

I think the fix is pretty simple -- just match the media object on both key and source. (The key itself doesn't have to change, and it shouldn't since it can be used in file naming and must refer to the video ID.) I changed the media query to this, and it seems to have fixed the issue.

media = Media.objects.get(key=key, source=source)

I didn't see anywhere else that only queries on key, so I'm thinking this is the only place that has to change.

In case I'm on the right track here, I opened PR #95, but I'm obviously not super familiar with this project (or even Django), so if I'm totally off base or if there are further implications to this change, don't feel obligated to merge!

micahmo avatar Mar 12 '21 01:03 micahmo

@micahmo 's patch might fix this but this probably needs a further confirmation review.

meeb avatar Mar 14 '21 13:03 meeb