
Animate Thumbnail upon Mouse Hover

Open kevATin opened this issue 5 years ago • 5 comments

Many video platforms (previously mostly porn sites, nowadays also YouTube) show static video thumbnails, but animate them once a user puts their mouse on them.

I am not sure how this is done in practice, maybe an animated image format is used, or the animation part is done via CSS by simply switching between static images. From my understanding this needs CSS anyways to handle switching from the static preview to the animated one, so the second option seems to make more sense to me and would allow easier editing of the thumbnail on the fly, without requiring a new gif or other animated image being generated every time.

Unlike the "normal", static, main thumbnail that can have any picture the user provides, this animated preview can only contain frames that appear inside the video.

Questions that need to be addressed for this feature:

  • How many separate frames should these animated thumbnail previews have?

  • Should they always have the same number of frames, or should that be set on a per-video basis?

  • For how long should each frame be shown until it moves to the next one?

  • Could frame speed also be set on a video to video basis?

  • How are "videos" with only audio, or with too few frames, handled?

  • Will the frames be picked by the server, based on some algorithm that takes into account total video length, time distance to other chosen frames, and the amount of colors in a frame relative to other frames (to make sure blank ones aren't picked)? Probably not a good idea: I'm not sure the results would be good, it would put stress on the server, and it limits user freedom.

  • If frames are picked by the uploader, will there be any limits for that? For example, a minimum time distance between individual picked frames?

  • Should uploaders be able to easily change a video's animated preview afterwards?
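
To make the server-side option in the list above concrete, here is a minimal sketch of the simplest possible picker: evenly spaced timestamps across the video. It ignores the color-based blank-frame check entirely. The function name, the `margin_frac` parameter (which skips intros and credits), and the choice to return an empty list for zero-length input are all assumptions for illustration, not anything PeerTube does.

```python
from __future__ import annotations


def pick_preview_timestamps(duration_s: float, count: int,
                            margin_frac: float = 0.05) -> list[float]:
    """Return `count` evenly spaced timestamps in seconds.

    A margin at both ends (5% of the duration by default, an arbitrary
    assumption) is skipped so that title cards and credits are less
    likely to be picked.
    """
    if duration_s <= 0 or count <= 0:
        # Nothing sensible to pick, e.g. an empty or broken file.
        return []
    start = duration_s * margin_frac
    end = duration_s * (1 - margin_frac)
    if count == 1:
        # A single frame: take the middle of the usable range.
        return [round((start + end) / 2, 3)]
    step = (end - start) / (count - 1)
    return [round(start + i * step, 3) for i in range(count)]


if __name__ == "__main__":
    print(pick_preview_timestamps(100, 5))
```

Because the spacing is derived from the duration, the minimum time distance between frames falls out automatically; an uploader-driven picker would need an explicit check instead.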

Any thoughts on this?

kevATin avatar Sep 06 '20 00:09 kevATin

See https://github.com/Chocobozzz/PeerTube/issues/537

Chocobozzz avatar Sep 07 '20 06:09 Chocobozzz

> See #537

@Chocobozzz Do you suggest simply using the same frames for the animated preview as for the video timeline scrubbing? I didn't think of that, but the less separate images we need the better. The amount of timeline frames would be the same regardless of video length, right? I wonder if the animated previews could feel too long, after all they always seem pretty short on YouTube. But that could easily be fixed by only taking every 2nd or 3rd image.

Also, something I didn't think of when opening the issue: a per-user option for this setting would be useful, since some users might not have the bandwidth to load thumbnails upon thumbnails for every video shown in the video catalog.

kevATin avatar Sep 07 '20 13:09 kevATin

I was thinking about this and realized there is a way of getting an animated video thumbnail essentially for "free" in terms of CPU use (since it doesn't require any re-encoding) and in a way that can still be bandwidth-efficient.

If an instance is configured to generate low-resolution encodes (like 240p), you can select some points from an existing low-res version of a video, seek to those points, copy a certain length from each point (say, two seconds), and then concatenate & mux these clips together to form an animated preview. Since this only involves copying and stitching low-resolution video without any re-encoding, it is basically instant.

There are some disadvantages compared to a more complicated setup, however. The main ones I can see are as follows:

  1. You don't get to choose the exact points of a video the preview segments are taken from (since your clips can only start at key frames).

  2. Since you're not re-encoding, you can't modify the frames in any way. Sharpening specifically can be very beneficial for thumbnails and could be useful here. Lowering the frame rate would also be useful for reducing the file size of the preview, but also can't be done with this method.

  3. The preview will have as many key frames as there are copied clips, increasing the file size compared to what could be achieved with Organically Farmed & Finest Hand-Crafted Encoding Parameters™.

However, the last point should be mitigated by the fact that this animated thumbnail can be generated from a very low-resolution encode like 240p or perhaps even 144p. If I recall, YouTube uses or at least did at one point use 180p animated previews (I can't be sure since I can't figure out how to download one of those right now). I can confirm that 240p looks good at thumbnail-size, which is how these would be displayed.

For the lowest bandwidth use, or maybe a specific low-bandwidth mode, you could use 144p previews. And shorter previews will of course result in a smaller size.

Below is an 8-second preview created using this method from a ~400kbps 240p H.264 encode of Big Buck Bunny. It only comes in at 375 kB, which is pretty decent for an animated preview.

  • https://github.com/Chocobozzz/PeerTube/assets/7808922/177113ae-6434-47a8-8c96-92f5f2a592b3
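
As a rough sanity check on that figure: a stream-copied preview should weigh about as much as the source bitrate times the preview length. The sketch below assumes the ~400 kbps figure is an average that also holds for the copied segments, and that kB means 1000 bytes; the function name is made up for illustration.

```python
def estimated_size_kb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate file size in kB (1000 bytes) for a stream of the given
    average bitrate (kilobits per second) and duration (seconds).

    Container overhead is ignored, so this is only a ballpark figure.
    """
    return bitrate_kbps * duration_s / 8  # 8 bits per byte


if __name__ == "__main__":
    # ~400 kbps for 8 seconds
    print(estimated_size_kb(400, 8))
```

For 400 kbps and 8 seconds this gives 400 kB, which is in the same range as the 375 kB measured above.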

I used the following FFmpeg command to extract the individual clips:

ffmpeg -ss 00:02:00 -i input.mp4 -c:v copy -frames:v 48 -avoid_negative_ts make_zero -an -y output.mp4

I used -frames:v to specify a number of frames, because specifying a time with -t 2 resulted in some tomfoolery in terms of the actual length of the extracted clips; they were much longer than 2 seconds. Dunno what that's about.
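
For several seek points, the same command can be generated programmatically. This is a minimal sketch that only builds the argument list and does not run FFmpeg; the function name and the `fps` parameter are assumptions. The frame count is derived from the desired clip length, with 24 fps assumed so that two seconds matches the 48 frames used above.

```python
from __future__ import annotations


def clip_extract_cmd(src: str, start: str, seconds: float, fps: float,
                     out: str) -> list[str]:
    """Build the stream-copy extraction command for one clip.

    `start` is passed to -ss as-is (for example "00:02:00").
    The clip length is expressed as a frame count via -frames:v,
    mirroring the command above.
    """
    frames = round(seconds * fps)
    return [
        "ffmpeg",
        "-ss", start,                      # seek before opening the input
        "-i", src,
        "-c:v", "copy",                    # no re-encoding
        "-frames:v", str(frames),
        "-avoid_negative_ts", "make_zero",
        "-an",                             # drop audio
        "-y", out,
    ]


if __name__ == "__main__":
    # Hypothetical seek points; fps=24 is an assumption about the source.
    for i, start in enumerate(["00:02:00", "00:04:00", "00:06:00"]):
        print(" ".join(clip_extract_cmd("input.mp4", start, 2.0, 24.0,
                                        f"clip{i}.mp4")))
```

Each list can be handed to `subprocess.run` unchanged once the seek points are chosen.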

I then used the concat demuxer as described here to stitch the clips together.
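
The stitching step can be sketched as follows. The concat demuxer reads a text file with one `file '...'` line per clip, and the clips are then copied into the output without re-encoding. The helper names and file names here are hypothetical, and `-safe 0` is included only so that absolute paths in the list are accepted; it may not be needed for relative paths.

```python
from __future__ import annotations


def concat_list_text(clip_paths: list[str]) -> str:
    """Return the contents of a concat-demuxer list file."""
    lines = []
    for path in clip_paths:
        # A single quote inside a quoted path is written as '\''
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_cmd(list_path: str, out_path: str) -> list[str]:
    """Build the FFmpeg command that stitches the listed clips together."""
    return [
        "ffmpeg",
        "-f", "concat",
        "-safe", "0",
        "-i", list_path,
        "-c", "copy",      # stream copy, no re-encoding
        "-y", out_path,
    ]


if __name__ == "__main__":
    print(concat_list_text(["clip0.mp4", "clip1.mp4", "clip2.mp4"]), end="")
    print(" ".join(concat_cmd("clips.txt", "preview.mp4")))
```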

Hope this helps. It would be cool to have animated thumbnails. It was one of the things that made me go "Whoa, I'm really living in the future" when I first saw it.

veikk0 avatar Dec 21 '23 20:12 veikk0

Putting in my support here.

I'd be happy with either a seekable thumbnail on hover or a smaller 144p-240p video playing.

SimplyCorbett avatar Feb 14 '24 10:02 SimplyCorbett

Just a bit of a follow-up.

I managed to download some animated thumbnails, but from Google search, not YouTube. These days YouTube seems to just play a low-resolution version of the full video instead of a separately generated animated thumbnail. At least it does for me right now; I guess they could be serving different stuff to different people.

Anyway, here are the files:

They're very small, well under 100 kB, because they're 144p, 10 fps, and about 5 seconds long. That makes them very fast to load.

And in hindsight, if simplicity of implementation is desirable here, choosing a single spot in the video and just encoding a few-second 144p preview is probably the better choice. While it would be nice to have multiple parts of the video represented in the preview, the frame-copy and concat method I outlined above is a bit complicated when you could just use a fairly simple FFmpeg one-liner like this to do the job:

ffmpeg -ss 00:03:12.65 -i input.mp4 -c:v libx264 -x264-params scenecut=0 -g 99999 -level:v 1.1 -preset slow -crf 23 -maxrate 60k -bufsize 120k -t 5 -vf fps=20,scale=256:-2:sws_flags=lanczos,unsharp=3:3 -an -sn -map_metadata -1 -movflags +faststart output.mp4

Output:

https://github.com/Chocobozzz/PeerTube/assets/7808922/34f159a1-66b6-4d2b-a501-859ab180dcde

Since the image is being downscaled to such a tiny size, whether the source was already transcoded is irrelevant from a quality standpoint, and CPU usage is pretty minimal. So the "no quality loss" benefit in my previous wall of text isn't very relevant.

Some notes:

  • I use 20 fps in the command above since it looks so much better than 10 fps, which is unnecessarily low IMO, but YMMV.
  • Lanczos scaling looks very slightly better than the default bicubic scaling, but is not strictly necessary. It's more of a "why not" rather than "why".
  • The unsharp filter has a major impact at this small resolution. The default setting of 5:5 was too aggressive so I pared it back to 3:3, but there are more parameters that can be adjusted. Or maybe use a different filter altogether (cas?)
  • Turn off scene change detection and use a large GOP since this preview will not be seekable, so there's no need for more than one key frame. Should improve compression slightly.
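
To experiment with the tunables mentioned in these notes (frame rate, width, length, bitrate cap), the one-liner can be wrapped in a small builder. This is a sketch only: the function and parameter names are made up, the defaults reproduce the command above, and `-bufsize` is simply set to twice `-maxrate` because that matches the 60k/120k pair used there. Raising the frame rate or width may also require changing the fixed `-level:v 1.1`.

```python
from __future__ import annotations


def preview_encode_cmd(src: str, start: str, out: str,
                       duration_s: int = 5, fps: int = 20,
                       width: int = 256, maxrate_k: int = 60) -> list[str]:
    """Build the single-clip preview encode command as an argument list."""
    # Filter chain: reduce frame rate, downscale with Lanczos, sharpen.
    vf = f"fps={fps},scale={width}:-2:sws_flags=lanczos,unsharp=3:3"
    return [
        "ffmpeg",
        "-ss", start,
        "-i", src,
        "-c:v", "libx264",
        "-x264-params", "scenecut=0",   # no scene change detection
        "-g", "99999",                  # large GOP, one key frame
        "-level:v", "1.1",
        "-preset", "slow",
        "-crf", "23",
        "-maxrate", f"{maxrate_k}k",
        "-bufsize", f"{2 * maxrate_k}k",
        "-t", str(duration_s),
        "-vf", vf,
        "-an", "-sn",                   # no audio, no subtitles
        "-map_metadata", "-1",
        "-movflags", "+faststart",
        out,
    ]


if __name__ == "__main__":
    print(" ".join(preview_encode_cmd("input.mp4", "00:03:12.65",
                                      "output.mp4")))
```

With the defaults, the printed command is identical to the one-liner above, so any change in output can be attributed to the parameter being varied.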

veikk0 avatar Feb 15 '24 15:02 veikk0