NewPipeExtractor
Multiple images support
- [x] I carefully read the contribution guidelines and agree to them.
- [ ] I have tested the API against NewPipe. No, because that requires changes in NewPipe which I don't know how to make, at least not in the best way.
- [ ] I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API. No, because I don't know how to implement app support properly, e.g. options to select the image that best fits the resolution of the UI element in which it is shown (or a higher-resolution one if it is already cached), or to set maximum or minimum resolutions.
This pull request introduces support for multiple images instead of a single image URL.
This allows control over the image quality used and improves it in several places, especially on YouTube.
These changes will of course benefit all extractor clients, not only NewPipe: Piped is a good example where the low quality of images is easy to see.
[Image comparison, not preserved: the Creative Commons channel on Piped as it looks currently, and as it would look if the best available resolutions were used.]
Concept implementation and code changes details
Image data is handled by a new dedicated class, `Image`, which contains the URL of an image and its height and width if they are known; otherwise, the relevant constants are returned (`HEIGHT_UNKNOWN` and `WIDTH_UNKNOWN`).
Contrary to a previous attempt (#268), the real size is returned instead of a quality level. In my opinion, it is not up to the extractor to decide whether images are large or small, but up to clients, as each client has its own needs.
Images are returned as unmodifiable lists, because clients should not be able to modify data extracted by extractors in their objects (e.g. insert a new stream into a stream list). Clients who want to do so should use copies of the lists or of their elements. Note: modification is currently possible in some places and this should be fixed, but that is to be discussed in a separate issue.
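To illustrate the concept described above, here is a minimal sketch of an image holder and of an extractor method returning an unmodifiable list. It is not the code of this PR: the wrapper class `ImageSketch`, the example URLs and the value `-1` used for the unknown constants are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ImageSketch {
    /** Immutable holder for an image URL and its dimensions (sketch only). */
    static final class Image {
        // Assumed sentinel values; the description only states that such constants exist.
        static final int HEIGHT_UNKNOWN = -1;
        static final int WIDTH_UNKNOWN = -1;

        private final String url;
        private final int height;
        private final int width;

        Image(final String url, final int height, final int width) {
            this.url = url;
            this.height = height;
            this.width = width;
        }

        String getUrl() { return url; }
        int getHeight() { return height; }
        int getWidth() { return width; }
    }

    /** Hypothetical extractor method: returns a read-only list of images. */
    static List<Image> getThumbnails() {
        final List<Image> images = new ArrayList<>();
        images.add(new Image("https://example.com/thumb_small.jpg", 90, 120));
        // Dimensions not provided by the service: the unknown constants are used.
        images.add(new Image("https://example.com/thumb_other.jpg",
                Image.HEIGHT_UNKNOWN, Image.WIDTH_UNKNOWN));
        return Collections.unmodifiableList(images);
    }

    public static void main(final String[] args) {
        final List<Image> thumbnails = getThumbnails();
        // A client that needs to change the list works on its own copy.
        final List<Image> copy = new ArrayList<>(thumbnails);
        copy.remove(0);
        System.out.println(thumbnails.size() + " extracted, " + copy.size() + " in the copy");
    }
}
```

Calling `add` or `remove` on the list returned by `getThumbnails()` throws an `UnsupportedOperationException`, which is why the client code copies it first.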
The image methods (and attributes) of extractor classes have been updated to reflect the changes:
Class/Interface | Method(s) removed (getters return type: `String`; setters return type: `void`) | Method(s) added (getters return type: `List<Image>`; setters return type: `void`) |
---|---|---|
InfoItem | `getThumbnailUrl()`, `setThumbnailUrl(String)` | `getThumbnails()`, `setThumbnails(List<Image>)` |
CommentsInfoItem | `getUploaderAvatarUrl()`, `setUploaderAvatarUrl(String)` | `getUploaderAvatars()`, `setUploaderAvatars(List<Image>)` |
StreamInfoItem | `getUploaderAvatarUrl()`, `setUploaderAvatarUrl(String)` | `getUploaderAvatars()`, `setUploaderAvatars(List<Image>)` |
InfoItemExtractor | `getThumbnailUrl()` | `getThumbnails()` |
CommentsInfoItemExtractor | `getUploaderAvatarUrl()` | `getUploaderAvatars()` |
StreamInfoItemExtractor | `getUploaderAvatarUrl()` | `getUploaderAvatars()` |
ChannelExtractor | `getAvatarUrl()`, `getBannerUrl()`, `getParentChannelAvatarUrl()` | `getAvatars()`, `getBanners()`, `getParentChannelAvatars()` |
PlaylistExtractor | `getUploaderAvatarUrl()`, `getThumbnailUrl()`, `getBannerUrl()`, `getSubChannelAvatarUrl()` | `getUploaderAvatars()`, `getThumbnails()`, `getBanners()`, `getSubChannelAvatars()` |
StreamExtractor | `getThumbnailUrl()`, `getUploaderAvatarUrl()`, `getSubChannelAvatarUrl()` | `getThumbnails()`, `getUploaderAvatars()`, `getSubChannelAvatars()` |
ChannelInfo | `getParentChannelAvatarUrl()`, `setParentChannelAvatarUrl(String)`, `getAvatarUrl()`, `setAvatarUrl(String)`, `getBannerUrl()`, `setBannerUrl(String)` | `getParentChannelAvatars()`, `setParentChannelAvatars(List<Image>)`, `getAvatars()`, `setAvatars(List<Image>)`, `getBanners()`, `setBanners(List<Image>)` |
PlaylistInfo | `getThumbnailUrl()`, `setThumbnailUrl(String)`, `getBannerUrl()`, `setBannerUrl(String)`, `getUploaderAvatarUrl()`, `setUploaderAvatarUrl(String)`, `getSubChannelAvatarUrl()`, `setSubChannelAvatarUrl(String)` | `getThumbnails()`, `setThumbnails(List<Image>)`, `getBanners()`, `setBanners(List<Image>)`, `getUploaderAvatars()`, `setUploaderAvatars(List<Image>)`, `getSubChannelAvatars()`, `setSubChannelAvatars(List<Image>)` |
StreamInfo | `getThumbnailUrl()`, `setThumbnailUrl(String)`, `getUploaderAvatarUrl()`, `setUploaderAvatarUrl(String)`, `getSubChannelAvatarUrl()`, `setSubChannelAvatarUrl(String)` | `getThumbnails()`, `setThumbnails(List<Image>)`, `getUploaderAvatars()`, `setUploaderAvatars(List<Image>)`, `getSubChannelAvatars()`, `setSubChannelAvatars(List<Image>)` |
Services implementations
Several methods have been added, changed and/or removed, depending on the service. These changes can be found in the commits of this PR.
The implementation of multiple images support differs between services.
- YouTube: images are taken from the `thumbnails` arrays returned by YouTube, except for mixes when they are extracted directly (i.e. when they do not come from a related item or a search result item), for which images and their resolutions are hardcoded:
  - `default.jpg`: 120x90px;
  - `mqdefault.jpg`: 320x180px;
  - `hqdefault.jpg`: 480x360px.
- SoundCloud: resolutions are always hardcoded:
  - for artworks and avatars, the images and resolutions returned are:
    - `mini`: 16x16px;
    - `t20x20`: 20x20px;
    - `small`: 32x32px;
    - `badge`: 47x47px;
    - `t50x50`: 50x50px;
    - `t67x67`: 67x67px;
    - `large`: 100x100px (the resolution provided by SoundCloud in its internal (and public?) API);
    - `t120x120`: 120x120px;
    - `t200x200`: 200x200px;
    - `t300x300`: 300x300px;
    - `crop`: 400x400px;
    - `t500x500`: 500x500px.
  - for visuals/user banners, the images and resolutions returned are (the resolution provided by SoundCloud in its internal (and public?) API is `original`):
    - `t1240x260`: 1240x260px;
    - `t2480x520`: 2480x520px.
- PeerTube: image resolutions are only provided, and so known, for banners and avatars. The implementation first uses the `avatars` and `banners` JSON arrays, as they contain multiple images; the `avatar` and `banner` objects are also used as a fallback if these arrays are absent or empty (so compatibility with old instances is kept);
- Bandcamp: as images may not be square, only image resolutions which preserve aspect ratios are selected (and hardcoded). As a matter of fact, only one dimension of an image is known per image ID:
  - `10`: 1200px wide;
  - `101`: 90px high;
  - `170`: 422px high;
  - `171`: 646px high;
  - `20`: 1024px wide;
  - `200`: 420px high;
  - `201`: 280px high;
  - `202`: 140px high;
  - `204`: 360px high;
  - `205`: 240px high;
  - `206`: 180px high;
  - `207`: 120px high;
  - `43`: 100px high;
  - `44`: 200px high.

  See also the Known limitations/issues section.
- MediaCCC: there is only one image per content. A switch to the higher image resolution has also been made in this PR.
To help with implementations and hardcoded image resolutions, which are always suffixes to image URLs and/or paths, a class has been added to manage them: `ImageSuffix`. An object of this class stores an image suffix string to append, plus the height and width of the corresponding image.
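The suffix idea can be sketched as follows. This is an illustration under assumptions, not the PR's `ImageSuffix` implementation: the field names, the `buildImages` helper and the base URL are hypothetical; only the three suffixes and their resolutions come from the YouTube mix list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ImageSuffixSketch {
    /** Minimal stand-in for the extractor's image class. */
    static final class Image {
        final String url;
        final int height;
        final int width;

        Image(final String url, final int height, final int width) {
            this.url = url;
            this.height = height;
            this.width = width;
        }
    }

    /** Stores a suffix to append and the dimensions of the image it designates. */
    static final class ImageSuffix {
        final String suffix;
        final int height;
        final int width;

        ImageSuffix(final String suffix, final int height, final int width) {
            this.suffix = suffix;
            this.height = height;
            this.width = width;
        }
    }

    // Resolutions hardcoded for extracted YouTube mixes (width x height: 120x90, 320x180, 480x360).
    static final List<ImageSuffix> MIX_SUFFIXES = Collections.unmodifiableList(Arrays.asList(
            new ImageSuffix("default.jpg", 90, 120),
            new ImageSuffix("mqdefault.jpg", 180, 320),
            new ImageSuffix("hqdefault.jpg", 360, 480)));

    /** Hypothetical helper: appends each suffix to the base URL to build the image list. */
    static List<Image> buildImages(final String baseUrl, final List<ImageSuffix> suffixes) {
        final List<Image> images = new ArrayList<>(suffixes.size());
        for (final ImageSuffix imageSuffix : suffixes) {
            images.add(new Image(baseUrl + imageSuffix.suffix,
                    imageSuffix.height, imageSuffix.width));
        }
        return Collections.unmodifiableList(images);
    }

    public static void main(final String[] args) {
        // The base URL below is a placeholder, not a real service URL.
        for (final Image image : buildImages("https://img.example.com/vi/VIDEO_ID/", MIX_SUFFIXES)) {
            System.out.println(image.url + " " + image.width + "x" + image.height);
        }
    }
}
```

Keeping the suffix and its dimensions together in one object means a service only has to declare its table of known resolutions once and can reuse it for every base URL.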
Known limitations/issues
- Multi-service:
  - original images are not returned, for performance and size reasons;
  - no changes have been made to `Frameset`s and `StreamSegment`s: the changes required to implement multi-image support in these classes should be made in a separate PR.
- YouTube:
  - videos which have a high resolution thumbnail are advertised as having a full HD size, even though it should only be HD. This is a YouTube issue;
  - some videos can have a high resolution thumbnail which is not returned by YouTube. This is a YouTube issue and also happens with the official API;
  - not all resolutions are returned, depending on the type of content requested (video, channel, playlist, mix, search). This is a YouTube limitation, at least of the internal API: only the image resolutions which best fit the image elements of the desktop website are returned;
- SoundCloud: if the default prefixes are changed by SoundCloud (the ones returned in the image URL provided by the internal API), image extraction would break;
- PeerTube: the resolution of thumbnails is not known. Should an issue be opened in the PeerTube repository to request its addition?
- Bandcamp: among the image IDs used for avatars and covers, some do not seem to match the same known dimension on banners, according to the following test made on the banner URL returned for the artist used in `BandcampChannelExtractorTest`:
  - `10`: maximum size provided (975x180px);
  - `101`: height respected (90px);
  - `170`: height not respected and maximum size not provided (750x138px);
  - `171`: maximum size provided (975x180px);
  - `20`: maximum size provided (975x180px);
  - `200`: maximum size provided (975x180px);
  - `201`: height not respected and maximum size not provided (750x138px);
  - `202`: height not respected and maximum size not provided (375x69px);
  - `204`: height not respected and maximum size not provided (960x177px);
  - `205`: height not respected and maximum size not provided (640x118px);
  - `206`: height not respected and different resolution provided (480x89px);
  - `207`: height not respected and different resolution provided (320x59px);
  - `43`: height respected (542x100px);
  - `44`: maximum size provided (975x180px).
- MediaCCC: image sizes are not provided and so not known. Should an issue be opened in the MediaCCC repository to request their addition?
Changes on tests
The test structure also required changes, which are the following:
Class/Interface | Method(s) removed (return type: `void`) | Method(s) added (return type: `void`) |
---|---|---|
BaseChannelExtractorTest | `testAvatarUrl()`, `testBannerUrl()` | `testAvatars()`, `testBanners()` |
BasePlaylistExtractorTest | `testThumbnailUrl()`, `testBannerUrl()`, `testUploaderAvatarUrl()` | `testThumbnails()`, `testBanners()`, `testUploaderAvatars()` |
BaseStreamExtractorTest | `testUploaderAvatarUrl()`, `testSubChannelAvatarUrl()`, `testThumbnailUrl()` | `testUploaderAvatars()`, `testSubChannelAvatars()`, `testThumbnails()` |
Image test methods which used URL in their name have been renamed to reflect the multiple images support changes.
A default test on image collections has been added and ensures that:
- image collections are not null;
- for each image of the collections, the following conditions are met:
  - its URL is secure;
  - its height and width are greater than or equal to the relevant unknown constants.

Each test is responsible for calling this default method or for using custom ones, as is done for YouTube (using the default test and asserting that each image URL contains the string `yt`) and Bandcamp (using the default test and asserting that each image URL contains the string `f4.bcbits.com/img` and ends with `.jpg` or `.png`).
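The checks listed above could look roughly like the sketch below. It is written without a test framework so that it stays self-contained; the class and method names are hypothetical, "secure" is assumed to mean an `https://` URL, and `-1` is an assumed value for the unknown constants.

```java
import java.util.Arrays;
import java.util.Collection;

final class DefaultImageCheckSketch {
    // Assumed sentinel values for unknown dimensions.
    static final int HEIGHT_UNKNOWN = -1;
    static final int WIDTH_UNKNOWN = -1;

    /** Minimal stand-in for the extractor's image class. */
    static final class Image {
        final String url;
        final int height;
        final int width;

        Image(final String url, final int height, final int width) {
            this.url = url;
            this.height = height;
            this.width = width;
        }
    }

    /** Default check: non-null collection, secure URLs, dimensions not below the unknown constants. */
    static void checkImages(final Collection<Image> images) {
        if (images == null) {
            throw new AssertionError("image collection is null");
        }
        for (final Image image : images) {
            if (!image.url.startsWith("https://")) {
                throw new AssertionError("image URL is not secure: " + image.url);
            }
            if (image.height < HEIGHT_UNKNOWN || image.width < WIDTH_UNKNOWN) {
                throw new AssertionError("invalid dimensions for " + image.url);
            }
        }
    }

    /** Service-specific check in the style described for YouTube: default check plus a URL substring. */
    static void checkYoutubeImages(final Collection<Image> images) {
        checkImages(images);
        for (final Image image : images) {
            if (!image.url.contains("yt")) {
                throw new AssertionError("unexpected image host: " + image.url);
            }
        }
    }

    public static void main(final String[] args) {
        // Placeholder URL; it only needs to satisfy the checks of this sketch.
        checkYoutubeImages(Arrays.asList(
                new Image("https://yt.example.com/a.jpg", 90, 120),
                new Image("https://yt.example.com/b.jpg", HEIGHT_UNKNOWN, WIDTH_UNKNOWN)));
        System.out.println("all image checks passed");
    }
}
```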
Other changes
A few other changes have also been made in this PR:
- licence headers have been moved to the top of the modified files, where applicable and where this was not already the case;
- `YoutubeMusicSearchExtractor`'s `InfoItem`s have been moved to external classes, which no longer depend on the YouTube ones (that dependency was useless, as YouTube objects are not ordered in the same way as, or are not the same as, those of YouTube Music), in order to improve readability and make code changes easier;
- unneeded `public` modifiers have been removed from edited test classes;
- missing `@Test` annotations have been added to old JUnit 4 tests in modified test classes, and the corresponding tests have been fixed where needed;
- single class imports are used instead of wildcard imports in modified test classes where this was not already the case;
- some code has been improved/refactored in places in the files changed by this PR.
Closes #649, fixes #763.
Maybe we should also sort the lists by image quality, either from lowest to highest or the other way around.
I asked about this in the IRC channel and here is the conversation I had with Stypox (Matrix link):
Me:
How do you think a list of images should be sorted? I want your advice because I am implementing support for extracting multiple images provided by services in the extractor. For now, I'm using the images' height because it seems to be the most commonly available dimension, even if it can be unknown in some cases (such as in some Bandcamp image formats which are based on the width). For reference, here is the code I use in Java to sort a list (the Image class has three properties: URL, height and width):
imageList.sort(Comparator.comparingInt(Image::getHeight));
Stypox:
I don't think they should be sorted at all. If somebody wants to sort them, they can do that on their own; we don't need to do unneeded computation. Just like video streams are not sorted.
Me:
OK, then I will only make image lists unmodifiable. But this will be needed in clients such as NewPipe.
Stypox:
Mmmh, I don't think so. In NewPipe we just need a filter that chooses the best-fitting thumbnail according to a size criterion (or something like that), which doesn't require any kind of sorting. (On a side note, sorting would run in O(n log n), while without sorting it is O(n).)
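The kind of filter mentioned in the last message can be sketched as a single pass over the list. This is only an illustration, not NewPipe code: the class and method names are hypothetical, the criterion (closest known height to a target) is one possible choice among others, and `-1` is an assumed value for the unknown-height constant.

```java
import java.util.Arrays;
import java.util.List;

final class BestImageSketch {
    // Assumed sentinel value for an unknown height.
    static final int HEIGHT_UNKNOWN = -1;

    /** Minimal stand-in for the extractor's image class (width omitted for brevity). */
    static final class Image {
        final String url;
        final int height;

        Image(final String url, final int height) {
            this.url = url;
            this.height = height;
        }
    }

    /**
     * Returns the image whose known height is closest to targetHeight, in one O(n) pass.
     * Falls back to the first image of unknown height, or null if the list is empty.
     */
    static Image chooseClosestHeight(final List<Image> images, final int targetHeight) {
        Image best = null;
        Image fallback = null;
        int bestDistance = Integer.MAX_VALUE;
        for (final Image image : images) {
            if (image.height == HEIGHT_UNKNOWN) {
                if (fallback == null) {
                    fallback = image;
                }
                continue;
            }
            final int distance = Math.abs(image.height - targetHeight);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = image;
            }
        }
        return best != null ? best : fallback;
    }

    public static void main(final String[] args) {
        final List<Image> images = Arrays.asList(
                new Image("https://example.com/90.jpg", 90),
                new Image("https://example.com/180.jpg", 180),
                new Image("https://example.com/360.jpg", 360));
        System.out.println(chooseClosestHeight(images, 200).url);
    }
}
```

No sorting is involved, so the original order of the extracted list does not matter and the list can stay unmodifiable.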
Is development still active on this one?
Yes, it is.
If this PR were merged🥲
What's stopping this from being merged? If for any reason there is some unwanted drawback to implementing this, any kind of response can help clear the water.
Mainly lack of time. And as you can see this will probably be merged in 0.26.0.
Do you have any idea when 0.26.0 will be released? Is it the next version, or will there be other versions before it?
Thank you for your time.
0.25.1; There may be updates like 0.25.5 and 0.25.9 etc
A couple of months I guess. We will merge many new features in 0.26.0
Any idea when this will be implemented? Also, can I ask what you are working on that keeps you so busy changing the code?
We're primarily blocked by the changes required at the NewPipe repository. Unfortunately, this PR makes breaking changes to the API and we'd require changes at NewPipe before this can be merged.
I don't think there's anyone working on those changes currently, so this PR is likely not going to be merged soon unless someone makes the necessary changes on the app side. This unfortunately also means other projects (Piped, Libretube, etc) have to wait as well.
I don't want to add noise, and contribute nothing useful here, but, there's been no activity for about a month and a half. I hope you haven't forgotten about this? It seems good to go?
I rebased. Most of the conflicts were caused by the playlist description PR and were thus easy to solve. Feel free to copy from my repo: https://github.com/TobiGr/NewPipeExtractor/tree/multiple-images-support