NewPipeExtractor

Refactor link handlers

Open · Stypox opened this issue 4 months ago • 1 comment

The interface of link handlers and link handler factories is strange and hard to debug, and it does something extremely simple (checking whether a URL matches and extracting data from it) in a highly overcomplicated way. In my opinion, it could be refactored like this:

  • Have a base interface Link (the shorter the name the better; the current names are huge).
  • The interface has only a few extractor-related methods, e.g. getExtractor() returns the extractor corresponding to the link.
  • The interface also has a few app-facing public methods that are meant to be used only by the app (and not by other parts of the extractor). For example, a getUniqueId() method that returns a stable and unique ID for each resource, so it can be used as a primary key in NewPipe's database.
  • Each extractor has a corresponding Link implementation. Every Link's constructor takes a URL and builds an instance of Link, but throws an exception if the URL does not match the expected link format.
  • Each Link implementor may also have other constructors, e.g. YoutubeSearchLink would have a constructor that takes the query and any search filters.
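A minimal sketch of what the list above could look like in Java. Only Link, getExtractor(), getUniqueId() and YoutubeSearchLink come from the proposal; the Extractor placeholder, the ID format, the use of IllegalArgumentException, and the simplified URL check (host www.youtube.com, path /results, parameter search_query) are assumptions made for illustration, not the actual NewPipeExtractor API.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical stand-in for the real extractor classes. */
interface Extractor {
}

/** Base interface from the proposal, with only the two methods it names. */
interface Link {
    /** Extractor-facing: returns the extractor corresponding to this link. */
    Extractor getExtractor();

    /** App-facing: stable, unique ID, usable as a database primary key. */
    String getUniqueId();
}

final class YoutubeSearchLink implements Link {
    private final String query;
    private final List<String> filters;

    /** Builds a link from a URL; throws if the URL is not a search URL. */
    YoutubeSearchLink(final String url) {
        // URI.create throws IllegalArgumentException on malformed input
        final URI uri = URI.create(url);
        // Simplified matching rule (assumption): one host, one path
        if (!"www.youtube.com".equals(uri.getHost())
                || !"/results".equals(uri.getPath())) {
            throw new IllegalArgumentException("Not a YouTube search URL: " + url);
        }
        this.query = findQuery(uri.getRawQuery(), url);
        this.filters = List.of(); // filter parsing is left out of this sketch
    }

    /** Extra constructor: builds a link directly from a query and filters. */
    YoutubeSearchLink(final String query, final List<String> filters) {
        this.query = query;
        this.filters = List.copyOf(filters);
    }

    private static String findQuery(final String rawQuery, final String url) {
        final String key = "search_query=";
        if (rawQuery != null) {
            for (final String param : rawQuery.split("&")) {
                if (param.startsWith(key)) {
                    return URLDecoder.decode(
                            param.substring(key.length()), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IllegalArgumentException("Missing search_query in: " + url);
    }

    @Override
    public Extractor getExtractor() {
        return new Extractor() { }; // placeholder; a real link would return a search extractor
    }

    @Override
    public String getUniqueId() {
        // Hypothetical ID format; any stable, collision-free scheme would do
        return "youtube:search:" + query + ":" + String.join(",", filters);
    }
}
```

With this shape, both `new YoutubeSearchLink(url)` and `new YoutubeSearchLink(query, filters)` produce the same kind of object, so the rest of the extractor never needs a separate factory class to validate or parse URLs.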

Stypox, Aug 09 '25 08:08

I should also mention that we need to refactor the entire extraction architecture in general, or at least in the case of Soundcloud.

For Soundcloud, there are many entry points into extraction that end up extracting data that has already been extracted. E.g. when playing a stream, we extract the stream, but then when getting comments we do another network request in SoundcloudCommentsLinkHandlerFactory to get the URL and ID of that stream's comments, even though that data was already in memory at an earlier point. Idk how it works for other services.

Regardless, in the general case we should refactor the code so that extraction isn't split up the way it is now. We can have many entry points to extraction, each getting certain data about a resource (like a certain YouTube video), but whenever we need related data about that resource, we should use the data we already have for it to get the rest.
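A rough sketch of the idea, assuming the data from the first extraction is kept per resource and every later entry point goes through it. All names here (TrackData, ResourceCache, commentsUrlFor) and the example.invalid endpoint are hypothetical and are not part of NewPipeExtractor or of Soundcloud's API; the resolver function stands in for whatever network request resolves a track.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical data obtained the first time a stream is extracted. */
final class TrackData {
    final String id;
    final String url;

    TrackData(final String id, final String url) {
        this.id = id;
        this.url = url;
    }
}

/** Keeps already-extracted data so later entry points can reuse it. */
final class ResourceCache {
    private final Map<String, TrackData> byUrl = new HashMap<>();
    private final Function<String, TrackData> resolver;
    private int networkRequests = 0;

    /** The resolver stands in for the network request that resolves a track. */
    ResourceCache(final Function<String, TrackData> resolver) {
        this.resolver = resolver;
    }

    /** Resolves a track at most once per URL; later calls hit the cache. */
    TrackData get(final String url) {
        return byUrl.computeIfAbsent(url, u -> {
            networkRequests++;
            return resolver.apply(u);
        });
    }

    /** Derives the comments URL from cached data instead of resolving again. */
    String commentsUrlFor(final String trackUrl) {
        // Hypothetical endpoint, used only to show that the ID is reused
        return "https://api.example.invalid/tracks/" + get(trackUrl).id + "/comments";
    }

    int getNetworkRequests() {
        return networkRequests;
    }
}
```

Here, playing a stream and then asking for its comments triggers the resolver only once, because the second entry point reads the ID that is already in memory.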

absurdlylongusername, Aug 09 '25 09:08