Automatically import videos from other platforms

Open roipoussiere opened this issue 6 years ago • 32 comments

To ease PeerTube adoption by current YouTubers, it's important to simplify the importing process.

The import script is a good step in this direction, but it is possible to do better: automatically import YouTube videos, so that the PeerTube import is totally transparent for the YouTuber.

We can do this without the Youtube API, by using atom feeds: https://www.youtube.com/feeds/videos.xml?channel_id=<channel_id>.

I'm not familiar with TypeScript, but I wrote a small Python script as a proof of concept, available here.

Used with the import script and a 5-minute cron job, PeerTube users could feed their instance just by specifying their YouTube channel name.

Note that we also need a way to check whether a video is already on the PeerTube instance. Can we provide the YouTube video ID in some video metadata?
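To make the feed idea concrete, here is a minimal sketch of reading video IDs from the channel feed and skipping those already imported. It assumes the feed uses the Atom namespace plus YouTube's `yt` namespace as shown; the sample feed, `parse_feed`, `not_yet_imported` and the `known_ids` set are all hypothetical and not part of PeerTube or of the script mentioned above.

```python
import xml.etree.ElementTree as ET

# XML namespaces assumed for the YouTube channel feed (Atom plus the "yt" prefix).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}

# Trimmed, made-up sample shaped like the feed served at
# https://www.youtube.com/feeds/videos.xml?channel_id=<channel_id>
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <id>yt:video:VIDEO_ID_1</id>
    <yt:videoId>VIDEO_ID_1</yt:videoId>
    <title>First video</title>
    <published>2018-06-01T10:00:00+00:00</published>
  </entry>
  <entry>
    <id>yt:video:VIDEO_ID_2</id>
    <yt:videoId>VIDEO_ID_2</yt:videoId>
    <title>Second video</title>
    <published>2018-06-15T10:00:00+00:00</published>
  </entry>
</feed>"""


def parse_feed(xml_text):
    """Return one dict per feed entry with its video ID, title and publication date."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", NS):
        entries.append({
            "video_id": entry.findtext("yt:videoId", namespaces=NS),
            "title": entry.findtext("atom:title", namespaces=NS),
            "published": entry.findtext("atom:published", namespaces=NS),
        })
    return entries


def not_yet_imported(entries, known_ids):
    """Keep only the entries whose video ID is not in the set of imported IDs."""
    return [e for e in entries if e["video_id"] not in known_ids]
```

In a real cron job the XML would be downloaded from the feed URL, and `known_ids` would come from wherever the instance records what it has already imported.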

roipoussiere avatar Jun 29 '18 16:06 roipoussiere

Since it's an auto import, we can assume videos previously imported have the same title as on YT. We could just check against titles for a first implementation.

rigelk avatar Jun 29 '18 20:06 rigelk

Maybe for a POC, but speaking as someone who does this sort of automated import ops at my day job (I work for a large YouTube channel), I can say that titles can and are modified after publishing, especially within the first hour after publishing. Can't we store the YouTube video ID in some form of metadata entry on the PeerTube video and look it up to see if we've already imported it?

rezonant avatar Jun 29 '18 20:06 rezonant

Ah, right :/

I guess it's not a problem to add this metadata, since it's something that doesn't need to be federated (meaning we don't break things if we add a field to the video model).

My concern is more as to how to make that metadata structure broad enough to be reused for imports from the other platforms supported by youtube-dl. I guess a HashMap is fine there.

rigelk avatar Jun 29 '18 20:06 rigelk

I can say that titles can and are modified after publishing

FYI, the YouTube feed also provides an updated field, so I guess if the video is renamed, it will appear in the feed a second time, with the same published date (and YouTube ID) but a different updated date.


My concern is more as to how to make that metadata structure broad enough to be reused for imports from the other platforms supported by youtube-dl. I guess a HashMap is fine there.

But not all platforms supported by youtube-dl provide an Atom/RSS feed.

Note that a unique item ID is mandatory for Atom feeds:

atom:entry elements MUST contain exactly one atom:id element.

... but is optional for RSS feeds:

<guid> is an optional sub-element of <item>. guid stands for globally unique identifier. It's a string that uniquely identifies the item.

Related: The RSS bridge project provides Atom/RSS for many websites, including video providers.


In fact we could simply use the video URL as a unique identifier, it is supported by all platforms and supposed to be unique.

roipoussiere avatar Jun 29 '18 22:06 roipoussiere

Using the URL is an elegant way to do it, but canonicalization is a factor. Though you wouldn't run into it with auto YT import, you may run into it in other components. An example is the 4-5 different ways to express a YouTube URL (youtu.be, mobile.youtube.com/watch, gaming.youtube.com/watch, music.youtube.com/watch, YouTube.com/watch). For this reason I would recommend just having a provider and ID pair, or just a "provider:providerID" format. Use .split(':', 2) to separate the two.
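As an illustration of that suggestion, here is a minimal sketch that maps several YouTube URL shapes to a single "provider:providerID" key and splits such a key back into its two parts. The host list is taken from the examples above plus `www.youtube.com` and `m.youtube.com`, which are assumptions; `youtube_key` and `split_key` are hypothetical helper names, not PeerTube code.

```python
from urllib.parse import urlparse, parse_qs

# Hosts assumed to serve the same video under /watch?v=<id>.
YOUTUBE_WATCH_HOSTS = {
    "youtube.com",
    "www.youtube.com",
    "m.youtube.com",
    "mobile.youtube.com",
    "gaming.youtube.com",
    "music.youtube.com",
}


def youtube_key(url):
    """Return 'youtube:<id>' for a recognized YouTube URL, otherwise None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_WATCH_HOSTS and parsed.path == "/watch":
        # Watch links carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return f"youtube:{video_id}" if video_id else None


def split_key(key):
    """Split 'provider:providerID' on the first colon only."""
    provider, _, provider_id = key.partition(":")
    return provider, provider_id
```

Splitting on the first colon only keeps an ID that itself contains a colon intact. In Python that corresponds to `partition(":")` or `split(":", 1)`; the meaning of the second argument to `split` differs between languages, so it is worth checking in whichever one is used.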

rezonant avatar Jun 29 '18 22:06 rezonant

An example is the 4-5 different ways to express a YouTube URL (youtu.be, mobile.youtube.com/watch, gaming.youtube.com/watch, music.youtube.com/watch, YouTube.com/watch)

Hmm yes, good point. provider and providerID could be fine.


Note that Vimeo and DailyMotion also officially provide feeds, where we can find video unique ids:

  • <feed> / <entry> / <yt:videoId> for Youtube (example);
  • <rss> / <channel> / <item> / <dm:id> for Dailymotion (example);
  • <rss> / <channel> / <item> / <guid> for Vimeo (example).

So we could easily implement auto-download also for these platforms.
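The three element paths listed above could be handled by one small lookup, sketched here under some assumptions: elements are matched by local name only (ignoring XML namespaces, since the exact namespace URIs are not given in this thread), and the `ID_TAG`, `ITEM_TAG` and `extract_ids` names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Local name of the element wrapping one video, per provider (from the list above).
ITEM_TAG = {"youtube": "entry", "dailymotion": "item", "vimeo": "item"}
# Local name of the child element carrying the unique video ID, per provider.
ID_TAG = {"youtube": "videoId", "dailymotion": "id", "vimeo": "guid"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_ids(provider, xml_text):
    """Return the unique video IDs found in a provider feed, in document order."""
    root = ET.fromstring(xml_text)
    ids = []
    for element in root.iter():
        if local_name(element.tag) != ITEM_TAG[provider]:
            continue
        # Take the first direct child whose local name is the provider's ID tag.
        for child in element:
            if local_name(child.tag) == ID_TAG[provider] and child.text:
                ids.append(child.text.strip())
                break
    return ids
```

Matching on local names is a shortcut for the sketch; a stricter version would bind each prefix to the namespace URI declared in the real feed.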

roipoussiere avatar Jun 29 '18 23:06 roipoussiere

This is an important idea for boosting the amount of content on the PeerTube network. Once included, I would go as far as to ask new users, when they register, for channels elsewhere that they'd like to set up for auto-import.

That way, creators from other platforms who make an account on a PeerTube instance "just to try it out" will automatically get up and running quickly and the network gets a ton of ongoing new content, even if they forget about their PeerTube channel and don't touch it again.

Beyond the scope of this issue, but worth considering later, would be actually doing a full import of someone's channel (not just videos), ie their channel description, avatar etc.

Bugsbane avatar Sep 27 '18 14:09 Bugsbane

I don't think that checking against the video title/name would be good, because users can upload different videos having the same name. It's better to store the original URL of the imported video and check against it.

This is how it's done for the frontend when a user imports a video, but the video-import script doesn't set the original URL of the imported video (targetUrl). So it is really a bug in ./server/tools/peertube-import-videos.ts: it doesn't perform the same steps as the video import from the frontend. It also doesn't create entries in the videoImport table, whereas an import done by a user from the frontend does.

screen shot 2018-10-10 at 12 02 51 pm

McFlat avatar Oct 10 '18 17:10 McFlat

Next I want to work on making the videos import script behave like the frontend one, because the difference really affects me in a big way. I need it to check the targetUrl instead of the name/title of the video to keep out any duplicates.

McFlat avatar Oct 10 '18 21:10 McFlat

I think using a generic target URL, converted from the many different possibilities, would be better than provider:id. That way you can just use the URL as it is, instead of having to do a conversion every time to fetch it.

McFlat avatar Oct 10 '18 21:10 McFlat

(Maybe out of scope, but) in the meantime I personally use @roipoussiere's Python script with small improvements and a cron job. Here is my code: https://taboulisme.com/git/nouts/peertube-import It can only be set up by the instance admin, though. I'm using it for duplicating some YouTube channels at https://alttube.fr/videos/local

gnouts avatar Jun 18 '19 09:06 gnouts

@gnouts I like your project, but it would be more intuitive to have it in PeerTube anyway. It would improve discovery and usage of such a tool.

aliceinwire avatar Jul 04 '19 06:07 aliceinwire

@gnouts I would like to report an issue (in node arguments, -l is for license whereas your intent is to specify language, so this should be -L or --language). Where may I report a bug? :)

fflorent avatar Jul 04 '19 06:07 fflorent

@fflorent Thanks, I set up a GitHub clone here: https://github.com/gnouts/peertube-import @aliceinwire Sure, I agree. In the meantime it works for my use case, and I can't actually do better with my time and knowledge.

gnouts avatar Jul 04 '19 11:07 gnouts

While I remember having used this script to import a whole YT channel, I recently tried again and failed to run it for that purpose.

But the documentation seems to say that we can pass the ID of a channel: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/tools.md#peertube-import-videosjs

I also see that youtube-dl supports downloading a whole channel. I wonder if it's possible to use a cron script to automatically synchronize a YT channel. I'll investigate that, but if anyone can tell me more about it, I would be very grateful! :)

fflorent avatar Jul 24 '19 20:07 fflorent

OK, so I was able to run the peertube-import-videos script in order to upload a whole YT channel to PeerTube. I also figured out that we cannot rely on video names to detect whether a video has already been uploaded (if the YT owner renames a video, that creates duplicates…). Instead, in order to synchronize a YT channel with a PT one, I propose to rely on a --since parameter and use a crontab job.
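The idea behind a "since" cutoff can be sketched as a simple date filter. This is only an illustration of the principle, not the code of the PR; the `published_since` helper, the dict shape and the sample data are hypothetical.

```python
from datetime import datetime, timezone

# Made-up sample: videos as they might be listed for a remote channel.
SAMPLE_VIDEOS = [
    {"id": "b", "published": datetime(2019, 7, 20, tzinfo=timezone.utc)},
    {"id": "a", "published": datetime(2019, 7, 1, tzinfo=timezone.utc)},
    {"id": "c", "published": datetime(2019, 7, 28, tzinfo=timezone.utc)},
]


def published_since(videos, since):
    """Return the videos published at or after `since`, oldest first."""
    recent = [v for v in videos if v["published"] >= since]
    return sorted(recent, key=lambda v: v["published"])
```

A cron job would pass the date of its previous run as `since`, so renamed older videos fall outside the window instead of being matched by title.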

For that purpose, I opened this PR: https://github.com/Chocobozzz/PeerTube/pull/1991

Even if that's WIP, feedback welcome :).

Florent

fflorent avatar Jul 28 '19 21:07 fflorent

Aren't all YouTube video IDs unique? If so, couldn't the import just store the YouTube video ID and then check that there aren't already any videos imported with that same ID?

Bugsbane avatar Aug 20 '19 19:08 Bugsbane

@Bugsbane Yes, it has already been suggested here: https://github.com/Chocobozzz/PeerTube/pull/1991#issuecomment-515936253

But implementing it requires more effort.

If you are willing to contribute, please do so. I would personally appreciate this improvement :).

fflorent avatar Aug 20 '19 20:08 fflorent

If you are willing to contribute, please do so.

I'm more than happy to contribute... however, my personal skills lie in design, UX, communications and marketing rather than coding. :P

Bugsbane avatar Sep 20 '19 01:09 Bugsbane

I actually found this issue (and multiple others) after I had almost completed a Python tool to do this, so I figured I would comment about it here.

https://github.com/mister-monster/YouTube2PeerTube

This is a tool that watches YouTube channels, and when new videos are found it mirrors them to a PeerTube channel.

mister-monster avatar Oct 18 '19 23:10 mister-monster

@mister-monster Don't hesitate to make a MR to add your script in the documentation website: https://docs.joinpeertube.org/#/use-third-party-application

Chocobozzz avatar Oct 19 '19 11:10 Chocobozzz

What I wrote about Twitch in #4713:

Describe the solution you would like:

An ability to connect PeerTube with a Twitch Account, and to enable automatic imports of VODs, Clips and/or Highlights.

The user should be able to decide which kind of video should be uploaded to which Channel. There should be a default set of tags which can be changed later if need be. One Tag should be the Twitch Category, another could be the Streamer Name, and "Twitch" itself would seem appropriate, too.

Description can be something basic like:

"Source: https;//twitch,tv/videoID"

For Clips:

"[Clipped by TwitchUser12](https;//clips,twitch,tv/ClipID)"

There should be a setting to decide from what date onward videos should be imported, to avoid duplicate uploads.

There should also be an option to have a delay between making the VOD/Highlight, and the automated Import, to give the Streamer some time to make a proper description/thumbnail.
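Two pieces of this request lend themselves to a short sketch: the default tag set and the delay before the automated import. Both helpers below are hypothetical illustrations, and the two-hour delay is an arbitrary example value rather than a proposed default.

```python
from datetime import datetime, timedelta, timezone

# Arbitrary example delay between the creation of a VOD/Highlight and its import.
EXAMPLE_DELAY = timedelta(hours=2)


def default_tags(category, streamer_name):
    """Default tags described above: Twitch category, streamer name, and 'Twitch'."""
    return [category, streamer_name, "Twitch"]


def is_due(created_at, now, delay=EXAMPLE_DELAY):
    """True once at least `delay` has elapsed since the video was created."""
    return now - created_at >= delay
```

A sync job would skip any video for which `is_due` is still false and pick it up on a later run, which leaves the streamer time to fix the description and thumbnail.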

0lhi avatar Jan 13 '22 15:01 0lhi

UI/UX mockups, feel free to comment:

Channel management

Information tab

We add a tab for managing the channel information:

new information tab

Synchronization tab

Filling the frequency and the URL

Frequency and URL

With CRON custom value

Screenshot 2022-03-09 at 16 05 45

After fetching list of videos

After fetching videos from the external platform

Screenshot 2022-03-09 at 16 27 44-fullpage

Administration: Allowing the feature

2022-03-09_16-33

Others

  • If the user is not allowed to upload videos (because their quota is set to 0) or the admin did not allow the synchronization, the tab elements should be disabled with a message indicating the reason.
  • [EDIT 1] If the import fails, an email should be sent to the owner of the channel, inviting them to reimport the video manually (or retry importing using the video URL), and no further retry should be attempted.

fflorent avatar Mar 09 '22 15:03 fflorent

Hi,

Thanks a lot for these mockups!

Some remarks:

  • I think we should display an "import sync" recap in the "My importations page", above the current table. It would display already configured import syncs (URL, channel destination). One button would allow deleting an already configured sync, and another would allow creating an import synchronization. We can also add another link in the channel page or the channels list page to create an import synchronization
  • I don't think we should allow users to choose sync frequency, but instead provide it as an admin configuration
  • The cron setting is too complex for users
  • I'm not sure we should implement the "preview list of videos". It would require additional REST API route, tests etc so I don't think it's worth it

Chocobozzz avatar Mar 11 '22 12:03 Chocobozzz

@Chocobozzz Thanks a lot for your feedback.

As your feedback is either trivial to visualize or removes some elements of the mockups, I'll start implementing the feature, unless you suggest that I rework my mockups.

fflorent avatar Mar 11 '22 12:03 fflorent

A little update of the work in progress.

I pushed a version which:

  • lets the owner of a channel fill in the external channel to sync with;
  • prevents, through the UI, the owner of a channel from filling in the external channel field, and explains why. There are 2 possible reasons: the admin disallows the HTTP video import, or the user is granted a quota of 0 (so s/he is not allowed to upload anything);
  • regularly (every 15 minutes) fetches the last 3 videos of an external channel;
  • fetches from the database the list of video imports, to filter out the videos already imported among the last 3 videos;
  • [technical considerations] the API doesn't check whether the user is allowed to sync channels (according to the criteria of the 2nd point of this list); instead, the check is made while the video-channels-sync job runs (we directly filter for users having a quota ≠ 0 through the sequelize request);
  • Feedback welcome: I assume the HTTP video import setting is enough to tell whether the synchronization is allowed or not for users who can publish videos, so I finally chose not to implement the option in this mockup (unless you express the need for it);
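The checks in the list above can be summarized in a short sketch, written in Python for brevity rather than in PeerTube's TypeScript. The `videos_to_import` function and its parameters are hypothetical; the only rules taken from the list are that a quota of 0 or a disabled HTTP import blocks the sync, and that already imported URLs are filtered out.

```python
def videos_to_import(latest_urls, imported_urls, user_quota, http_import_enabled):
    """Return the URLs among the latest fetched ones that still need importing.

    latest_urls: URLs of the last few videos of the external channel.
    imported_urls: URLs already recorded as video imports.
    user_quota: the user's upload quota; 0 means the user cannot upload.
    http_import_enabled: whether the admin allows HTTP video import.
    """
    if not http_import_enabled or user_quota == 0:
        # Sync is not allowed for this user, so nothing is imported.
        return []
    already_imported = set(imported_urls)
    return [url for url in latest_urls if url not in already_imported]
```

Comparing raw URLs only works if the same video is always recorded under the same URL form.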

You should be able to give this commit a try: https://github.com/fflorent/PeerTube/commit/ce36e4368b232bbfa99d062e7c6de7bca0372d85

Next steps:

  • [x] Finish the video-channels-sync:
    • [x] Add more logs
    • [x] Continue the synchronization (the video-channels-sync job) even when the synchronization of a channel fails;
    • [x] Continue the synchronization of a channel even when the fetch of a video fails;
    • [x] Reverse the playlist fetched by youtube-dl to conform with the chronological order;
  • [x] Check it works also with these platforms: Vimeo and Dailymotion;
  • [ ] Allow the admin to change the synchronization (video-channels-sync job) interval, and maybe change the default value: youtube-dl can be very slow;
  • [ ] Chocobozzz proposal: "I think we should display an "import sync" recap in the "My importations page", above the current table";
  • [ ] Implement unit/integration tests;
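The error-handling and ordering items in this list could look roughly like the following sketch. It assumes the fetched playlist comes newest first (hence the reversal), and `fetch_playlist` and `import_video` are hypothetical callables standing in for the real youtube-dl and import code.

```python
import logging

logger = logging.getLogger("video-channels-sync-sketch")


def sync_channels(channels, fetch_playlist, import_video):
    """Sync every channel, continuing past failures, and return what failed.

    Returns a list of (channel, url) pairs; url is None when the whole
    playlist fetch failed for that channel.
    """
    failures = []
    for channel in channels:
        try:
            playlist = fetch_playlist(channel)
        except Exception:
            # One failing channel must not stop the remaining channels.
            logger.exception("Could not fetch the playlist of %s", channel)
            failures.append((channel, None))
            continue
        # Assumed newest-first input: reverse it to import in chronological order.
        for url in reversed(playlist):
            try:
                import_video(channel, url)
            except Exception:
                # One failing video must not stop the rest of the channel.
                logger.exception("Could not import %s into %s", url, channel)
                failures.append((channel, url))
    return failures
```

Returning the failures instead of raising lets the caller decide what to do with them, for example notifying the channel owner.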

After this feature request, I would also propose these enhancements:

  • [ ] For a channel, specify a default license: this not only applies to channel synchronization, but also to any imported videos (user license choice for specific video > channel default license > instance default license);
  • [ ] Less frequently by default (like every 24h), check the metadata of every video of a synced channel and, when needed, update the captions, the descriptions, etc. (this would help a user of my instance who has captioned old videos on YouTube). The license will not be synchronized (because YT only gives a few choices for video licenses; I have not checked other platforms);
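The license precedence proposed in the first item (video choice, then channel default, then instance default) amounts to a "first value that is set" lookup. A minimal sketch, where `resolve_license` is a hypothetical helper and `None` stands for "not set":

```python
def resolve_license(video_license, channel_default, instance_default):
    """Return the first license that is set, in order of precedence.

    Precedence: the user's choice for the specific video, then the channel
    default, then the instance default. None means "not set".
    """
    for candidate in (video_license, channel_default, instance_default):
        if candidate is not None:
            return candidate
    return None
```

Testing against `None` rather than truthiness matters if a license is identified by a number, since an identifier of 0 would otherwise be treated as unset.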

fflorent avatar Apr 11 '22 09:04 fflorent

Hi,

Just a small comment about the global schema: I think we should provide a dedicated form where you can import any channel URL (input text) into any PeerTube channel (select filled with your channels). This way, you may import multiple youtube channels into the same peertube channel (I'm sure we'll have this feature request in the future).

Chocobozzz avatar Apr 15 '22 12:04 Chocobozzz

Hi,

Just a small comment about the global schema: I think we should provide a dedicated form where you can import any channel URL (input text) into any PeerTube channel (select filled with your channels). This way, you may import multiple youtube channels into the same peertube channel (I'm sure we'll have this feature request in the future).

Thanks for your feedback!

I am not sure why users would import multiple YT channels into one PT channel, as they can create as many channels as they want with their account.

Though if you insist, I will conform to your idea based on your intuitions (as I don't have as much input from users as you do, and you will probably maintain this feature more than I will).

fflorent avatar Apr 15 '22 13:04 fflorent

Though if you insist, I will conform to your idea based on your intuitions (as I don't have as much input from users as you do, and you will probably maintain this feature more than I will).

You may want to create a channel on a specific theme, and automatically aggregate multiple remote channels in it. Since I don't think it will make the UI more complex or be more difficult to implement in the backend, I think we should support this use case so we don't have to change the way it works in the future. Plus, we may also support playlist synchronisation (and so it's also a valid use case to sync multiple playlists in a specific channel).

Chocobozzz avatar Apr 15 '22 14:04 Chocobozzz

Do you mind explaining how duplication is avoided? Are YouTube video IDs stored locally as PeerTube video metadata so that these are automatically ignored by subsequent syncs?

drzraf avatar Apr 20 '22 16:04 drzraf

Do you mind explaining how duplication is avoided?

There is a table named videoImport which stores the URLs of all imported videos (thus we can use it to avoid duplicates).

fflorent avatar Apr 20 '22 20:04 fflorent

Do you mind explaining how duplication is avoided?

There is a table named videoImport which stores the URLs of all imported videos (thus we can use it to avoid duplicates).

Are these normalized? Because otherwise, adding a video through the shortened YT link, for example, will create a duplicate when it's queued again using the long URL.

GlassedSilver avatar Apr 20 '22 20:04 GlassedSilver