
How to update feeds in internal-only scenario

Open leepfrog-ger opened this issue 3 years ago • 22 comments

Thanks for this great piece of software! I am currently experimenting with it to see if it fits my use cases.

For my scenario it will only be available to a predefined set of whitelisted IPs. However, after adding some subscriptions I noticed external requests trying to hit the /webhooks/pubsub?hub.topic endpoint. Those do not work because the requesting IP address is not on the whitelist.

You mention here that for subscriptions to update, the instance needs to be accessible from the internet. Is any other way currently possible or planned?

In my case I do not want to make the instance accessible from the public internet. On the other hand, this is only a small local instance, so updating it manually (e.g. through a cronjob) would not be an issue for the low number of users/subscriptions.

leepfrog-ger avatar Jan 01 '22 23:01 leepfrog-ger

I agree completely about not using pubsub; I would much rather host this entirely internally myself as well.

All I have done at the moment is use ACLs on my top end nginx for the subdomains.
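For reference, a minimal sketch of such an ACL (nginx's allow/deny directives; the subdomain, address ranges and upstream below are placeholders, not taken from my actual setup, and TLS config is omitted):

server {
    server_name pipedapi.example.com;      # hypothetical backend subdomain

    # Only the whitelisted ranges get through; everyone else receives 403.
    allow 192.168.1.0/24;                  # example internal LAN
    allow 10.8.0.0/24;                     # example VPN range
    deny all;

    location / {
        proxy_pass http://127.0.0.1:8080;  # piped-backend listening locally
    }
}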

sdfg2 avatar Jan 21 '22 19:01 sdfg2

Have a similar setup and ran across this issue as well. In my case I can't make my API instance publicly available and still access it from my internal network, since my ISP disables NAT loopback. Just another scenario which would benefit greatly from this feature.

mamba2410 avatar Aug 02 '22 11:08 mamba2410

For those facing a similar issue, it is possible to update feeds using the following cronjob (until the feature gets added to Piped):

Fetching a channel's content also updates its videos in the feeds.

#!/bin/bash

# In my case, the postgres database is hosted on 172.20.0.2, the database used by Piped is named "piped" and the
#    postgresql user with read-only access to the public.pubsub table is named "piped-refresher"
subscriptions=$(psql -h '172.20.0.2' -U piped-refresher -d piped -qtAX -c 'select id from public.pubsub;')

while IFS= read -r line; do
        # You can drop the -k flag if HTTPS is configured with a proper certificate.
        # Replace backend.piped.localnet with the backend of your instance.
        curl -sk "https://backend.piped.localnet/channel/$line" > /dev/null
        # Sleep to avoid potential issues with rate limiting. Since this workaround
        #    is intended for instances that are not publicly available, it is assumed
        #    that they have a small enough number of subscribed channels that a 10s
        #    delay between channel updates is not a big issue.
        sleep 10
done <<< "$subscriptions"

This script assumes that either:

  • The database user does not have a password, or
  • The password is stored in a .pgpass file in the user's home directory (see the example below).
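For reference, a minimal .pgpass entry matching the connection above might look like this (fields are hostname:port:database:username:password; the password shown is a placeholder, and the file must have 0600 permissions):

# ~/.pgpass
172.20.0.2:5432:piped:piped-refresher:yourpasswordhere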

Then you can add the following line (or similar) to your cron table (in this case, refreshing the feeds every 3 hours):

0 */3 * * * /home/user/cronjobs/refresh-piped.sh >/dev/null 2>&1

GitKepler avatar Sep 11 '22 10:09 GitKepler

> For those facing a similar issue, it is possible to update feeds using the following cronjob (until the feature gets added to Piped): […]

As discussed in #1666, this no longer works for Shorts and Livestreams as of the introduction of channel tabs; refreshing those tabs does not work either.

hugoghx avatar Nov 02 '22 20:11 hugoghx

Hi everyone, with #1713 partially implemented, instances should now be receiving new videos via the federated protocol. Let me know if this helps with new videos.

(You'll have to update your instance, if it hasn't done so automatically, and restart it once.)

FireMasterK avatar Nov 14 '22 00:11 FireMasterK

@FireMasterK I might have misunderstood the purpose of this; however, I tried spinning that up today and my feed does not update at all. I see in piped-backend that I'm receiving events, but there are no updates at all in the feed. If I go to particular channels themselves, they are up to date, but then again, this was already the case before this update.

hugoghx avatar Nov 14 '22 15:11 hugoghx

Thought I should add some clarity here and maybe some direction on how this can be improved.

As I mentioned in #1666, a solution like this had worked since I started using Piped (last year). It broke for livestreams and shorts (but not regular videos) with the introduction of channel tabs.

From what I could gather, this worked before because loading a channel page also refreshed the subscription feed with that channel's recent videos. This is not the case with the shorts and livestreams tabs.

To be clear - at least for me - the problem isn't that the data isn't there. The problem is that the subscription feed isn't being updated with the new data that is already present.

I think the "fix" here (considering that this, at least to my knowledge, isn't a supported way of getting subscriptions) would be to make the tabs' API handler(s) also refresh the feed subscriptions with the data they return, like the channel handler does - That way we can keep running the above scripts just with a few modifications to make a few extra requests to the tabs endpoints.

hugoghx avatar Nov 19 '22 20:11 hugoghx

An update on this: A while ago I identified some code that dealt with updating the subscription feed whenever you opened a channel's page (before channel tabs were introduced).

I added that code to the handler for channel requests after the channel tab changes. You can see the patch for this hack here: feed-update.patch

With this, you can now make requests to each channel you're subbed to and the sub feed will update. Be mindful that now you need to make 3 requests for each channel you're subbed to (one for each tab: Videos, Livestreams and Shorts).

I've been using it for months and this works great in terms of updating your subscription feed.
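A rough sketch of what the per-channel request loop can look like with this patch (the exact /channels/tabs payload is my best guess at what the tabs endpoint expects, the "shorts" path and filter are assumptions by analogy, and backend.piped.localnet plus the DB details are placeholders):

#!/bin/bash
backend="https://backend.piped.localnet"
subscriptions=$(psql -h '172.20.0.2' -U piped-refresher -d piped -qtAX \
    -c 'select id from public.pubsub;')

while IFS= read -r channel; do
    curl -sk "$backend/channel/$channel" > /dev/null    # Videos tab
    for tab in livestreams shorts; do
        # youtube.com uses /streams for the livestreams tab, /shorts for shorts
        [ "$tab" = livestreams ] && ytpath=streams || ytpath=shorts
        # Build the percent-encoded JSON payload for the tabs endpoint
        data=$(jq -nr --arg channel "$channel" --arg ytpath "$ytpath" --arg tab "$tab" '
        {
            originalUrl: "https://www.youtube.com/\($channel)/\($ytpath)",
            url: "https://www.youtube.com/\($channel)/\($ytpath)",
            id: $channel,
            contentFilters: [$tab],
            sortFilter: "",
            baseUrl: "https://www.youtube.com"
        } | tojson | @uri')
        curl -sk "$backend/channels/tabs?data=$data" > /dev/null
    done
    sleep 10
done <<< "$subscriptions"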

If you want to use this for your own instance, you need to be mindful of a few things:

  1. You'll need to open and maintain your own branch off Piped-Backend master with these changes.
  2. You'll need to build your own Piped-Backend and deploy it on your setup.
  3. Unfortunately, recently this started incurring very high CPU usage on Piped-Backend. I'm unsure why; my Java is very rusty and I don't really have a dev env for it either 😄.

====

With the introduction of the backend publishing its pubsub onto a Matrix room for other instances to pick up, I also tried subscribing on the official instance to all the channels I have in my instance, hoping the official instance would pass those updates through to my private instance via the Matrix room. I'm currently testing this, but unfortunately I've already noticed that the official instance seems to be missing some videos/livestreams (versus the method above), so sadly this isn't the solution either.

We're so close on this, yet so far. The video/livestream/short data is already present (and has been ever since I started using Piped), even in our private instances. The only missing piece is updating the sub feeds!

====

Edit: It's also on my list to test whether I can just make the requests to the channels on the latest Piped-Backend master (without this hack) and have it work. I'll update this when I get to it. Update: I don't think it works.

hugoghx avatar Aug 27 '23 11:08 hugoghx

@hugoamvieira you don't need to apply such a patch to the backend any more; I've added support for channel video updates when a channel tab is loaded in https://github.com/TeamPiped/Piped-Backend/commit/635688b5b70e0ce7bdc9efd32a1507a733f38331. It won't work on the shorts tab, though, as the upload date is missing, and fetching each video's upload date is quite expensive on the server's CPU.

FireMasterK avatar Aug 30 '23 14:08 FireMasterK

@FireMasterK This is awesome, thanks a lot! Now I understand why my CPU usage is so high 😄

hugoghx avatar Aug 30 '23 19:08 hugoghx

BTW @FireMasterK I'm now seeing this error a bunch of times on the backend side (version fa5bf89)

java.lang.NullPointerException: Cannot invoke "com.grack.nanojson.JsonObject.getString(String, String)" because "this.playerMicroFormatRenderer" is null
        at org.schabi.newpipe.extractor.services.youtube.extractors.YoutubeStreamExtractor.getTextualUploadDate(YoutubeStreamExtractor.java:192)
        at org.schabi.newpipe.extractor.services.youtube.extractors.YoutubeStreamExtractor.getUploadDate(YoutubeStreamExtractor.java:257)
        at me.kavin.piped.utils.VideoHelpers.handleNewVideo(VideoHelpers.java:69)
        at me.kavin.piped.utils.VideoHelpers.handleNewVideo(VideoHelpers.java:27)
        at me.kavin.piped.utils.matrix.SyncRunner.lambda$run$3(SyncRunner.java:177)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

Followed by Unknown event type: video.piped.stream.bypass.request

I thought this was due to my code requesting shorts, but I've now disabled that and it's still happening.

PS: It's not updating any type of content either. I suspect the error is why

hugoghx avatar Aug 30 '23 19:08 hugoghx

> BTW @FireMasterK I'm now seeing this error a bunch of times on the backend side (version fa5bf89)
>
> java.lang.NullPointerException: Cannot invoke "com.grack.nanojson.JsonObject.getString(String, String)" because "this.playerMicroFormatRenderer" is null […]

I've fixed this in https://github.com/TeamPiped/Piped-Backend/commit/1b41b752436c71a02198519319e5275bb5528076.

> Followed by Unknown event type: video.piped.stream.bypass.request

This is safe to ignore; it's a bit of bad logging logic, and the message is emitted in error.

> PS: It's not updating any type of content either. I suspect the error is why

Yes indeed, this should all work fine now tho :)

FireMasterK avatar Aug 30 '23 22:08 FireMasterK

> Yes indeed, this should all work fine now tho :)

Using Piped-Backend master version 2b8b4aa983c1e52e64ce26bec14e085ce38d8462, I'm not seeing any updates come through on the sub feed, at least for livestreams.

The livestream is up, and it's even present on the channel page on the frontend, but it's not showing in the sub feed. I even tried manually loading the channel page's tabs, but that did not update the feed either.

Edit: Switching back to my version (based off master 015fc06681f560c96d94ecae54d54630b4c008a5 + the patch above), it starts showing up. Unsure what the difference is.

hugoghx avatar Aug 31 '23 13:08 hugoghx

Do we still need private instances to be accessible from the internet? Because I'm seeing missing videos from time to time in my feed

vayan avatar Sep 13 '23 03:09 vayan

> Do we still need private instances to be accessible from the internet? Because I'm seeing missing videos from time to time in my feed

Last time I tried this, the changes didn't work for me. Looking at the commits, I don't think any more progress has been made on this.

hugoghx avatar Sep 13 '23 15:09 hugoghx

Hello, I'm still having this problem, which makes using Piped very inconvenient: the feed doesn't update. It does add newer videos if I click on the channel page, but this isn't practical as I have hundreds of subscriptions. Is there a way to make it update the videos, or has there been any progress on this? I'm running Piped in Docker with nginx.

virustesztalt avatar Oct 06 '23 20:10 virustesztalt

I've adapted the previous script for Docker and it runs well in a crontab every 30 minutes or so.

#!/bin/bash

subscriptions=$(docker exec -i postgres psql -U piped -d piped -qtAX -c 'select id from public.pubsub;')

while IFS= read -r line; do
        curl -sk "https://[yourapi]/channel/$line" > /dev/null
        sleep 1
done <<< "$subscriptions"

vayan avatar Oct 07 '23 03:10 vayan

I didn't have any luck with it, sadly. I noticed a lot of content didn't load.

> #!/bin/bash
> subscriptions=$(docker exec -i postgres psql -U piped -d piped -qtAX -c 'select id from public.pubsub;')
> […]

I don't think this will load livestreams or shorts, by the way. At least it doesn't for me.

hugoghx avatar Oct 07 '23 11:10 hugoghx

Thanks guys! I applied the code and it works really well. Shorts and livestreams aren't high on my priorities, so this is actually pretty great. I also modified it a little bit: a 0.01s sleep time is fine for my use case.
:heart: :heart: :kissing_heart: :kissing_cat: :heart: :heart:

virustesztalt avatar Oct 07 '23 15:10 virustesztalt

I'm having a problem: I added the 246 channels that I follow on YouTube to my feed, but channel updates almost never happen and new content takes too long to show up, sometimes more than 24 hours. What should I do to make the feed update every 10 or 5 minutes or something similar? I am a layman and did not see anywhere in the preferences to adjust the feed update interval. Can someone tell me where and how to do this?

Mkcein avatar Oct 08 '23 21:10 Mkcein

> […] Can someone tell me where and how to do this?

Heyyy, yeah, I had this issue. I could resolve it by using the script @vayan recommended, but for that you need to own the Piped instance, since the script needs to be put on the server hosting Piped. There is no option yet to specify the update interval, and this is a very network-intensive operation (I have about 350 subscriptions, so it makes over 700 requests to my DNS server).

virustesztalt avatar Oct 09 '23 19:10 virustesztalt

To make it also update livestreams, you have to query the respective tab as well.

I've been running the following script in a container that can access the database over the network; it requires bash, curl, jq and psql (the PostgreSQL client).

Requires environment variables:

  • BACKEND_URL: piped-backend URL
  • DBHOST, DBNAME, DBUSER, DBPASS: DB hostname and credentials

Usage: ./script [videos|livestreams].

#!/bin/bash

mode=${1:-videos}
backend=${BACKEND_URL:-http://localhost}

if [[ ! $mode =~ ^(videos|(live)?streams)$ ]]; then
    echo "Unrecognized operation mode $mode" >&2
    exit 1
fi

if ! (
    set -e
    touch ~/.pgpass
    chmod 0600 ~/.pgpass
    printf '%s:*:%s:%s:%s\n' "$DBHOST" "$DBNAME" "$DBUSER" "$DBPASS" > ~/.pgpass
)
then
    echo "Failed to create ~/.pgpass..." >&2
    exit 2
fi

if ! subscriptions=$(psql -h "$DBHOST" -U "$DBUSER" -d "$DBNAME" -qtAX -c 'select id from public.pubsub;')
then
    echo "Failed to get subscriptions from DB" >&2
    exit 3
fi

i=1
failures=0
total_subs=$(wc -l <<<"$subscriptions")

while IFS= read -r channel; do

    printf '[+] %4d/%d %s %s\n' $i "$total_subs" "$mode" "$channel"

    if ! (
        set -e

        if [ "$mode" = videos ]; then
            url="$backend/channel/$channel"
        else
            url=$(jq -nr --arg channel "$channel" --arg backend "$backend" '
            {
                originalUrl: "https://www.youtube.com/\($channel)/streams",
                url: "https://www.youtube.com/\($channel)/streams",
                id: $channel,
                contentFilters: ["livestreams"],
                sortFilter: "",
                baseUrl: "https://www.youtube.com"
            }
            | tojson | @uri | $backend + "/channels/tabs?data=" + .')
        fi

        curl -sSk "$url" >/dev/null
    )
    then
        ((failures++))
    fi

    if [ $i -ne "$total_subs" ]; then
        sleep $((RANDOM % 10 + 1))
        ((i++))
    fi

done <<<"$subscriptions"

if [ $failures -ne 0 ]; then
    echo "[!] Failed $failures time(s)" >&2
    exit 4
fi

If you want to run it outside of a container, you can use something like podman exec to run psql. Remove the block that writes to .pgpass and get the subscriptions with the following instead (replace piped-postgres with the name of your DB container):

if ! subscriptions=$(podman exec -i piped-postgres \
    psql -U piped -d piped -qtAX -c 'select id from public.pubsub;')
then
    echo "Failed to get subscriptions from DB" >&2
    exit 2
fi

I've added a randomized sleep to prevent overwhelming the server; you can adjust it according to your needs (e.g. the number of subscriptions).
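For completeness, a hypothetical way to schedule both modes from cron, following the crontab examples earlier in the thread (all paths, schedules and credentials below are placeholders):

# /etc/cron.d/piped-refresh (example only; adjust user, paths and values)
BACKEND_URL=https://backend.piped.localnet
DBHOST=172.20.0.2
DBNAME=piped
DBUSER=piped-refresher
DBPASS=changeme
0 */3 * * *   user   /home/user/cronjobs/refresh-piped.sh videos >/dev/null 2>&1
30 */6 * * *  user   /home/user/cronjobs/refresh-piped.sh livestreams >/dev/null 2>&1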

acidghost avatar Feb 09 '24 15:02 acidghost