Yotter icon indicating copy to clipboard operation
Yotter copied to clipboard

improve yt proxying

Open lowne opened this issue 3 years ago • 16 comments

  • consolidate proxying logic in routes.py
  • can proxy images and videos internally
  • flexible user configuration of external proxy
  • (don't use json for human-facing configuration)

Note: I haven't tested nginx, but unless it chokes on a ?&arg= query it should work.

lowne avatar Oct 19 '20 15:10 lowne

Awesome thanks! I will have to review it as this would only work with a non-nginx set-up. I will be adding the Nginx part to this!

pluja avatar Oct 19 '20 17:10 pluja

...of course I used the wrong branch 🤦

lowne avatar Oct 19 '20 18:10 lowne

@pluja after some testing with nginx, it does work fine with ?&host=, so the suggested default in yotter-config.sample.yaml of

external_proxy: "https://my.nginx.instance{path}?{query}&host={netloc}"

should indeed work just fine with the nginx config from https://paste.ubuntu.com/p/Bzd9SRCJSG/ (as linked from SELF-HOSTING.md).

However, as per #53, this more elegant nginx config works as well:

  location /proxy/ {
    proxy_buffering off;
    resolver 1.1.1.1;
    rewrite /proxy/(https?:/)([^/]+)/?(.*) $1$2$3 break;
    #nginx "normalised" the scheme // to /, so need to restore one slash
    #also note that proxy_pass with a naked host (no path) breaks horribly, 
    #so the root / is excluded from captures and manually placed here
    proxy_pass $1/$2/$3;
    proxy_set_header Host $2;
    proxy_ssl_server_name on;
    sendfile on;
    tcp_nopush on;
    add_header Access-Control-Allow-Origin *;
  }

with this simpler yotter-config:

external_proxy: "https://my.nginx.instance/proxy/{url_encoded}"

lowne avatar Oct 20 '20 15:10 lowne

Any reason why you decided to URLEncode the URL instead of handling them how nginx currently does them? I thought a better approach would be to proxy them similar to how Invidious and YouTube work.

FireMasterK avatar Oct 20 '20 16:10 FireMasterK

@FireMasterK I'm not sure I understand your question, could you be more specific? I'll attempt a reply nonetheless:

URLEncode the URL

well, how else are you going to pass around URLs to webservers?

handling them how nginx currently does them?

The admin can:

  • set proxy_videos and proxy_images to false, omit external_proxy: the client connects directly to YT
  • set proxy_videos and proxy_images to true, omit external_proxy: the app (e.g. gunicorn) proxies requests to YT, no need for any reverse proxy (theoretically)
  • set proxy_videos and proxy_images to true, set external_proxy: requests to YT are proxied by whatever external proxy is used with whatever url scheme is set, including the current nginx scheme with "https://my.nginx.instance{path}?{query}&host={netloc}"

proxy them similar to how Invidious and YouTube work

again, could you clarify?

lowne avatar Oct 20 '20 17:10 lowne

@FireMasterK I'm not sure I understand your question, could you be more specific? I'll attempt a reply nonetheless:

URLEncode the URL

well, how else are you going to pass around URLs to webservers?

Take a look at this URL for example https://yotter.kavin.rocks/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s88-c-k-c0x00ffffff-no-rj-mo?host=yt3.ggpht.com

Yotter then proxies the URL from https://yt3.ggpht.com/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s88-c-k-c0x00ffffff-no-rj-mo?host=yt3.ggpht.com with the help of Nginx

handling them how nginx currently does them?

The admin can:

  • set proxy_videos and proxy_images to false, omit external_proxy: the client connects directly to YT
  • set proxy_videos and proxy_images to true, omit external_proxy: the app (e.g. gunicorn) proxies requests to YT, no need for any reverse proxy (theoretically)
  • set proxy_videos and proxy_images to true, set external_proxy: requests to YT are proxied by whatever external proxy is used with whatever url scheme is set, including the current nginx scheme with "https://my.nginx.instance{path}?{query}&host={netloc}"

I think that proxying should always be enabled, without a configuration option, but I'm open to other opinions too. My solution was to change the routes so that they perform just like how they do with Nginx, thereby removing the need for a configuration option altogether.

proxy them similar to how Invidious and YouTube work

again, could you clarify?

YouTube: https://yt3.ggpht.com/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s900-c-k-c0x00ffffff-no-rj Invidious: https://invidious.kavin.rocks/ggpht/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s900-c-k-c0x00ffffff-no-rj Yotter: https://yotter.kavin.rocks/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s900-c-k-c0x00ffffff-no-rj?host=yt3.ggpht.com

Your PR seems to change the format of how it's currently done (by URL-encoding it).

FireMasterK avatar Oct 21 '20 13:10 FireMasterK

Your PR seems to change the format of how it's currently done (by URL-encoding it).

No. Paste this into your yotter-config.yaml:

proxy_videos: true
proxy_images: true
external_proxy: "https://my.nginx.instance{path}?{query}&host={netloc}"

and it'll behave exactly as before (with nginx and "nginxVideoStream":"true" in the json config).

The internal proxy routing can be whatever we want, as it never leaks. Instead of rolling-my-own-scheme-which-breaks-on-unforeseen-edge-cases, I simply used the <path:var> facility already provided by flask, which yes, happens to urlencode the parameter, because it's the simplest and sanest way to pass a full url around.

To see why, let's get back to the current nginx proxy url scheme: if tomorrow Google decides to change the endpoints to, say yt9000.google/abc123/AATX.., then you need to go and change your nginx.conf to serve /abc123/ as well. If the new endpoint is yt.over9000.org/AATX... (no top-level folder) then the gig is up. My suggestion above, using the full urlencoded url, prevents all of these problems.

And if your argument is really that none of this should be configurable by the admin (which I don't get, but to each their own), then this entire PR can simply be thrown away. 😄

I, for one, like to use caddyserver and/or haproxy, and wouldn't want to touch nginx with a ten foot pole.

lowne avatar Oct 21 '20 15:10 lowne

How nice! I'll be testing that out and merging if it's all ok!

pluja avatar Oct 23 '20 06:10 pluja

@FireMasterK would this be compatible with #122 ?

pluja avatar Oct 27 '20 09:10 pluja

I can't say for sure, considering this

To see why, let's get back to the current nginx proxy url scheme: if tomorrow Google decides to change the endpoints to, say yt9000.google/abc123/AATX.., then you need to go and change your nginx.conf to serve /abc123/ as well. If the new endpoint is yt.over9000.org/AATX... (no top-level folder) then the gig is up. My suggestion above, using the full urlencoded url, prevents all of these problems.

While I do get his point, the likelihood of YouTube doing something like changing the endpoints is pretty low, especially for the endpoints behind their CDN. This would also remove the ability to cache only a certain type of content, which I do on my instance.

So, in short, no it is not compatible with the current design.

FireMasterK avatar Oct 27 '20 10:10 FireMasterK

What changes should we make so we can make this two approaches compatible?

pluja avatar Oct 27 '20 10:10 pluja

Either,

Maintain the same old routes: /vi/ /a/ /videoplayback etc

Or modify the go app to use the new route: /proxy/

I'm against the new route as it makes it harder to understand what's happening under the hood and makes the URLs illegible to a human eye

FireMasterK avatar Oct 27 '20 10:10 FireMasterK

maybe we could use the /proxy/ when it's running locally, and use all others when is a server-hosted instance with nginx.

pluja avatar Oct 27 '20 11:10 pluja

Yes, that should be fine technically

FireMasterK avatar Oct 28 '20 03:10 FireMasterK

We will need to review all of this as some changes now produce merge conflicts.

I would really like this to work as I want for Yotter to be able to be executed locally without the need of a server. I think that the best way to approach this is to have a setting where we can choose if we are on an instance or running locally.

If we are running locally we will need to change the endpoints to the Flask routes for the proxying.

Another good option is to use all the youtube links without proxy at all, and recommend users who are running this locally to just run with a VPN. In the end, they will be connecting to google even if we proxy. It doesn't really make sense to proxy when running locally.

pluja avatar Oct 30 '20 15:10 pluja

looking forward to this :)

toh995 avatar Dec 28 '20 03:12 toh995