YouTube-Agent.bundle

Agent fails to download content when there is a content warning

Open ankenyr opened this issue 7 years ago • 6 comments

For a video like https://www.youtube.com/watch?ajax=1&v=G0Qz-ZCwbaQ the agent is unable to get the metadata because of the content warning. It would be nice if there was a mechanism to handle this. I imagine you would need to provide a username/password, though this might cause problems if a person is using two-factor authentication. I don't know if you know a better way to approach this, @sander1. Is there a secure way for Plex agents to store passwords? I imagine I could just make a one-off account used only for requesting metadata.

ankenyr, Jun 05 '17 06:06

Hi @sander1, I finally took a look into this and I believe I have a solution, but I am running into a bit of a problem. I modified the original lines a bit, but they can easily be changed back. The idea is to first check whether there is a key 'content' in the JSON we get back. If not, we check whether 'verify_age' appears in the value for the key 'location'. If so, we use urllib2.Request to download the page again and BeautifulSoup to grab the correct information. This works well enough in standard Python, but inside the plugin it fails with a certificate error. Pay no mind to the shitty if/elif statements; I will fix those up, this was mostly for testing while I was debugging.

req = HTTP.Request(YOUTUBE_VIDEO_DETAILS % metadata.id)
# Skip the first 4 characters, which precede the JSON payload
json = req.content[4:]
parsed_json = JSON.ObjectFromString(json)
Log(parsed_json)
if parsed_json['content']:
    pass
elif 'verify_age' in parsed_json['location']:
    # Age-gated video: fetch the page again directly with urllib2
    # Note: urllib2 does not gunzip responses itself, so with this header resp may be compressed
    headers = {"User-agent" : "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/530.1+ (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1", "Accept-encoding" : "gzip"}
    req = urllib2.Request(YOUTUBE_VIDEO_DETAILS % metadata.id, None, headers)
    f = urllib2.urlopen(req)
    resp = f.read()
    soup = BeautifulSoup(resp)
    Log(soup)
json_obj = parsed_json['content']
Log('JSON_OBJ')
Log(json_obj)

Error message:

2017-11-05 10:01:22,581 (-c8f84c0) :  CRITICAL (agentkit:1078) - Exception in the update function of agent named 'YouTube', called with guid 'com.plexapp.agents.youtube://uKZOBcZVMD0?lang=xn' (most recent call last):
  File "/volume1/@appstore/Plex Media Server/Resources/Plug-ins-1bf240a65/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/agentkit.py", line 1076, in _update
    agent.update(obj, media, lang, **kwargs)
  File "/volume1/Plex/Library/Application Support/Plex Media Server/Plug-ins/YouTube-Agent.bundle/Contents/Code/__init__.py", line 56, in update
    f = urllib2.urlopen(req)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 1241, in https_open
    context=self._context)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>

I replicated how HTTP.py in the Plex framework does the request, so I am confused why my code gets this error. The ssl.py library in Plex is exactly the same as the one on my Synology server. I am thinking that Plex possibly does something that prevents me from using urllib2 directly. Since it seems you work for Plex, is there any insight you could provide?
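
One way to narrow this down, as a rough sketch: ask the ssl module where it looks for CA certificates and whether those locations actually exist. This assumes the bundled interpreter is Python 2.7.9 or newer (the context=self._context line in the traceback suggests it is). describe_ca_lookup is a made-up helper name, and a missing CA bundle is only one possible cause here, not a confirmed one.

```python
import os
import ssl


def describe_ca_lookup():
    """Collect where this interpreter's ssl module looks for CA certificates.

    Hypothetical helper for debugging; not part of the Plex framework.
    """
    paths = ssl.get_default_verify_paths()
    report = {
        "openssl_version": ssl.OPENSSL_VERSION,
        # cafile/capath are None when the compiled-in location does not exist
        "cafile": paths.cafile,
        "capath": paths.capath,
        # The locations OpenSSL was built to look in, whether they exist or not
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }
    report["cafile_exists"] = bool(paths.cafile) and os.path.isfile(paths.cafile)
    report["capath_exists"] = bool(paths.capath) and os.path.isdir(paths.capath)
    return report


if __name__ == "__main__":
    for key, value in sorted(describe_ca_lookup().items()):
        print("%s: %s" % (key, value))
```

Inside the plugin the result could be passed to Log(...) and compared with the same call run under the system Python. If the two report different CA locations, that would at least point toward the certificate store rather than ssl.py itself.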

ankenyr, Nov 05 '17 18:11

The problem is that even if you write code to follow the verify_age?next=... URL, it does not lead you to useful data. It only gives you other JSON-formatted data with a note that you have to log in to watch the video.

sander1, Nov 05 '17 21:11

It isn't about following the URL; the page itself has the data. You could forgo using HTTP.Request entirely and get the required information as done below.

import urllib2
from BeautifulSoup import BeautifulSoup
req = urllib2.Request("https://www.youtube.com/watch?v=seRxE3b6m_w")
f = urllib2.urlopen(req)
resp = f.read()
soup = BeautifulSoup(resp)
print soup.find("strong", {"class": "watch-time-text"}).text
Published on Aug 12, 2009
print soup.find("div", {"class": "yt-user-info"}).text
vlogbrothers
print soup.find("div", {"id": "watch-description-text"}).text
OK...this is only a little bit embarrassing.It's a song about vegetables that look like penises...hopefully it won't get flagged I mean...they're just vegetables!HERE ARE A LOT OF LINKS TO NERDFIGHTASTIC THINGS:Shirts and Stuff:http://dftba.com/artist/30/VlogbrothersHank's Music:http://dftba.com/artist/15/Hank-GreenJohn's Books:http://amzn.to/j3LYqo======================Hank's Twitter:http://www.twitter.com/hankgreenHank's Facebook:http://www.facebook.com/hankimonHank's tumblr:http://edwardspoonhands.tumblr.comJohn's Twitter:http://www.twitter.com/realjohngreenJohn's Facebook:http://www.facebook.com/johngreenfansJohn's tumblr:http://fishingboatproceeds.tumblr.com======================Other ChannelsCrash Course:http://www.youtube.com/crashcourseSciShow:http://www.youtube.com/scishowGaming:http://www.youtube.com/hankgamesVidCon:http://www.youtube.com/vidconHank's Channel:http://www.youtube.com/hankschannelTruth or Fail:http://www.youtube.com/truthorfail======================Nerdfighteriahttp://effyeahnerdfighters.com/http://effyeahnerdfighters.com/nftumblrshttp://reddit.com/r/nerdfightershttp://nerdfighteria.info/A Bunny(\(\( - -)((') (')

I imagine there is a reason to use HTTP.py, as it is doing a lot of other stuff, but in this case it doesn't return everything we could use. If you think it makes more sense to modify HTTP.py, I am happy to make the modifications there and send a PR. I figured this would be simpler, but that damn SSL error is stumping me. Let me know what you think, @sander1.

ankenyr, Nov 06 '17 00:11

Any thoughts on that SSL error @sander1?

ankenyr, Nov 08 '17 03:11

I kept running into SSL issues too, and I haven't been able to "fix" it within the Plex framework (I'm not the dev of the framework btw).

For now I use a dirty workaround that does not validate the SSL certificates by adding:

import ssl, urllib2

Then use a function like this:

def GetData(url):
    req = urllib2.Request(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"})
    # An SSLContext built directly like this does not verify certificates
    # (verify_mode defaults to CERT_NONE), which is what sidesteps the error.
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    data = urllib2.urlopen(req, context=ssl_context).read()
    return data

Example in the HGTV Canada URL Service.

sander1, Nov 10 '17 15:11

I assume you would not want something that does not validate certs checked into the repo. Should we open an issue on the framework about this? It seems like a valid issue. I really appreciate your help, btw!
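
For comparison, here is a rough sketch of what a variant that keeps validation might look like, using ssl.create_default_context() instead of a bare SSLContext. The names make_verified_context and get_data_verified, the cafile parameter, and the shortened User-Agent are all hypothetical. Whether the interpreter bundled with Plex can locate a CA bundle on its own is exactly the open question, so this is untested inside the framework.

```python
import ssl

try:
    import urllib2 as request_lib          # Python 2.7, which the plugin runs on
except ImportError:
    import urllib.request as request_lib   # Python 3, for trying this outside Plex

# Placeholder value; any browser-like User-Agent string could be used here
USER_AGENT = "Mozilla/5.0"


def make_verified_context(cafile=None):
    """Return an SSL context that still validates certificates.

    create_default_context() sets verify_mode to CERT_REQUIRED and turns on
    hostname checking. cafile is an optional path to a CA bundle (assumption:
    one would have to be supplied if the interpreter cannot find its own).
    """
    return ssl.create_default_context(cafile=cafile)


def get_data_verified(url, cafile=None):
    """Fetch url and return the raw response body, with cert checks enabled."""
    req = request_lib.Request(url, headers={"User-Agent": USER_AGENT})
    context = make_verified_context(cafile)
    return request_lib.urlopen(req, context=context).read()
```

If this still raises CERTIFICATE_VERIFY_FAILED with cafile left as None but works once a known-good CA bundle path is passed in, that would be useful detail to include in a framework issue.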

ankenyr, Nov 10 '17 19:11