Fix RTP Play support as of Aug 2021

Open pferreir opened this issue 3 years ago • 6 comments

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • [x] I am the original author of this code and I am willing to release it under Unlicense
  • [ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • [x] Bug fix
  • [ ] Improvement
  • [ ] New extractor
  • [ ] New feature

This builds on #28205 by @vallovic and once again fixes support for RTP Play downloads. Please note that one of the URLs previously used in the tests had expired, so I replaced it with a more recent one. However, I get:

youtube_dl.utils.DownloadError: ERROR: requested format not available

From what I saw, changing the format parameter in parameters.json to best/bestvideo+bestaudio solves it. That seems to be the default when the app is running too.
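
As a rough illustration of why a `/`-separated selector can avoid this error, here is a simplified toy model of the fallback behaviour. It is not youtube-dl's actual selector code: the `select` function and the sample format dicts are hypothetical. The only conventions borrowed from youtube-dl are that `/` separates alternatives tried left to right, `+` merges formats, and a codec value of 'none' marks a missing stream.

```python
def has_video(fmt):
    # A vcodec of 'none' marks an audio-only format.
    return fmt.get('vcodec') != 'none'


def has_audio(fmt):
    # An acodec of 'none' marks a video-only format.
    return fmt.get('acodec') != 'none'


def select(formats, spec):
    """Toy selector: return the formats chosen by the first alternative that matches.

    `formats` is assumed to be sorted from worst to best.
    """
    for alternative in spec.split('/'):
        picked = []
        for name in alternative.split('+'):
            if name == 'best':
                candidates = [f for f in formats if has_video(f) and has_audio(f)]
            elif name == 'bestvideo':
                candidates = [f for f in formats if has_video(f) and not has_audio(f)]
            elif name == 'bestaudio':
                candidates = [f for f in formats if has_audio(f) and not has_video(f)]
            else:
                candidates = [f for f in formats if f.get('format_id') == name]
            if not candidates:
                picked = None  # this alternative fails, try the next one
                break
            picked.append(candidates[-1])
        if picked:
            return picked
    raise ValueError('requested format not available')


# Hypothetical DASH-style result: separate video-only and audio-only streams.
SAMPLE_FORMATS = [
    {'format_id': 'dash-audio', 'vcodec': 'none', 'acodec': 'mp4a'},
    {'format_id': 'dash-video', 'vcodec': 'avc1', 'acodec': 'none'},
]

if __name__ == '__main__':
    chosen = select(SAMPLE_FORMATS, 'best/bestvideo+bestaudio')
    print([f['format_id'] for f in chosen])
```

With only split streams available, a plain `best` has nothing to pick in this model, while `best/bestvideo+bestaudio` falls through to the merged pair.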

pferreir avatar Aug 19 '21 13:08 pferreir

I had a different approach to decoding the messy player config which might be of use (see https://github.com/ytdl-org/youtube-dl/issues/29458#issuecomment-874354702). This follows on from your l.59; earlier on I had _URICHARS = r"[-.!~*'()%#\w]"

        # Get JS objects:
        # 'var f = {...}; var g = new RTPPlayer({...});'
        js_match = re.search(
            (r'(?s)var\s+f\s*=\s*%s\s*;\s*var\s+\w+\s*=\s*new\s+RTPPlayer\s*\(\s*%s\s*\)\s*;'
             % (r'(?P<var_f>\{[^;]+\}|"[^;"]+")', r'(?P<player_config>\{[^;]+\})')),
            webpage)
        if not js_match:
            raise RegexNotFoundError('Unable to extract player config')

        js_objs = js_match.groupdict()
 
        # '"cod", "ed ", "URL", "...", ...' -> "http://..."
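        def atob_decode(expr):
            codes = re.findall(r'(?<=")%s+(?=")' % self._URICHARS, expr, flags=re.S)
            # compat_b64decode() returns bytes, so decode before formatting
            # (otherwise Python 3 would produce "'b'...''")
            return "'%s'" % compat_b64decode(
                compat_urllib_parse_unquote(''.join(codes))).decode('utf-8')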

        for k, v in js_objs.items():
            # '... // comment\n' -> '... \n'
            decoded = re.sub(r'\s//[^\n]*\n', '\n', v)
            # 'file: f,' -> ''
            decoded = re.sub(r'\s*file\s*:\s*\w+\s*,\s*', '', decoded)
            # 'atob( decodeURIComponent(["aHR0cH", ...,"%3D%3D"].join("")))' 
            #  -> decoded URI
            decoded = re.sub(
                    (r'atob\s*\(\s*decodeURIComponent\s*\(\s*\[\s*((?:"%s+"\s*,\s*)*"%s+")\s*\]\s*\.\s*join\s*\(\s*""(?:\s*\)){3}'
                     % (self._URICHARS, self._URICHARS)),
                    lambda x: atob_decode(x.group(1)), decoded, flags=re.S)
            js_objs[k] = self._parse_json(decoded, video_id, js_to_json)

        formats = []
        config = {}
        for k, v in js_objs.items():
            if k == 'var_f':
                if isinstance(v, compat_str):
                    ext = determine_ext(v)
                    formats.append({
                       'url': v,
                       'ext': ext,
                    })
                else:
                    for fmt, m_url in v.items():
                        ext = determine_ext(m_url)
                        if ext == 'm3u8':
                            formats.extend(
                                self._extract_m3u8_formats(
                                    m_url, video_id, 'mp4', 'm3u8_native',
                                    m3u8_id=fmt))
                        elif ext == 'mpd':
                            formats.extend(
                                self._extract_mpd_formats(
                                    m_url, video_id, mpd_id=fmt))
            elif k == 'player_config':
                config = v
                m_url = v.get('fileKey')
                if m_url:
                    ext = determine_ext(m_url)
                    m_url = update_url_query(
                        m_url, try_get(v, lambda x: x['extraSettings'], dict))
                    formats.append({
                       'url': m_url,
                       'ext': ext,
                    })

        # Back at method level: run once, after both JS objects are processed
        self._sort_formats(formats)
        if config.get('mediaType') == 'audio':
            for f in formats:
                f['vcodec'] = 'none'

        return {
            'title': title,
            'id': video_id,
            'formats': formats,
            'thumbnail': (config.get('poster')
                          or self._og_search_thumbnail(webpage)),
            'description': self._html_search_meta(('description', 'twitter:description'), webpage),
        }
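
The `atob(decodeURIComponent([...].join("")))` step above can be tried in isolation. The following is a minimal Python 3 sketch that uses only the standard library in place of youtube-dl's compat helpers. `make_player_js` and `decode_player_js` are hypothetical names introduced for the demonstration, and the example URL is made up.

```python
import base64
import re
from urllib.parse import quote, unquote

# Same character class as _URICHARS in the comment above.
URICHARS = r"[-.!~*'()%#\w]"

# Matches: atob( decodeURIComponent(["chunk", "chunk", ...].join("")))
ATOB_RE = re.compile(
    r'atob\s*\(\s*decodeURIComponent\s*\(\s*\[\s*((?:"%s+"\s*,\s*)*"%s+")\s*\]'
    r'\s*\.\s*join\s*\(\s*""(?:\s*\)){3}' % (URICHARS, URICHARS), re.S)


def atob_decode(expr):
    """'"cod", "ed ", "URL"' -> decoded text (join, URI-unquote, base64-decode)."""
    chunks = re.findall(r'(?<=")%s+(?=")' % URICHARS, expr, flags=re.S)
    return base64.b64decode(unquote(''.join(chunks))).decode('utf-8')


def decode_player_js(js):
    """Replace every obfuscated atob(...) expression with a quoted plain string."""
    return ATOB_RE.sub(lambda m: "'%s'" % atob_decode(m.group(1)), js)


def make_player_js(url, size=6):
    """Build an obfuscated expression for testing (hypothetical helper)."""
    encoded = quote(base64.b64encode(url.encode('utf-8')).decode('ascii'), safe='')
    chunks = [encoded[i:i + size] for i in range(0, len(encoded), size)]
    return 'atob( decodeURIComponent([%s].join("")))' % ', '.join(
        '"%s"' % chunk for chunk in chunks)


if __name__ == '__main__':
    js = 'fileKey: ' + make_player_js('https://example.com/path/video.m3u8?x=1')
    print(decode_player_js(js))
```

Building the obfuscated input from a known URL keeps the round trip verifiable without hard-coding any base64 text.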

dirkf avatar Aug 23 '21 11:08 dirkf

Relying only on DASH/MPD is not the best option for RTP Play, because the site uses 4 or 5 different approaches in its code depending on the area: main RTP Play, RTP Arquivos, Zig Zag and Estudo em Casa are all different.

My previous extractor dealt with all of them except RTP Arquivos, which is now fixed in my latest commit. I've also added more tests so the code can be run against those different areas.

But I have to thank both @pferreir and @dirkf, because I'm far from a pro in Python and I've used some of your code/comments to improve my own code.

vallovic avatar Aug 28 '21 20:08 vallovic

Tested this again, and it does fix this extractor. Any chance of rebasing this to the latest "master"?

somini avatar Mar 12 '23 23:03 somini

What's the status of this?

somini avatar Aug 01 '23 21:08 somini

The PR author didn't implement any review comments. Looks like it needs a QA review before merging.

dirkf avatar Aug 02 '23 14:08 dirkf

The blocking issue seems to have been merged.

somini avatar Mar 11 '24 23:03 somini