youtube-dl Fix RTP Play support as of Aug 2021

Please follow the guide below

You will be asked some questions, please read them carefully and answer honestly
Put an x into all the boxes [ ] relevant to your pull request (like that [x])
Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

[x] Searched the bugtracker for similar pull requests
[x] Read adding new extractor tutorial
[x] Read youtube-dl coding conventions and adjusted the code to meet them
[x] Covered the code with tests (note that PRs without tests will be REJECTED)
[x] Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

[x] I am the original author of this code and I am willing to release it under Unlicense
[ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

[x] Bug fix
[ ] Improvement
[ ] New extractor
[ ] New feature

This builds on #28205 by @vallovic and once again fixes support for RTP Play downloads. Please note that one of the URLs previously used in the tests had expired, so I replaced it with a more recent one. However, I get:

youtube_dl.utils.DownloadError: ERROR: requested format not available

From what I saw, changing the format parameter in parameters.json to best/bestvideo+bestaudio solves it. That also seems to be the default when the app itself is running.
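One possible explanation, assuming the extractor returns only separate video-only and audio-only streams for this URL: a plain `best` selector looks for a single format carrying both audio and video, finds none, and reports that the requested format is not available, whereas the `/` fallback lets `bestvideo+bestaudio` take over. The sketch below is a deliberately simplified model of that fallback logic, not youtube-dl's actual selector parser; `select_formats`, the predicate table and the sample format list are all hypothetical.

```python
# Simplified model of youtube-dl-style format selectors; NOT the real parser.
# "/" separates fallbacks (tried left to right), "+" merges several picks.


def _has_video(fmt):
    return fmt.get('vcodec') != 'none'


def _has_audio(fmt):
    return fmt.get('acodec') != 'none'


# Hypothetical predicates for the three selector names used in this thread.
_PREDICATES = {
    'best': lambda f: _has_video(f) and _has_audio(f),
    'bestvideo': lambda f: _has_video(f) and not _has_audio(f),
    'bestaudio': lambda f: _has_audio(f) and not _has_video(f),
}


def select_formats(formats, selector):
    """Return the formats chosen by `selector`, or None if nothing matches.

    `formats` is assumed to be sorted from worst to best.
    """
    for alternative in selector.split('/'):
        picked = []
        for name in alternative.split('+'):
            pool = [f for f in formats if _PREDICATES[name](f)]
            if not pool:
                break  # this alternative cannot be satisfied, try the next one
            picked.append(pool[-1])
        else:
            return picked
    return None


# Made-up format list: separate audio and video streams only.
dash_only = [
    {'format_id': 'audio', 'vcodec': 'none', 'acodec': 'mp4a'},
    {'format_id': 'video-low', 'vcodec': 'avc1', 'acodec': 'none'},
    {'format_id': 'video-high', 'vcodec': 'avc1', 'acodec': 'none'},
]

print(select_formats(dash_only, 'best'))  # None
print([f['format_id']
       for f in select_formats(dash_only, 'best/bestvideo+bestaudio')])
# ['video-high', 'audio']
```

If the real format list does contain a combined audio+video entry, this explanation does not apply and the cause of the error lies elsewhere.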

Aug 19 '21 13:08 pferreir

I had a different approach to decoding the messy player config which might be of use (see https://github.com/ytdl-org/youtube-dl/issues/29458#issuecomment-874354702). This follows on from your l.59; earlier on I had _URICHARS = r"[-.!~*'()%#\w]"

        # Get JS objects:
        # 'var f = {...}; var g = new RTPPlayer({...});'
        js_match = re.search(
            (r'(?s)var\s+f\s*=\s*%s\s*;\s*var\s+\w+\s*=\s*new\s+RTPPlayer\s*\(\s*%s\s*\)\s*;'
             % (r'(?P<var_f>\{[^;]+\}|"[^;"]+")', r'(?P<player_config>\{[^;]+\})')),
            webpage)
        if not js_match:
            raise RegexNotFoundError('Unable to extract player config')

        js_objs = js_match.groupdict()
 
        # '"cod", "ed ", "URL", "...", ...' -> "http://..."
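        def atob_decode(expr):
            codes = re.findall(r'(?<=")%s+(?=")' % self._URICHARS, expr, flags=re.S)
            # compat_b64decode returns bytes: decode it, otherwise Python 3
            # interpolates the bytes repr ("b'...'") into the result
            return "'%s'" % compat_b64decode(
                compat_urllib_parse_unquote(''.join(codes))).decode('utf-8')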

        for k, v in js_objs.items():
            # '... // comment\n' -> '... \n'
            decoded = re.sub(r'\s//[^\n]*\n', '\n', v)
            # 'file: f,' -> ''
            decoded = re.sub(r'\s*file\s*:\s*\w+\s*,\s*', '', decoded)
            # 'atob( decodeURIComponent(["aHR0cH", ...,"%3D%3D"].join("")))' 
            #  -> decoded URI
            decoded = re.sub(
                    (r'atob\s*\(\s*decodeURIComponent\s*\(\s*\[\s*((?:"%s+"\s*,\s*)*"%s+")\s*\]\s*\.\s*join\s*\(\s*""(?:\s*\)){3}'
                     % (self._URICHARS, self._URICHARS)),
                    lambda x: atob_decode(x.group(1)), decoded, flags=re.S)
            js_objs[k] = self._parse_json(decoded, video_id, js_to_json)

        formats = []
        config = {}
        for k, v in js_objs.items():
            if k == 'var_f':
                if isinstance(v, compat_str):
                    ext = determine_ext(v)
                    formats.append({
                        'url': v,
                        'ext': ext,
                    })
                else:
                    for fmt, m_url in v.items():
                        ext = determine_ext(m_url)
                        if ext == 'm3u8':
                            formats.extend(
                                self._extract_m3u8_formats(
                                    m_url, video_id, 'mp4', 'm3u8_native',
                                    m3u8_id=fmt))
                        elif ext == 'mpd':
                            formats.extend(
                                self._extract_mpd_formats(
                                    m_url, video_id, mpd_id=fmt))
            elif k == 'player_config':
                config = v
                m_url = v.get('fileKey')
                if m_url:
                    ext = determine_ext(m_url)
                    m_url = update_url_query(
                        m_url, try_get(v, lambda x: x['extraSettings'], dict))
                    formats.append({
                        'url': m_url,
                        'ext': ext,
                    })

        # Back at method level: sort and return once, after the loop has
        # processed every extracted JS object
        self._sort_formats(formats)
        if config.get('mediaType') == 'audio':
            for f in formats:
                f['vcodec'] = 'none'

        return {
            'title': title,
            'id': video_id,
            'formats': formats,
            'thumbnail': (config.get('poster')
                          or self._og_search_thumbnail(webpage)),
            'description': self._html_search_meta(('description', 'twitter:description'), webpage),
        }
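For readers who want to try the decoding step in isolation, here is a minimal standalone sketch for Python 3 using only the standard library in place of the youtube-dl compat helpers. It mirrors what the page's `atob(decodeURIComponent([...].join("")))` expression does; the function name `decode_atob_chunks` and the example URL are made up for illustration, and the chunk regex is looser than `_URICHARS` above.

```python
import base64
import re
from urllib.parse import quote, unquote


def decode_atob_chunks(expr):
    """'"aHR0cH", ..., "%3D%3D"' -> decoded string.

    Mirrors JS atob(decodeURIComponent([...].join(""))):
    join the quoted chunks, percent-decode, then base64-decode.
    """
    chunks = re.findall(r'"([^"]*)"', expr)
    joined = unquote(''.join(chunks))
    return base64.b64decode(joined).decode('utf-8')


# Build a sample the same way the page would: base64, percent-encode,
# then split into short quoted chunks (hypothetical URL).
sample_url = 'https://example.com/path/master.m3u8'
encoded = quote(base64.b64encode(sample_url.encode('utf-8')).decode('ascii'), safe='')
sample_expr = ', '.join('"%s"' % encoded[i:i + 6] for i in range(0, len(encoded), 6))

assert decode_atob_chunks(sample_expr) == sample_url
```

Splitting in the middle of a percent escape is harmless here because the chunks are joined before they are percent-decoded.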

Aug 23 '21 11:08 dirkf

Using only DASH/MPD is not the best option for RTP Play, because their code takes 4 or 5 different approaches depending on the area: main RTP Play, RTP Arquivos, Zig Zag and Estudo em Casa are all different.

My previous extractor dealt with all of them except RTP Arquivos, which is now fixed in my latest commit. I've also added more tests so the code can be run against those different areas.

But I have to thank both @pferreir and @dirkf, because I'm far from a pro in Python and I've used some of your code/comments to improve my own code.

Aug 28 '21 20:08 vallovic

Tested this again, and it does fix this extractor. Any chance of rebasing this to the latest "master"?

Mar 12 '23 23:03 somini

What's the status of this?

Aug 01 '23 21:08 somini

The PR author didn't implement any review comments. Looks like it needs a QA review before merging.

Aug 02 '23 14:08 dirkf

The blocking issue seems to have been merged.

Mar 11 '24 23:03 somini
