youtube-dl
Fix RTP Play support as of Aug 2021
Please follow the guide below
- You will be asked some questions, please read them carefully and answer honestly
- Put an x into all the boxes [ ] relevant to your pull request (like that [x])
- Use Preview tab to see how your pull request will actually look
Before submitting a pull request make sure you have:
- [x] Searched the bugtracker for similar pull requests
- [x] Read adding new extractor tutorial
- [x] Read youtube-dl coding conventions and adjusted the code to meet them
- [x] Covered the code with tests (note that PRs without tests will be REJECTED)
- [x] Checked the code with flake8
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
- [x] I am the original author of this code and I am willing to release it under Unlicense
- [ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)
What is the purpose of your pull request?
- [x] Bug fix
- [ ] Improvement
- [ ] New extractor
- [ ] New feature
This builds on #28205 by @vallovic and once again fixes support for RTP Play downloads. Please note that one of the URLs previously used in the tests had expired, so I replaced it with a more recent one. However, I get:
youtube_dl.utils.DownloadError: ERROR: requested format not available
From what I saw, changing the format parameter in parameters.json to best/bestvideo+bestaudio
solves it. That seems to be the default when the app is running too.
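For anyone reproducing this outside the test harness, the same format selector can be passed through the Python API. This is only a sketch: `build_opts` and `probe` are hypothetical helper names, and `probe` assumes youtube-dl is installed and that you supply a page URL yourself.

```python
def build_opts(fmt='best/bestvideo+bestaudio'):
    """Options dict using the format selector that avoided the error above."""
    return {
        'format': fmt,
        'skip_download': True,  # only resolve formats, do not fetch media
    }


def probe(url, opts=None):
    """Resolve metadata for url without downloading (needs youtube-dl installed)."""
    import youtube_dl  # imported lazily so build_opts() works without it

    with youtube_dl.YoutubeDL(opts or build_opts()) as ydl:
        return ydl.extract_info(url, download=False)
```

Calling `probe(some_url)` should either return an info dict or raise `youtube_dl.utils.DownloadError`, which makes it easy to compare selectors.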
I had a different approach to decoding the messy player config which might be of use (see https://github.com/ytdl-org/youtube-dl/issues/29458#issuecomment-874354702). This follows on from your l.59; earlier on I had:
_URICHARS = r"[-.!~*'()%#\w]"
# Get JS objects:
# 'var f = {...}; var g = new RTPPlayer({...});'
js_match = re.search(
    (r'(?s)var\s+f\s*=\s*%s\s*;\s*var\s+\w+\s*=\s*new\s+RTPPlayer\s*\(\s*%s\s*\)\s*;'
     % (r'(?P<var_f>\{[^;]+\}|"[^;"]+")', r'(?P<player_config>\{[^;]+\})')),
    webpage)
if not js_match:
    raise RegexNotFoundError('Unable to extract player config')
js_objs = js_match.groupdict()

# '"cod", "ed ", "URL", "...", ...' -> "http://..."
def atob_decode(expr):
    codes = re.findall(r'(?<=")%s+(?=")' % self._URICHARS, expr, flags=re.S)
    # compat_b64decode() returns bytes, so decode before interpolating
    return "'%s'" % compat_b64decode(compat_urllib_parse_unquote(''.join(codes))).decode('utf-8')

for k, v in js_objs.items():
    # '... // comment\n' -> '... \n'
    decoded = re.sub(r'\s//[^\n]*\n', '\n', v)
    # 'file: f,' -> ''
    decoded = re.sub(r'\s*file\s*:\s*\w+\s*,\s*', '', decoded)
    # 'atob( decodeURIComponent(["aHR0cH", ...,"%3D%3D"].join("")))'
    # -> decoded URI
    decoded = re.sub(
        (r'atob\s*\(\s*decodeURIComponent\s*\(\s*\[\s*((?:"%s+"\s*,\s*)*"%s+")\s*\]\s*\.\s*join\s*\(\s*""(?:\s*\)){3}'
         % (self._URICHARS, self._URICHARS)),
        lambda x: atob_decode(x.group(1)), decoded, flags=re.S)
    js_objs[k] = self._parse_json(decoded, video_id, js_to_json)

formats = []
config = {}
for k, v in js_objs.items():
    if k == 'var_f':
        if isinstance(v, compat_str):
            ext = determine_ext(v)
            formats.append({
                'url': v,
                'ext': ext,
            })
        else:
            for fmt, m_url in v.items():
                ext = determine_ext(m_url)
                if ext == 'm3u8':
                    formats.extend(
                        self._extract_m3u8_formats(
                            m_url, video_id, 'mp4', 'm3u8_native',
                            m3u8_id=fmt))
                elif ext == 'mpd':
                    formats.extend(
                        self._extract_mpd_formats(
                            m_url, video_id, mpd_id=fmt))
    elif k == 'player_config':
        config = v
        m_url = v.get('fileKey')
        if m_url:
            ext = determine_ext(m_url)
            m_url = update_url_query(m_url,
                                     try_get(v, lambda x: x['extraSettings'], dict))
            formats.append({
                'url': m_url,
                'ext': ext,
            })

self._sort_formats(formats)
# Audio-only items: mark every format as having no video stream
if config.get('mediaType') == 'audio':
    for f in formats:
        f['vcodec'] = 'none'

return {
    'title': title,
    'id': video_id,
    'formats': formats,
    'thumbnail': (config.get('poster')
                  or self._og_search_thumbnail(webpage)),
    'description': self._html_search_meta(('description', 'twitter:description'), webpage),
}
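The atob/decodeURIComponent step can be tried in isolation with only the standard library, which may help when checking the regex against new page variants. This is a minimal sketch rather than extractor code: `URICHARS` mirrors `_URICHARS` from the comment, while `decode_player_js` and `make_sample` are hypothetical names, and `make_sample` only fabricates an obfuscated string in the same shape for testing.

```python
import base64
import re
from urllib.parse import quote, unquote

# Same character class as _URICHARS in the comment.
URICHARS = r"[-.!~*'()%#\w]"

# Matches: atob( decodeURIComponent(["aHR0cH", ..., "%3D%3D"].join("")))
ATOB_RE = re.compile(
    r'atob\s*\(\s*decodeURIComponent\s*\(\s*\[\s*((?:"%s+"\s*,\s*)*"%s+")\s*\]'
    r'\s*\.\s*join\s*\(\s*""(?:\s*\)){3}' % (URICHARS, URICHARS),
    flags=re.S)


def atob_decode(expr):
    """Join the quoted chunks, URL-unquote them, then base64-decode to text."""
    codes = re.findall(r'(?<=")%s+(?=")' % URICHARS, expr, flags=re.S)
    return base64.b64decode(unquote(''.join(codes))).decode('utf-8')


def decode_player_js(js):
    """Replace every obfuscated atob(...) expression with a quoted plain URL."""
    return ATOB_RE.sub(lambda m: "'%s'" % atob_decode(m.group(1)), js)


def make_sample(url, size=6):
    """Hypothetical test helper: obfuscate url in the same shape as the page."""
    encoded = quote(base64.b64encode(url.encode('utf-8')).decode('ascii'), safe='')
    chunks = [encoded[i:i + size] for i in range(0, len(encoded), size)]
    return 'atob( decodeURIComponent([%s].join("")))' % ', '.join(
        '"%s"' % c for c in chunks)
```

For example, `decode_player_js('hls: ' + make_sample('https://example.com/master.m3u8') + ',')` should give back the `hls:` entry with the URL in single quotes, ready for `js_to_json`.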
Using only DASH/MPD is not the best option for RTP Play because the site takes four or five different approaches in its code depending on the area: main RTP Play, RTP Arquivos, Zig Zag and Estudo em Casa are all different.
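To illustrate why a single manifest type is not enough, here is a rough sketch (not the PR's code) of sorting media URLs by kind before choosing how to handle them. `classify_media_url` and `group_by_kind` are hypothetical helpers standing in for youtube-dl's `determine_ext` dispatch, and the sketch assumes the extension in the URL path reflects the stream type.

```python
import posixpath
from urllib.parse import urlparse


def classify_media_url(url):
    """Return 'hls', 'dash' or 'direct' based on the URL path's extension."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext == '.m3u8':
        return 'hls'
    if ext == '.mpd':
        return 'dash'
    return 'direct'


def group_by_kind(urls):
    """Bucket URLs so each kind can go to its own extraction path."""
    groups = {'hls': [], 'dash': [], 'direct': []}
    for url in urls:
        groups[classify_media_url(url)].append(url)
    return groups
```

An extractor that only looks at the `dash` bucket would miss areas that serve HLS or plain files.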
My previous extractor dealt with all of them except RTP Arquivos, which is now fixed in my latest commit. I've also added more tests so the code can be run against those different areas.
But I have to thank both @pferreir and @dirkf because I'm far from a pro in Python and I've used some of your code/comments to improve my own code.
Tested this again, and it does fix this extractor. Any chance of rebasing this to the latest "master"?
What's the status of this?
The PR author didn't implement any review comments. Looks like it needs a QA review before merging.
The blocking issue seems to have been merged.