youtube-dl
JSON can not be parsed for Reddit videos
Checklist
- [x] I'm reporting a broken site support
- [x] I've verified that I'm running youtube-dl version 2021.12.17
- [x] I've checked that all provided URLs are alive and playable in a browser
- [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [x] I've searched the bugtracker for similar issues including closed ones
Verbose log
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.3 (CPython) - Windows-10-10.0.19042-SP0
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1
[debug] Proxy map: {}
ERROR: zfj2pv: Failed to parse JSON (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Description
I'm trying to get the duration of a video on reddit.com using youtube-dl, but the JSON cannot be parsed. It works with the command-line executable, but my main goal is to get the duration in a Python script, and there it crashes completely:
Traceback (most recent call last):
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
return json.loads(json_string)
File "C:\Python310\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Python310\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)
Traceback (most recent call last):
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
return json.loads(json_string)
File "C:\Python310\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Python310\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
return func(self, *args, **kwargs)
File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
ie_result = ie.extract(url)
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
ie_result = self._real_extract(url)
File "C:\Python310\lib\site-packages\youtube_dl\extractor\reddit.py", line 106, in _real_extract
data = self._download_json(
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 895, in _download_json
res = self._download_json_handle(
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 881, in _download_json_handle
return self._parse_json(
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 910, in _parse_json
raise ExtractorError(errmsg, cause=ve)
youtube_dl.utils.ExtractorError: zfj2pv: Failed to parse JSON (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
return json.loads(json_string)
File "C:\Python310\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Python310\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
return func(self, *args, **kwargs)
File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
ie_result = ie.extract(url)
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
ie_result = self._real_extract(url)
File "C:\Python310\lib\site-packages\youtube_dl\extractor\reddit.py", line 106, in _real_extract
data = self._download_json(
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 895, in _download_json
res = self._download_json_handle(
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 881, in _download_json_handle
return self._parse_json(
File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 910, in _parse_json
raise ExtractorError(errmsg, cause=ve)
youtube_dl.utils.ExtractorError: zfj2pv: Failed to parse JSON (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 808, in extract_info
return self.__extract_info(url, ie, download, extra_info, process)
File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 824, in wrapper
self.report_error(compat_str(e), e.format_traceback())
File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 628, in report_error
self.trouble(error_message, tb)
File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 598, in trouble
raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: zfj2pv: Failed to parse JSON (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
We'll need to know the video URL to debug the problem. Update the description with the full verbose log.
I actually included the whole verbose log; it's from the embedded Python library. Here's the link I used to test and the code I used:
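import youtube_dl

# The option key youtube-dl recognises is "no_warnings" (plural).
ydl_opts = {"no_warnings": True, "forceduration": True, "quiet": True, "verbose": True}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    # download=False returns only the metadata dict, without fetching the media
    dictMeta = ydl.extract_info("https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/", download=False)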
If custom code isn't working it's probably better to report the problem using the yt-dl main program, so in your case:
python -m youtube_dl -v [other options] "https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/"
The lack of the normal program diagnostics is why I asked for the full log. At least it would have been helpful to mention how yt-dl was being invoked.
Anyhow, the main program reproduces the problem but https://old.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/ works (so, just like any normal use of Reddit). So does https://www.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/.
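Until the extractor handles new.reddit.com, a script could work around the problem by rewriting the host before calling extract_info. The following is only a sketch of that idea: the helper name use_www_reddit is made up for illustration and is not part of youtube-dl.

```python
from urllib.parse import urlsplit, urlunsplit


def use_www_reddit(url):
    """Return url with a new.reddit.com host replaced by www.reddit.com.

    Hypothetical helper for illustration only; URLs on any other host
    are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() == "new.reddit.com":
        parts = parts._replace(netloc="www.reddit.com")
    return urlunsplit(parts)


print(use_www_reddit(
    "https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/"))
```

The returned URL can then be passed to ydl.extract_info(..., download=False) as in the reporter's script.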
[Update]
The problem is that the extractor fetches the JSON from the matched URL up to the end of the URL path component, without any trailing / or other URL path terminator, and with /.json appended. The resulting new.reddit.com/.../.json URL returns a page of HTML, whereas fetching new.reddit.com/...//.json with wget does return the expected JSON. With old or www in the host field, the JSON is returned for both single and double /.
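To make the URL construction concrete, here is a rough sketch that applies the extractor's current _VALID_URL pattern to the reported URL and appends /.json, as described above. The function name json_lookup_url is invented for this example; the real extractor does this inside _real_extract.

```python
import re

# _VALID_URL as it stands in the extractor before any fix
VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'


def json_lookup_url(page_url):
    """Illustrative only: rebuild the JSON URL the extractor requests."""
    mobj = re.match(VALID_URL, page_url)
    if mobj is None:
        raise ValueError('not a Reddit comments URL: %s' % page_url)
    # The match stops at the post id, so there is no trailing slash here.
    return mobj.group('url') + '/.json'


print(json_lookup_url(
    'https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/'))
```

This prints the single-slash new.reddit.com .json URL, which is the form that comes back as HTML rather than JSON.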
As the extractor needs a tweak to support new.reddit.com properly, I'll keep this open. We may as well normalise the JSON look-up to www.reddit.com, unless anyone has evidence that (www|new|old).reddit.com present different media and/or metadata.
--- old/youtube_dl/extractor/reddit.py
+++ new/youtube_dl/extractor/reddit.py
@@ -50,7 +50,7 @@
 class RedditRIE(InfoExtractor):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<burl>reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+)))'
     _TESTS = [{
         'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
         'info_dict': {
@@ -99,7 +99,8 @@
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        url, video_id = mobj.group('url', 'id')
+        url, video_id = mobj.group('burl', 'id')
+        url = 'https://www.' + url
         video_id = self._match_id(url)
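As a quick standalone check of the patch, the sketch below applies the new pattern outside the extractor and shows that the www, new and old hosts all normalise to the same www.reddit.com base URL. The function name normalised_url is hypothetical and exists only for this example.

```python
import re

# Patched pattern from the diff: 'burl' captures everything after the
# optional subdomain, so the host prefix can be replaced.
VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<burl>reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+)))'


def normalised_url(page_url):
    """Illustrative only: mirror the two patched lines of _real_extract."""
    mobj = re.match(VALID_URL, page_url)
    if mobj is None:
        raise ValueError('not a Reddit comments URL: %s' % page_url)
    burl, video_id = mobj.group('burl', 'id')
    return 'https://www.' + burl, video_id


for host in ('www', 'new', 'old'):
    print(normalised_url(
        'https://%s.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/' % host))
```

All three iterations print the same (URL, id) pair, so the JSON look-up no longer depends on which subdomain the user supplied.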