JSON can not be parsed for Reddit videos

Open rasoolZero opened this issue 2 years ago • 4 comments

Checklist

  • [x] I'm reporting a broken site support
  • [x] I've verified that I'm running youtube-dl version 2021.12.17
  • [x] I've checked that all provided URLs are alive and playable in a browser
  • [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • [x] I've searched the bugtracker for similar issues including closed ones

Verbose log

[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.3 (CPython) - Windows-10-10.0.19042-SP0
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1
[debug] Proxy map: {}
ERROR: zfj2pv: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

I'm trying to get the duration of a video on reddit.com using youtube-dl, but the JSON cannot be parsed properly. It works with the command-line executable, but my main goal is to get the duration in a Python script, and there it crashes completely.

Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
    return json.loads(json_string)
  File "C:\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
    return json.loads(json_string)
  File "C:\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\reddit.py", line 106, in _real_extract
    data = self._download_json(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 895, in _download_json
    res = self._download_json_handle(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 881, in _download_json_handle
    return self._parse_json(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 910, in _parse_json
    raise ExtractorError(errmsg, cause=ve)
youtube_dl.utils.ExtractorError: zfj2pv: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
    return json.loads(json_string)
  File "C:\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\reddit.py", line 106, in _real_extract
    data = self._download_json(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 895, in _download_json
    res = self._download_json_handle(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 881, in _download_json_handle
    return self._parse_json(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 910, in _parse_json
    raise ExtractorError(errmsg, cause=ve)
youtube_dl.utils.ExtractorError: zfj2pv: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 808, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 824, in wrapper
    self.report_error(compat_str(e), e.format_traceback())
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 628, in report_error
    self.trouble(error_message, tb)
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 598, in trouble
    raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: zfj2pv: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
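
A note on reading this error: `Expecting value: line 2 column 5 (char 5)` means the JSON decoder skipped leading whitespace and then found, at offset 5, a character that cannot start a JSON value. The response body is not shown in this report, so the sketch below uses a made-up body (a newline, four spaces, then an HTML tag) purely as an assumption. The helper name `decode_error_message` is also hypothetical.

```python
import json

# Hypothetical response body: NOT the actual data returned by Reddit.
fake_body = "\n    <!DOCTYPE html><html></html>"


def decode_error_message(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


print(decode_error_message(fake_body))
# → Expecting value: line 2 column 5 (char 5)
```

Any body that starts with non-JSON markup after the same whitespace would give the same message, so the message alone does not identify what the server sent.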

rasoolZero • Dec 14 '22 04:12

We'll need to know the video URL to debug the problem. Update the description with the full verbose log.

dirkf • Dec 14 '22 15:12

We'll need to know the video URL to debug the problem. Update the description with the full verbose log.

I actually included the whole verbose log; it's from the embedded Python library. Here's the link I used to test

and the code I used


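import youtube_dl

# "no_warnings" (plural) is the option name YoutubeDL recognises; "no_warning" is ignored.
ydl_opts = {"no_warnings": True, "forceduration": True, "quiet": True, "verbose": True}

# download=False extracts the metadata only (including the duration).
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    dictMeta = ydl.extract_info("https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/", download=False)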

rasoolZero • Dec 15 '22 19:12

If custom code isn't working it's probably better to report the problem using the yt-dl main program, so in your case:

python -m youtube_dl -v [other options] 'https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/'

The lack of the normal program diagnostics is why I asked for the full log. At least it would have been helpful to mention how yt-dl was being invoked.

Anyhow, the main program reproduces the problem but https://old.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/ works (so, just like any normal use of Reddit). So does https://www.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/.

[Update] The problem is that the extractor fetches the JSON from the matched URL up to the end of the URL path component, without any trailing / or other URL path terminator, and with /.json appended. The resulting new.reddit.com/.../.json URL returns a page of HTML, whereas fetching new.reddit.com/...//.json with wget does return the expected JSON. With old or www in the host field, the JSON is returned for both single and double /.
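
To make the URL construction concrete, here is a minimal standalone sketch. It assumes the extractor's `_VALID_URL` pattern as quoted in the patch later in this thread, and the "/.json" suffix described above. The function name `json_url` is hypothetical, and no network request is made.

```python
import re

# Pattern copied from the extractor's _VALID_URL as quoted in this thread.
VALID_URL = (r'(?P<url>https?://(?:[^/]+\.)?reddit\.com/r/[^/]+/comments/'
             r'(?P<id>[^/?#&]+))')


def json_url(page_url):
    """Build the JSON URL the way the description above says the extractor does."""
    mobj = re.match(VALID_URL, page_url)
    if mobj is None:
        raise ValueError('not a Reddit comments URL: %r' % page_url)
    # The 'url' group ends at the post id, so no trailing "/" is carried over.
    return mobj.group('url') + '/.json'


print(json_url('https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/'))
# → https://new.reddit.com/r/videomemes/comments/zfj2pv/.json
```

The host from the original URL is kept unchanged, which is why the result depends on whether new, old or www was used.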

dirkf • Dec 15 '22 21:12

As the extractor needs a tweak to support new.reddit.com properly, I'll keep this open. We may as well normalise the JSON look-up to www.reddit.com, unless anyone has evidence that (www|new|old).reddit.com present different media and/or metadata.
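
A rough sketch of what such normalisation could look like, outside the extractor: capture everything from reddit.com onwards and always prefix it with https://www. before appending /.json. The names `POST_RE` and `normalised_json_url` are hypothetical and used only for illustration. Whether the three hosts serve identical metadata remains the open question raised above.

```python
import re

# Hypothetical pattern for illustration: optional subdomain, then the post path.
POST_RE = re.compile(
    r'https?://(?:[^/]+\.)?'
    r'(?P<path>reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))')


def normalised_json_url(page_url):
    """Return the JSON look-up URL with the host forced to www.reddit.com."""
    mobj = POST_RE.match(page_url)
    if mobj is None:
        raise ValueError('not a Reddit comments URL: %r' % page_url)
    # Drop whatever subdomain was given and always use www.
    return 'https://www.' + mobj.group('path') + '/.json'


for host in ('www', 'new', 'old'):
    page = 'https://%s.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/' % host
    print(normalised_json_url(page))
```

All three inputs map to the same look-up URL, so the new.reddit.com behaviour described earlier would no longer matter.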

dirkf • Dec 16 '22 18:12

--- old/youtube_dl/extractor/reddit.py
+++ new/youtube_dl/extractor/reddit.py
@@ -50,7 +50,7 @@
 
 
 class RedditRIE(InfoExtractor):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<burl>reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+)))'
     _TESTS = [{
         'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
         'info_dict': {
@@ -99,7 +99,8 @@
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        url, video_id = mobj.group('url', 'id')
+        url, video_id = mobj.group('burl', 'id')
+        url = 'https://www.' + url
 
         video_id = self._match_id(url)
 

dirkf • Jan 28 '23 16:01