youtube-dl
Cannot correctly resolve `bilibili.com` video URLs contained in a festival
Checklist
- [x] I'm reporting a broken site support
- [x] I've verified that I'm running youtube-dl version 2021.12.17
- [x] I've checked that all provided URLs are alive and playable in a browser
- [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [x] I've searched the bugtracker for similar issues including closed ones
Verbose log
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://www.bilibili.com/video/BV1dZ4y1Y7bt', '-v']
[debug] Encodings: locale cp936, fs mbcs, out cp936, pref cp936
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.19041
[debug] exe versions: none
[debug] Proxy map: {}
[BiliBili] 1dZ4y1Y7bt: Downloading webpage
[BiliBili] 1dZ4y1Y7bt: Downloading video info page
ERROR: Unable to extract title; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\YoutubeDL.py", line 815, in wrapper
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\YoutubeDL.py", line 836, in __extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 534, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\bilibili.py", line 213, in _real_extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 1021, in _html_search_regex
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 1012, in _search_regex
youtube_dl.utils.RegexNotFoundError: Unable to extract title; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Description
youtube-dl cannot correctly resolve bilibili.com video URLs that are contained in a festival, for example:
https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt
A normal video URL (one not contained in a festival) looks like:
https://www.bilibili.com/video/BVxxxxxxxx
Using https://www.bilibili.com/video/BV1dZ4y1Y7bt does not work either, because it automatically redirects back to the festival URL.
- The `_VALID_URL` can be updated to match URLs like https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt. Is this the only such format (i.e. `.../festival/slug?bvid=...`), or should other top-level path components and/or more path components be matched?
- The error occurs because the title extraction fails. The problem page contains `<title>洛天依十周年官方演唱会</title>`. If that should be the fallback title, that's fine, but I'm not familiar with the content. With that fallback in place:
$ python3.9 -m youtube_dl -v -F 'https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt'
[debug] System config: ['--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '-F', 'https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: a5464aca1
[debug] Python version 3.9.16 (CPython) - Linux-4.4.0-210-generic-i686-with-glibc2.23
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[BiliBili] 1dZ4y1Y7bt: Downloading webpage
[BiliBili] 1dZ4y1Y7bt: Downloading video info page
WARNING: unable to extract description; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
WARNING: unable to extract og:image; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[info] Available formats for 1dZ4y1Y7bt:
format code extension resolution note
0 flv unknown 3.53GiB
$
- A URL format like `.../festival/<slug>?bvid=<bvid>` is used on rare occasions.
- What's in the `<title>` tag should not be the fallback title; that is the title of the "festival". The requested video is one of many videos published in this "festival".
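To make the URL shape discussed above concrete, here is a minimal standalone sketch of a pattern for `.../festival/<slug>?bvid=<bvid>`. It is not the extractor's actual `_VALID_URL`; the names `FESTIVAL_URL_RE` and `parse_festival_url` are hypothetical, and it assumes (based on the `1dZ4y1Y7bt` ID shown in the logs) that the video ID is the part of `bvid` after the `BV` prefix.

```python
import re

# Hypothetical pattern for illustration only; not youtube-dl's real _VALID_URL.
# Assumes the festival slug is a single path component and that the video is
# named by a `bvid` query parameter, possibly preceded by other parameters.
FESTIVAL_URL_RE = re.compile(
    r'https?://(?:www\.)?bilibili\.com/festival/(?P<slug>[^/?#]+)'
    r'\?(?:[^#]*&)?bvid=[bB][vV](?P<id>[0-9A-Za-z]+)'
)


def parse_festival_url(url):
    """Return (slug, video_id) for a festival URL, or None if it does not match."""
    m = FESTIVAL_URL_RE.match(url)
    if not m:
        return None
    return m.group('slug'), m.group('id')


if __name__ == '__main__':
    print(parse_festival_url(
        'https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt'))
    print(parse_festival_url('https://www.bilibili.com/video/BV1dZ4y1Y7bt'))
```

Whether other top-level path components need matching is still an open question in this thread, so the sketch deliberately covers only the `festival` form.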
What should be the title of the test video https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt? If there isn't an obvious candidate, the title could be `f'{festival_title}: {video_id}'` or similar.
The element can be located with `.video-toobar_title`, whose innerText is 【洛天依原创曲】光与影的对白【2022官方生贺曲】. This is very different from other video pages.
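A rough sketch of how the two ideas in this thread could combine: prefer the text of the element with class `.video-toobar_title` (class name copied verbatim from the comment above), and fall back to `'{festival_title}: {video_id}'` when it is missing. The function name `extract_title` and the assumed HTML layout (a double-quoted `class` attribute, no nested element with the same tag name) are assumptions for illustration; a real extractor would use youtube-dl's own helpers rather than this standalone regex.

```python
import re
from html import unescape


def extract_title(webpage, video_id, title_class='video-toobar_title'):
    """Pick a video title from festival page HTML (illustrative only)."""
    # First choice: inner text of any element whose class list contains
    # title_class. Assumes a double-quoted class attribute.
    m = re.search(
        r'<(\w+)[^>]*\bclass="[^"]*\b%s\b[^"]*"[^>]*>(.*?)</\1>'
        % re.escape(title_class),
        webpage, re.S)
    if m:
        # Drop any nested tags, decode entities, trim whitespace.
        text = unescape(re.sub(r'<[^>]+>', '', m.group(2))).strip()
        if text:
            return text
    # Fallback: the page <title> is the festival title, so qualify it with
    # the video ID instead of using it alone.
    m = re.search(r'<title[^>]*>(.*?)</title>', webpage, re.S)
    if m:
        return '%s: %s' % (unescape(m.group(1)).strip(), video_id)
    return video_id


if __name__ == '__main__':
    # Hypothetical page fragment, not the real bilibili markup.
    page = ('<title>洛天依十周年官方演唱会</title>'
            '<div class="video-toobar_title">'
            '【洛天依原创曲】光与影的对白【2022官方生贺曲】</div>')
    print(extract_title(page, '1dZ4y1Y7bt'))
```

The 【】 brackets are left untouched, since they are part of the title text.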
That's fine. There are other fields not being extracted but I don't think they should cause warnings. Obviously, suggestions for alternative sources in the page are welcome.
$ python3.9 -m youtube_dl --get-title 'https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt'
WARNING: unable to extract description; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
WARNING: unable to extract og:image; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
【洛天依原创曲】光与影的对白【2022官方生贺曲】
$
Are the 【】 part of the title or should they be stripped?
No, they shouldn't be stripped; the 【】 are part of the title.
P.S. The video description can be read with `document.querySelector('.video-desc').innerHTML`.