youtube-dl
[generic] misc. fixes (lazyYT test, first_bytes whitespace)
I happened to have b68a812ea839e44148516a34a15193189e58ba77 open recently, and wound up at the English version, i.e. http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf# via http://www.hp.com, and noted it doesn't work with Youtube-DL.
Unlike the Chinese version, this page has nothing to do with Brightcove. #12501 is one problem, filed separately (not fixed here).
Here are two fixes for other issues I found. The first, 00bc75c, deals with the fact that this webpage starts with 512 bytes of whitespace, so it fails the is_html(first_bytes) check:
(Pdb) p first_bytes
'\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n\r\n\r\n'
(Pdb)
The second, 6206194, came up because the page uses a pattern similar to LazyYT, whose test is broken because the domain is gone:
pb3:extractor jhawk$ host -t a discourse.ubuntu.com
discourse.ubuntu.com is an alias for ubuntu.bydiscourse.com.
pb3:extractor jhawk$ host -t a ubuntu.bydiscourse.com.
ubuntu.bydiscourse.com has no A record
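To illustrate the first_bytes problem, here is a minimal sketch of how an all-whitespace 512-byte prefix defeats a "does it start with a tag" check, and one way to look past it. This is not the code from 00bc75c or youtube-dl's actual is_html; the names looks_like_html, is_all_whitespace and sniff_html are hypothetical, and the check itself is a simplified stand-in. It works on bytes throughout and decodes with errors='replace', so the same code runs on Python 2 and 3.

```python
import re


def looks_like_html(first_bytes):
    """Simplified stand-in for an is_html-style check (an assumption,
    not youtube-dl's implementation): optional whitespace, then '<'."""
    text = first_bytes.decode('utf-8', 'replace')
    return re.match(r'\s*<', text) is not None


def is_all_whitespace(chunk):
    """True when a non-empty chunk holds nothing but ASCII whitespace."""
    return len(chunk) > 0 and chunk.strip() == b''


def sniff_html(data, chunk_size=512):
    """Hypothetical helper: skip leading chunks that are pure whitespace,
    then apply the check to the first chunk with real content."""
    offset = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        if not is_all_whitespace(chunk):
            return looks_like_html(chunk)
        offset += chunk_size
    return False


# A page shaped like the one above: 512 bytes of CRLF, then markup.
page = b'\r\n' * 256 + b'<!DOCTYPE html><html></html>'
naive_result = looks_like_html(page[:512])
skipping_result = sniff_html(page)
```

With the sample page, the naive check on the first 512 bytes fails because no `<` ever appears in that window, while the chunk-skipping variant reaches the markup.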
Of course, after that, Youtube-DL still doesn't work. I'm not sure how much we care, since it's a YouTube embed, so presumably users will know they can click on the YouTube button and find the YouTube URL. But here's the mechanism:
<div class="youtube-popup no-default-transition"
data-playlist="PLoMwRIIUViGQ48XKzXNxTjvICAiZTLv8I"
data-full-video-id="U3QXMMV-Srs">
<div id="play-list"></div>
<iframe id="full-video" frameborder="0" allowfullscreen="1" title="YouTube video player" width="640" height="360"></iframe>
<div class="close-popup"></div>
</div>
which is quite similar to the LazyYT pattern:
<div class="lazyYT"
data-youtube-id="y9r6qp8UBQc"
data-ratio="16:9">
What is Skiplagged?
</div>
but it doesn't seem to be used on other sites that I could find, so perhaps it doesn't meet the standards for inclusion. Not sure.
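For reference, a rough sketch of what matching the youtube-popup markup could look like, in the same spirit as the LazyYT handling. This is only an illustration: the function name find_popup_video_urls and both regexes are my own assumptions, not code from the generic extractor, and it only handles double-quoted attributes as seen on this page.

```python
import re

# Opening <div> tag whose class list contains "youtube-popup".
POPUP_TAG_RE = re.compile(
    r'<div\b[^>]*\bclass="[^"]*\byoutube-popup\b[^"]*"[^>]*>')
# 11-character YouTube video ID stored in data-full-video-id.
VIDEO_ID_RE = re.compile(r'\bdata-full-video-id="([\w-]{11})"')


def find_popup_video_urls(webpage):
    """Hypothetical helper: return YouTube watch URLs for every
    youtube-popup div that carries a data-full-video-id attribute."""
    urls = []
    for tag in POPUP_TAG_RE.finditer(webpage):
        match = VIDEO_ID_RE.search(tag.group(0))
        if match:
            urls.append('https://www.youtube.com/watch?v=' + match.group(1))
    return urls


sample = '''<div class="youtube-popup no-default-transition"
data-playlist="PLoMwRIIUViGQ48XKzXNxTjvICAiZTLv8I"
data-full-video-id="U3QXMMV-Srs">
<div id="play-list"></div>
</div>'''
lazyyt_sample = '<div class="lazyYT" data-youtube-id="y9r6qp8UBQc" data-ratio="16:9"></div>'

popup_urls = find_popup_video_urls(sample)
lazyyt_urls = find_popup_video_urls(lazyyt_sample)
```

The sketch ignores data-playlist entirely; whether the playlist or the single video should win is a separate question.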
Whoops, 00bc75c broke Python 3 because of UTF-8 decode issues. Fixed in a5d5a2c.
Err, so… 6206194c5ace95e5a825b4a58a395030804e7c20 was committed as e8e4cc5a6a3ad8bf94d9ff9e5bb2d72712e14c34 recently, but that still leaves the first_bytes whitespace fix outstanding. Not sure whether those commits (00bc75ca0115fa57ffc700357ba6ef86f3355bb9, a5d5a2c068b00a8118fa9e3c32a9d93f316b2edd) were omitted deliberately or inadvertently?
Thanks.
@dstftw, did you mean to commit half of this only?
Yes.