
[generic] misc. fixes (lazyYT test, first_bytes whitespace)

Open johnhawkinson opened this issue 7 years ago • 4 comments

I happened to have b68a812ea839e44148516a34a15193189e58ba77 open recently and wound up at the English version, i.e. http://www8.hp.com/us/en/solutions/security/thewolf.html?jumpid=va_87trme41uf# via http://www.hp.com, and noticed that youtube-dl doesn't work on it.

Unlike the Chinese version, this page has nothing to do with Brightcove. #12501 is one problem; I filed it separately (it is not fixed here).

Here are two fixes for other issues I found. The first, 00bc75c, handles the fact that this webpage starts with 512 bytes of whitespace, so it fails the is_html(first_bytes) check:

(Pdb) p first_bytes
'\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n                                            \r\n                                            \r\n                                            \r\n                                            \r\n                                                          \r\n                                            \r\n                                            \r\n\r\n\r\n'
(Pdb) 
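If the sniffed bytes are nothing but whitespace, there is no `<` for an HTML check to find. A minimal sketch of one way around this, assuming the caller can keep reading from the response stream: skip leading whitespace in chunks (up to a cap) and only then sniff. The names `read_first_non_ws_bytes`, `looks_like_html`, `SNIFF_SIZE` and `MAX_SKIP` are hypothetical; this is not the code in 00bc75c and not youtube-dl's `is_html`.

```python
import io
import re

# Hypothetical constants, not taken from youtube-dl.
SNIFF_SIZE = 512        # size of the sniff window
MAX_SKIP = 64 * 1024    # stop skipping whitespace after roughly this many bytes


def read_first_non_ws_bytes(stream, size=SNIFF_SIZE, max_skip=MAX_SKIP):
    """Return up to `size` bytes starting at the first non-whitespace byte."""
    skipped = 0
    buf = b''
    while skipped <= max_skip:
        chunk = stream.read(size)
        if not chunk:                 # end of stream: body was all whitespace
            break
        stripped = chunk.lstrip()     # bytes.lstrip() drops ASCII whitespace
        if stripped:
            buf = stripped
            break
        skipped += len(chunk)
    # Top the window back up so the caller sees `size` meaningful bytes.
    if buf and len(buf) < size:
        buf += stream.read(size - len(buf))
    return buf


def looks_like_html(first_bytes):
    # Decode leniently so a multibyte UTF-8 character cut off at the end of
    # the window cannot raise UnicodeDecodeError on Python 3.
    text = first_bytes.decode('utf-8', 'replace')
    return re.match(r'^\s*<', text) is not None


# A body shaped like the HP page: 600 bytes of CRLFs, then markup.
body = b'\r\n' * 300 + b'<!DOCTYPE html><html></html>'
print(looks_like_html(body[:512]))                                # False
print(looks_like_html(read_first_non_ws_bytes(io.BytesIO(body))))  # True
```

The sketch discards the whitespace it skips, which is harmless for deciding whether the body is HTML; a real patch that later reuses the sniffed bytes as the start of the page would have to account for the consumed bytes.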

The second, 6206194, fixes the LazyYT test: this page uses a pattern similar to LazyYT, and that test is broken because its test domain is gone:

pb3:extractor jhawk$ host -t a  discourse.ubuntu.com
discourse.ubuntu.com is an alias for ubuntu.bydiscourse.com.
pb3:extractor jhawk$ host -t a  ubuntu.bydiscourse.com.
ubuntu.bydiscourse.com has no A record
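To run a similar check from Python (for example, before trusting a failing extractor test), a rough stand-in for `host -t a` could look like this. It is a hypothetical helper, not part of youtube-dl's test suite, and it goes through the system resolver via `getaddrinfo()` (so `/etc/hosts` entries count) rather than querying DNS directly the way `host` does.

```python
import socket


def has_ipv4_address(hostname):
    """Return True if the system resolver yields at least one IPv4 address.

    Hypothetical helper: roughly what `host -t a` checks, except that it uses
    getaddrinfo(), so /etc/hosts and local resolver configuration apply.
    """
    try:
        return bool(socket.getaddrinfo(hostname, None, socket.AF_INET))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

Calling `has_ipv4_address('ubuntu.bydiscourse.com')` asks the same question as the second `host` command above; the answer depends on DNS at the time it is run.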

Even with those fixes, youtube-dl still doesn't work on this page. I'm not sure how much we care: it's a YouTube embed, so presumably users will know they can click the YouTube button and find the YouTube URL themselves. But here's the mechanism:

    <div  class="youtube-popup no-default-transition" 
          data-playlist="PLoMwRIIUViGQ48XKzXNxTjvICAiZTLv8I" 
          data-full-video-id="U3QXMMV-Srs">
      <div id="play-list"></div>
      <iframe id="full-video" frameborder="0" allowfullscreen="1" title="YouTube video player" width="640" height="360"></iframe>
      <div class="close-popup"></div>
    </div>

which is quite similar to the LazyYT pattern:

<div class="lazyYT"
  data-youtube-id="y9r6qp8UBQc"
  data-ratio="16:9">
    What is Skiplagged?
</div>

but I couldn't find it used on any other site, so perhaps it doesn't meet the bar for inclusion. Not sure.
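For illustration, a hypothetical regex that matches both the HP `data-full-video-id` attribute and the LazyYT `data-youtube-id` attribute, turning each 11-character video ID into a standard YouTube watch URL. This is a sketch only: it is not youtube-dl's actual LazyYT code, and it ignores the `data-playlist` attribute.

```python
import re

# Hypothetical pattern (not youtube-dl's code): a <div> carrying either the
# LazyYT attribute data-youtube-id or the HP page's data-full-video-id.
# YouTube video IDs are 11 characters drawn from [0-9A-Za-z_-].
YT_ID_ATTR_RE = re.compile(
    r'<div[^>]+?\bdata-(?:youtube-id|full-video-id)=(["\'])'
    r'(?P<id>[0-9A-Za-z_-]{11})\1')


def find_youtube_embeds(webpage):
    """Return a watch URL for every matching <div> in the page."""
    return ['https://www.youtube.com/watch?v=' + m.group('id')
            for m in YT_ID_ATTR_RE.finditer(webpage)]


hp_markup = '''<div  class="youtube-popup no-default-transition"
      data-playlist="PLoMwRIIUViGQ48XKzXNxTjvICAiZTLv8I"
      data-full-video-id="U3QXMMV-Srs">'''

print(find_youtube_embeds(hp_markup))
# ['https://www.youtube.com/watch?v=U3QXMMV-Srs']
```

Because `[^>]+?` cannot cross a `>`, the match stays inside the opening `<div ...>` tag, and the back-reference `\1` requires the closing quote to match the opening one.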

johnhawkinson, Mar 20 '17 01:03

Whoops, 00bc75c broke Python 3 because of UTF-8 decoding issues. Fixed in a5d5a2c.

johnhawkinson, Mar 20 '17 01:03

Err, so… 6206194c5ace95e5a825b4a58a395030804e7c20 was recently committed as e8e4cc5a6a3ad8bf94d9ff9e5bb2d72712e14c34, but that leaves out the first_bytes whitespace fix. Were those commits (00bc75ca0115fa57ffc700357ba6ef86f3355bb9, a5d5a2c068b00a8118fa9e3c32a9d93f316b2edd) omitted deliberately or inadvertently?

Thanks.

johnhawkinson, Mar 28 '17 20:03

@dstftw, did you mean to commit only half of this?

johnhawkinson, Apr 08 '17 04:04

Yes.

dstftw, Apr 08 '17 05:04