URLextractor
[FALSE ALARM, nevermind] Bug report: single quote in the URL is treated as an end-of-URL.
Page tested: https://www.uchinokomato.me/chara/show/44405
When it tries to obtain this URL:
https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/105/454/original/%E3%83%AA%E3%83%B3%E3%82%AB%E3%81%A1%E3%82%83%E3%82%93%2889'MO%E3%81%95%E3%82%93%E3%81%8B%E3%82%89%E3%81%AE%E9%A0%82%E3%81%8D%E7%89%A9%29.png?1459753344
(Note that the ' symbol is INSIDE the URL and is part of the string.)
It extracts this instead:
https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/105/454/original/%E3%83%AA%E3%83%B3%E3%82%AB%E3%81%A1%E3%82%83%E3%82%93%2889
What seems to happen is that the script treats the single quote (or apostrophe) inside the URL as the end of the string, when it is not. Here is the HTML code that the extractor script sees:
<a data-lightbox="gallery" data-title="Uploaded at 2016-4-4 7:01
89'MO様に描いていただきました" href="https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/105/454/original/%E3%83%AA%E3%83%B3%E3%82%AB%E3%81%A1%E3%82%83%E3%82%93%2889'MO%E3%81%95%E3%82%93%E3%81%8B%E3%82%89%E3%81%AE%E9%A0%82%E3%81%8D%E7%89%A9%29.png?1459753344"><img src="https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/105/454/medium/%E3%83%AA%E3%83%B3%E3%82%AB%E3%81%A1%E3%82%83%E3%82%93%2889'MO%E3%81%95%E3%82%93%E3%81%8B%E3%82%89%E3%81%AE%E9%A0%82%E3%81%8D%E7%89%A9%29.png?1459753344" alt="%e3%83%aa%e3%83%b3%e3%82%ab%e3%81%a1%e3%82%83%e3%82%93%2889'mo%e3%81%95%e3%82%93%e3%81%8b%e3%82%89%e3%81%ae%e9%a0%82%e3%81%8d%e7%89%a9%29" style="height: 234.283px;"></a>
Note the URL is wrapped in double quotes.
Single and double quotes are often combined so that a double-quoted value can contain single-quoted strings, as in this JavaScript handler:
onchange="Function('Arg1', 'Arg2'); Calculate()"
Also, " cannot be used in a filename on Windows (it is a reserved character).
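To illustrate the quoting point, here is a minimal Python sketch. It is not taken from URLextractor. It compares a naive pattern that stops at either quote type with Python's standard `html.parser`, which only ends a double-quoted attribute at the next double quote. The names `HrefCollector`, `extract_hrefs`, `NAIVE` and the shortened `example.com` URL are made up for this example.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects every href attribute value seen in start tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def extract_hrefs(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Naive pattern: treats BOTH ' and " as the end of the value,
# which truncates a double-quoted URL that contains an apostrophe.
NAIVE = re.compile(r'href=["\']([^"\']*)')

# Shortened stand-in for the tag in the report (assumed URL, not the real one)
sample = '''<a data-title="89'MO" href="https://example.com/a%2889'MO%29.png?1">x</a>'''

print(NAIVE.search(sample).group(1))  # truncated at the apostrophe
print(extract_hrefs(sample)[0])       # full URL, apostrophe included
```

If an extractor ever did show the truncation described above, a pattern like `NAIVE` would be one plausible cause; matching only the quote character that opened the value avoids it.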
WAIT! FALSE ALARM!
I was using my NP++ macros, which inadvertently remove these single quotes when they shouldn't.