URLExtract left walk does not stop on various unicode chars

Open amoldavsky opened this issue 2 years ago • 0 comments

>>> from urlextract import URLExtract
>>> extractor = URLExtract()
>>> extractor.find_urls("You can also visit my website…IMINIT.MYAMBIT.COM")
['website…IMINIT.MYAMBIT.COM']
>>> extractor.find_urls("some%sIMINIT.MYAMBIT.COM" % chr(8231))
['some‧IMINIT.MYAMBIT.COM']

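Neither "…" nor "‧" (chr(8231)) is a valid URL character, so the leftward walk from the domain should stop when it reaches one of them instead of including the preceding text in the result.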
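As a possible stopgap until the left walk is fixed, the returned matches could be post-processed. The sketch below is not part of URLExtract: `trim_left` and `URL_CHARS` are hypothetical names, and the allow-list is an assumption based on the ASCII characters that RFC 3986 permits in a URI. It walks left from the end of a match and stops at the first character outside that set.

```python
import string

# Assumption: treat only the ASCII characters permitted by RFC 3986
# (unreserved, reserved, and "%") as valid inside a URL.
URL_CHARS = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%")


def trim_left(candidate: str) -> str:
    """Return the longest suffix of `candidate` made only of URL_CHARS.

    Hypothetical helper, not a URLExtract API.
    """
    start = len(candidate)
    # Walk left from the end and stop at the first disallowed character.
    while start > 0 and candidate[start - 1] in URL_CHARS:
        start -= 1
    return candidate[start:]


print(trim_left("website\u2026IMINIT.MYAMBIT.COM"))          # IMINIT.MYAMBIT.COM
print(trim_left("some" + chr(8231) + "IMINIT.MYAMBIT.COM"))  # IMINIT.MYAMBIT.COM
```

Applied to the two results above, this keeps only IMINIT.MYAMBIT.COM. Note that an ASCII-only allow-list would also cut internationalized domain names written in their Unicode form, so it is only suitable where such URLs are not expected.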

Mar 14 '22 18:03 amoldavsky
