URLExtract
Left walk does not stop on various Unicode characters
>>> from urlextract import URLExtract
>>> extractor = URLExtract()
>>> extractor.find_urls("You can also visit my website…IMINIT.MYAMBIT.COM")
['website…IMINIT.MYAMBIT.COM']
>>> extractor.find_urls("some%sIMINIT.MYAMBIT.COM" % chr(8231))
['some‧IMINIT.MYAMBIT.COM']
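Neither "…" (U+2026, horizontal ellipsis) nor "‧" (U+2027, hyphenation point, i.e. chr(8231)) is a valid URL character, so the walk to the left should stop when it reaches one of them. The expected result in both cases is ['IMINIT.MYAMBIT.COM'].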
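To illustrate the expected behavior, here is a minimal sketch of a left walk that stops at the first character outside an allow-list. It is not URLExtract's actual implementation: `ALLOWED_URL_CHARS` and `walk_left` are hypothetical names, and the allow-list is an assumption based on the unreserved and reserved character sets of RFC 3986 plus "%".

```python
import string

# Assumption: only RFC 3986 unreserved/reserved characters and "%" may
# appear in a URL. This set is illustrative, not URLExtract's own list.
ALLOWED_URL_CHARS = frozenset(
    string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
)


def walk_left(text, pos):
    """Hypothetical helper: scan left from index `pos` and return the
    index of the first character that still belongs to the URL."""
    start = pos
    # Stop as soon as the character to the left is not in the allow-list.
    while start > 0 and text[start - 1] in ALLOWED_URL_CHARS:
        start -= 1
    return start


samples = [
    "You can also visit my website…IMINIT.MYAMBIT.COM",
    "some%sIMINIT.MYAMBIT.COM" % chr(8231),
]
for sample in samples:
    tld_pos = sample.rfind(".COM")  # where the TLD was found
    print(sample[walk_left(sample, tld_pos):])
```

With an allow-list like this, both samples yield IMINIT.MYAMBIT.COM, because "…" and "‧" end the walk. A deny-list of stop characters would need to enumerate every such Unicode character, which is why an allow-list may be the simpler design here.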