URLExtract
URLExtract copied to clipboard
URLExtract is python class for collecting (extracting) URLs from given text based on locating TLD.
closes https://github.com/lipoja/URLExtract/issues/113 and should address https://github.com/georgettica/url-checker/blob/master/.github/workflows/verify.yml#L67-L69
Hi, I am trying to extract the URL from the following text, But it is not grabbing the link due to the hyphen. EncodedText="* test link -https://www.google.com"; extractor = URLExtract()...
code under: ``` from urlextract import URLExtract extractor = URLExtract() extractor.update_when_older(7) urls = extractor.find_urls("Text with URLs. Let's have URL janlipovsky.cz as an example.") ``` Library version: urlextract==1.6.0
Hi, while trying to use the `URLEextract()` in function to parse a dataframe column, it runs really slow. Here is my code: ```python3 def extract_urls(last): extractor = URLExtract() count =...
``` >>> from urlextract import URLExtract >>> extractor = URLExtract() >>> extractor.find_urls("@gmail.com") ['@gmail.com'] >>> extractor.find_urls("bad.email @gmail.com") ['bad.email', '@gmail.com'] ``` should not be grabbing `@gmail.com` at all
Fixes #82 the extraction is incorrect for URLs with multiple protocols: ``` Link:https://www.google.com job:https://2.ua/YHfw38 ```
This should not be the case: ``` >>> from urlextract import URLExtract >>> extractor = URLExtract() >>> extractor.find_urls("https://www.formpl.us/form/1653896001, work independently from home") ['https://www.formpl.us/form/1653896001,'] ```
IPv6?
URLExtract seems to extract literal IPv4 addresses just fine, but not literal IPv6 addresses?
I tried to use the module to detect a link in the following string. ``` Link:https://www.google.com ``` but it failed to detect that there is a url.