URLExtract issues

feat(typing): add typing

7

closes https://github.com/lipoja/URLExtract/issues/113 and should address https://github.com/georgettica/url-checker/blob/master/.github/workflows/verify.yml#L67-L69

georgettica

Does Not extract the URL that is leading special character

Hi, I am trying to extract the URL from the following text, But it is not grabbing the link due to the hyphen. EncodedText="* test link -https://www.google.com"; extractor = URLExtract()...

praneethpj

ERROR: Can not download list of TLDs. (URLError: [Errno 104] Connection reset by peer)

1

code under: ``` from urlextract import URLExtract extractor = URLExtract() extractor.update_when_older(7) urls = extractor.find_urls("Text with URLs. Let's have URL janlipovsky.cz as an example.") ``` Library version: urlextract==1.6.0

koliaok

URLExtract() init really slow

Hi, while trying to use the `URLEextract()` in function to parse a dataframe column, it runs really slow. Here is my code: ```python3 def extract_urls(last): extractor = URLExtract() count =...

gilbd

should not grab email fragments

1

``` >>> from urlextract import URLExtract >>> extractor = URLExtract() >>> extractor.find_urls("@gmail.com") ['@gmail.com'] >>> extractor.find_urls("bad.email @gmail.com") ['bad.email', '@gmail.com'] ``` should not be grabbing `@gmail.com` at all

amoldavsky

fix-multiple-protocols-in-url

2

Fixes #82 the extraction is incorrect for URLs with multiple protocols: ``` Link:https://www.google.com job:https://2.ua/YHfw38 ```

amoldavsky

comma extracted at the end if url ends with comma

2

This should not be the case: ``` >>> from urlextract import URLExtract >>> extractor = URLExtract() >>> extractor.find_urls("https://www.formpl.us/form/1653896001, work independently from home") ['https://www.formpl.us/form/1653896001,'] ```

amoldavsky

IPv6?

1

URLExtract seems to extract literal IPv4 addresses just fine, but not literal IPv6 addresses?

larseggert

URL Detection Problem

5

I tried to use the module to detect a link in the following string. ``` Link:https://www.google.com ``` but it failed to detect that there is a url.

Ricardolcm888

URLExtract
URLExtract copied to clipboard

Metadata

Add type hints

feat(typing): add typing

Does Not extract the URL that is leading special character

ERROR: Can not download list of TLDs. (URLError: [Errno 104] Connection reset by peer)

URLExtract() init really slow

should not grab email fragments

fix-multiple-protocols-in-url

comma extracted at the end if url ends with comma

IPv6?

URL Detection Problem

← Metadata

Owner

Metadata

URLExtract URLExtract copied to clipboard

Metadata

← Metadata

Owner

Metadata

URLExtract
URLExtract copied to clipboard