URLExtract issues

left walk does not stop on various unicode chars

```
>>> from urlextract import URLExtract
>>> extractor = URLExtract()
>>> extractor.find_urls("You can also visit my website…IMINIT.MYAMBIT.COM")
['website…IMINIT.MYAMBIT.COM']
>>> extractor.find_urls("some%sIMINIT.MYAMBIT.COM" % chr(8231))
['some‧IMINIT.MYAMBIT.COM']
```
These are not valid URL characters...
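As a possible workaround until the left walk is fixed, results could be post-processed to drop everything up to the last character that cannot appear in a URL. The sketch below is not part of URLExtract: `URL_CHARS` and `trim_left_to_url_chars` are hypothetical names, and the character set is the RFC 3986 reserved and unreserved set plus `%`. It would also strip internationalized domain names, so it only suits ASCII-only inputs.

```python
import string

# Characters RFC 3986 permits in a URI: unreserved, reserved, and "%".
# (Hypothetical helper constant, not part of URLExtract.)
URL_CHARS = frozenset(
    string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
)


def trim_left_to_url_chars(candidate: str) -> str:
    """Return the part of `candidate` after its last non-URL character."""
    # Walk from the right so the first invalid character found is the last one.
    for i in range(len(candidate) - 1, -1, -1):
        if candidate[i] not in URL_CHARS:
            return candidate[i + 1:]
    return candidate


print(trim_left_to_url_chars("website\u2026IMINIT.MYAMBIT.COM"))  # IMINIT.MYAMBIT.COM
print(trim_left_to_url_chars("some" + chr(8231) + "IMINIT.MYAMBIT.COM"))  # IMINIT.MYAMBIT.COM
```

Applied to each item returned by `find_urls`, this would turn both results from the report into `IMINIT.MYAMBIT.COM`.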

amoldavsky

Passing a custom cache_dir doesn't seem to actually save the tlds...txt file in that dir

```
(venv) yossi@ubuntu7:~/testing$ python --version
Python 3.10.2
(venv) yossi@ubuntu7:~/testing$ pip list
Package      Version
------------ -------
filelock     3.6.0
idna         3.3
pip          22.0.3
platformdirs 2.5.1
setuptools   58.1.0
uritools     4.0.0
urlextract   1.5.0
(venv)...
```

Yossi

Wrong indices with uppercase characters in domain name

1

I am getting wrong indices when the domain name of a URL contains uppercase characters. To reproduce:
```
from urlextract import URLExtract
extractor = URLExtract()
urls = extractor.find_urls("www.Google.com", get_indices=True)
print(urls[0])
...
```
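One way to cross-check reported indices is to locate the returned URL in the original text without regard to case. This is a minimal sketch, not URLExtract's own logic: `locate_case_insensitive` is a hypothetical helper, and it assumes ASCII input, since lowercasing some non-ASCII characters changes the string length and would shift the offsets.

```python
from typing import Optional, Tuple


def locate_case_insensitive(text: str, url: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first case-insensitive match of `url` in `text`."""
    # Assumes ASCII: lower() keeps the length unchanged, so offsets stay valid.
    start = text.lower().find(url.lower())
    if start == -1:
        return None
    return start, start + len(url)


print(locate_case_insensitive("Visit www.Google.com now", "www.google.com"))  # (6, 20)
```

The returned pair can then be compared with the indices that `find_urls(..., get_indices=True)` reports for the same URL.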

tkrissuu

TLD cache filelock error on read-only systems

12

I'm trying to use URLExtract in a serverless function, but locking the cached TLD file provokes an error on this read-only system. [cachefile.py](https://github.com/lipoja/URLExtract/blob/master/urlextract/cachefile.py) tries to lock the file https://github.com/lipoja/URLExtract/blob/638c0e2d4d8fec077b13b0eefb2c96ffaee112be/urlextract/cachefile.py#L236 but...
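On a read-only system, one thing to try is choosing a writable directory up front and pointing the cache there. The sketch below only shows the directory selection. `first_writable_dir` is a hypothetical helper, the candidate paths are assumptions, and whether URLExtract then avoids the lock error with that directory is not confirmed by the report.

```python
import os
import tempfile
from typing import Iterable, Optional


def first_writable_dir(candidates: Iterable[str]) -> Optional[str]:
    """Return the first existing directory the process may write to, else None."""
    for path in candidates:
        # W_OK: may create files; X_OK: may enter the directory.
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return None


# Candidate order is an assumption: the user cache first, then the temp directory.
cache_dir = first_writable_dir(
    [os.path.expanduser("~/.cache"), tempfile.gettempdir()]
)
print(cache_dir)
```

On many serverless platforms only the temporary directory is writable, so it makes a reasonable last candidate. The result could then be passed as the `cache_dir` argument when constructing `URLExtract`.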

LaundroMat

URLExtract no longer supports Python 3.6 because of recent filelock changes

1

Hi @lipoja I see this ERROR on my project:
```
  File "/home/zaki/git/blue/eggs/urlextract-1.5.0-py3.6.egg/urlextract/cachefile.py", line 19, in <module>
    import filelock
  File "/home/zaki/git/blue/eggs/filelock-3.4.2-py3.6.egg/filelock/__init__.py", line 8
    from __future__ import annotations
    ^
SyntaxError: future feature annotations...
```

za

add types to urlextract

3

I tried on my repo to run `mypy file.py` (https://github.com/python/mypy):
```
error: Skipping analyzing "urlextract": module is installed, but missing library stubs or py.typed marker
note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
```

georgettica

move dns checking to dedicated class and add concurrency

9

Implemented the ideas discussed in #91. Also moved all DNS checking for `find_urls` and `has_urls` so that all found URLs can be checked concurrently if the user needs it. I kept all instances...

nicolasassi

ossar test

lipoja

VCS/Git remote URLs not found

2

`[email protected]:snowplow/snowplow-python-tracker.git` is not found. This can be found at https://pypi.org/project/minimal-snowplow-tracker/
```py
>>> import urlextract
>>> e = urlextract.urlextract_core.URLExtract()
>>> e.find_urls('[email protected]:snowplow/snowplow-python-tracker.git')
[]
```
A good list of sample VCS links can...
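Until such remotes are supported, scp-style Git remotes of the form `user@host:path.git` could be picked up by a separate pass over the text. The pattern below is a rough sketch rather than a full grammar for VCS URLs. `SCP_GIT_REMOTE` and `find_scp_git_remotes` are hypothetical names, and the sample address uses the placeholder host `example.com`.

```python
import re
from typing import List

# user@host:path/to/repo.git (scp-like syntax accepted by git clone).
# Hypothetical pattern; it deliberately ignores ssh://, git://, and other schemes.
SCP_GIT_REMOTE = re.compile(r"\b[\w.-]+@[\w.-]+:[\w./-]+\.git\b")


def find_scp_git_remotes(text: str) -> List[str]:
    """Return all scp-style Git remotes found in `text`."""
    return SCP_GIT_REMOTE.findall(text)


print(find_scp_git_remotes("clone git@example.com:owner/repo.git today"))
# ['git@example.com:owner/repo.git']
```

The results could be merged with the list from `find_urls`. Requiring the `.git` suffix keeps ordinary email addresses followed by a colon from matching.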

jayvdb

enhancement

medium

check dns concurrently to speed up lookup

8

Is there any room to check dns concurrently as it might be a time consuming task? I've found some hacky ways to do that, but maybe it could be a...
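A minimal sketch of the idea, using only the standard library: resolve each distinct host in a thread pool and collect the results. This is not URLExtract's implementation. `resolves` and `check_hosts` are hypothetical names, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def resolves(host: str, resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if `resolver` can resolve `host`."""
    try:
        resolver(host)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is an OSError; UnicodeError covers invalid IDNA names.
        return False


def check_hosts(
    hosts: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
    max_workers: int = 8,
) -> Dict[str, bool]:
    """Resolve each distinct host concurrently and map host -> resolvable."""
    unique = list(dict.fromkeys(hosts))  # de-duplicate, keep first-seen order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda h: resolves(h, resolver), unique)
        return dict(zip(unique, results))
```

Threads fit here because DNS lookups are I/O-bound and spend most of their time waiting. De-duplicating hosts first avoids repeating a lookup when several URLs share a domain.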

nicolasassi
