
URLExtract is a Python class for collecting (extracting) URLs from a given text, based on locating TLDs.
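For example, a minimal usage sketch (the sample sentence is illustrative):

```py
>>> from urlextract import URLExtract
>>> extractor = URLExtract()
>>> extractor.find_urls("Have you heard about example.com yet?")
['example.com']
```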

45 URLExtract issues, sorted by most recently updated

```
>>> from urlextract import URLExtract
>>> extractor = URLExtract()
>>> extractor.find_urls("You can also visit my website…IMINIT.MYAMBIT.COM")
['website…IMINIT.MYAMBIT.COM']
>>> extractor.find_urls("some%sIMINIT.MYAMBIT.COM" % chr(8231))
['some‧IMINIT.MYAMBIT.COM']
```

These are not valid URL characters...
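Until this is fixed upstream, one way to guard against it is to post-filter results against RFC 3986's allowed character set; the regex below is an illustrative sketch, not part of URLExtract:

```py
import re

from urlextract import URLExtract

# RFC 3986 restricts URLs to a small ASCII repertoire; characters such as
# U+2027 (hyphenation point) fall outside it.  Illustrative post-filter:
VALID_URL_CHARS = re.compile(r"^[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+$")

extractor = URLExtract()
candidates = extractor.find_urls("some%sIMINIT.MYAMBIT.COM" % chr(8231))
urls = [u for u in candidates if VALID_URL_CHARS.match(u)]
print(urls)  # the candidate containing U+2027 is dropped
```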

```
(venv) yossi@ubuntu7:~/testing$ python --version
Python 3.10.2
(venv) yossi@ubuntu7:~/testing$ pip list
Package      Version
------------ -------
filelock     3.6.0
idna         3.3
pip          22.0.3
platformdirs 2.5.1
setuptools   58.1.0
uritools     4.0.0
urlextract   1.5.0
(venv)...
```

I am getting wrong indices when the domain name of a URL contains uppercase characters. To reproduce:

```
from urlextract import URLExtract
extractor = URLExtract()
urls = extractor.find_urls("www.Google.com", get_indices=True)
print(urls[0])...
```
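A quick way to see whether the indices are correct is to check that slicing the original text reproduces the extracted URL; a minimal sketch, assuming `get_indices=True` yields `(url, (start, end))` tuples:

```py
from urlextract import URLExtract

text = "www.Google.com"
extractor = URLExtract()
# Each result is a (url, (start, end)) tuple when get_indices=True.
for url, (start, end) in extractor.find_urls(text, get_indices=True):
    # Correct indices must slice the original text back to the URL.
    print(url, (start, end), text[start:end] == url)
```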

I'm trying to use URLExtract in a serverless function, but locking the cached TLD file raises an error on this read-only filesystem. [cachefile.py](https://github.com/lipoja/URLExtract/blob/master/urlextract/cachefile.py) tries to lock the file https://github.com/lipoja/URLExtract/blob/638c0e2d4d8fec077b13b0eefb2c96ffaee112be/urlextract/cachefile.py#L236 but...
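One possible workaround, assuming the installed version supports the `cache_dir` constructor argument, is to point the TLD cache at a writable location such as `/tmp`:

```py
import tempfile

from urlextract import URLExtract

# Serverless filesystems are usually read-only except for /tmp, so keep
# the cached TLD list (and its lock file) in a writable directory.
extractor = URLExtract(cache_dir=tempfile.gettempdir())
print(extractor.find_urls("lambda handlers can still parse example.com"))
```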

Hi @lipoja I see this ERROR on my project:

```
File "/home/zaki/git/blue/eggs/urlextract-1.5.0-py3.6.egg/urlextract/cachefile.py", line 19, in <module>
    import filelock
File "/home/zaki/git/blue/eggs/filelock-3.4.2-py3.6.egg/filelock/__init__.py", line 8
    from __future__ import annotations
    ^
SyntaxError: future feature annotations...
```

I tried to run `mypy file.py` (https://github.com/python/mypy) on my repo:

```
error: Skipping analyzing "urlextract": module is installed, but missing library stubs or py.typed marker
note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
```
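Until the package ships a `py.typed` marker, the usual stopgap is to silence mypy at the import site; a minimal sketch:

```py
# mypy skips analyzing this untyped third-party import instead of erroring.
import urlextract  # type: ignore

extractor = urlextract.URLExtract()
print(extractor.find_urls("mypy now passes on example.com"))
```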

Implemented the ideas discussed in #91. Also moved all DNS checking for `find_urls` and `has_urls` so that all found URLs can be checked concurrently if the user needs it. I kept all instances...
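For context, a sketch of how DNS-checked extraction is typically invoked, assuming the `check_dns` keyword is what this PR routes through the concurrent path (the sample hostnames are illustrative):

```py
from urlextract import URLExtract

extractor = URLExtract()
# With check_dns=True, only URLs whose hostnames actually resolve are
# returned; this PR lets those lookups run concurrently for multiple URLs.
text = "see example.com and thishostdoesnotexist12345.com"
print(extractor.find_urls(text, check_dns=True))
```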

`git@github.com:snowplow/snowplow-python-tracker.git` is not found. This can be found at https://pypi.org/project/minimal-snowplow-tracker/

```py
>>> import urlextract
>>> e = urlextract.urlextract_core.URLExtract()
>>> e.find_urls('git@github.com:snowplow/snowplow-python-tracker.git')
[]
```

A good list of sample VCS links can...
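Until scp-like VCS remotes are supported, one illustrative workaround is to supplement the extractor with a simple regex; `SCP_LIKE` and `find_urls_with_vcs` below are hypothetical names, not URLExtract API:

```py
import re

from urlextract import URLExtract

# Matches scp-like git remotes such as git@github.com:owner/repo.git, which
# are not URLs in the RFC sense and so are missed by TLD-based extraction.
SCP_LIKE = re.compile(r"\b[\w.-]+@[\w.-]+:[\w./-]+\.git\b")

def find_urls_with_vcs(text):
    extractor = URLExtract()
    return extractor.find_urls(text) + SCP_LIKE.findall(text)

print(find_urls_with_vcs("clone git@github.com:snowplow/snowplow-python-tracker.git"))
```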

Labels: enhancement, medium

Is there any room to check DNS concurrently, since it can be a time-consuming task? I've found some hacky ways to do that, but maybe it could be a...
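A hedged sketch of one approach, resolving the extracted hostnames in a thread pool; `resolves` is an illustrative helper, not URLExtract API:

```py
import socket
from concurrent.futures import ThreadPoolExecutor

from urlextract import URLExtract

def resolves(hostname):
    """Illustrative helper: True if the hostname has a DNS A record."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

extractor = URLExtract()
urls = extractor.find_urls("see example.com and thisdoesnotresolve123.com")
# Strip any scheme/path so only the hostname is resolved.
hosts = [u.split("://")[-1].split("/")[0] for u in urls]
# Resolve all hostnames concurrently instead of one blocking call at a time.
with ThreadPoolExecutor(max_workers=8) as pool:
    alive = dict(zip(hosts, pool.map(resolves, hosts)))
print(alive)
```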