URLExtract
URLExtract is a Python class for collecting (extracting) URLs from a given text by locating TLDs.
```
>>> from urlextract import URLExtract
>>> extractor = URLExtract()
>>> extractor.find_urls("You can also visit my website…IMINIT.MYAMBIT.COM")
['website…IMINIT.MYAMBIT.COM']
>>> extractor.find_urls("some%sIMINIT.MYAMBIT.COM" % chr(8231))
['some‧IMINIT.MYAMBIT.COM']
```
These are not valid URL characters...
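To make the "not valid URL characters" point concrete, here is a minimal sketch of a character check based on the RFC 3986 character set. The helper name `has_only_uri_chars` is hypothetical and not part of urlextract; the sketch also assumes plain ASCII URIs, so it would reject internationalized domain names written in their Unicode form.

```python
import string

# Characters RFC 3986 permits in a URI: unreserved characters,
# the reserved gen-delims and sub-delims, and "%" for percent-encoding.
_URI_CHARS = frozenset(
    string.ascii_letters + string.digits
    + "-._~"          # remaining unreserved characters
    + ":/?#[]@"       # gen-delims
    + "!$&'()*+,;="   # sub-delims
    + "%"             # percent-encoding marker
)


def has_only_uri_chars(candidate: str) -> bool:
    """Return True if candidate is non-empty and uses only RFC 3986 characters."""
    return bool(candidate) and all(ch in _URI_CHARS for ch in candidate)


print(has_only_uri_chars("IMINIT.MYAMBIT.COM"))
print(has_only_uri_chars("website\u2026IMINIT.MYAMBIT.COM"))
print(has_only_uri_chars("some%sIMINIT.MYAMBIT.COM" % chr(8231)))
```

A caller could use such a check to drop or trim candidates like the two shown above before treating them as URLs.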
```
(venv) yossi@ubuntu7:~/testing$ python --version
Python 3.10.2
(venv) yossi@ubuntu7:~/testing$ pip list
Package      Version
------------ -------
filelock     3.6.0
idna         3.3
pip          22.0.3
platformdirs 2.5.1
setuptools   58.1.0
uritools     4.0.0
urlextract   1.5.0
(venv)...
```
I am getting wrong indices when the domain name of a URL contains uppercase characters. To reproduce:
```
from urlextract import URLExtract

extractor = URLExtract()
urls = extractor.find_urls("www.Google.com", get_indices=True)
print(urls[0])
```
...
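The truncated report does not show which indices come back, so as a way to cross-check them independently of the library, here is a minimal sketch that locates a URL in the original text while ignoring case. The helper `find_span_ignoring_case` is hypothetical and uses only the standard library; it is not how urlextract computes indices.

```python
import re


def find_span_ignoring_case(text: str, url: str):
    """Return (start, end) of the first case-insensitive match of url in text, or None."""
    # re.escape treats the URL literally, so "." does not match arbitrary characters.
    match = re.search(re.escape(url), text, flags=re.IGNORECASE)
    return match.span() if match else None


print(find_span_ignoring_case("www.Google.com", "www.google.com"))
print(find_span_ignoring_case("Visit www.Google.com now", "www.google.com"))
```

Because the search runs against the original text and not a lowercased copy, the returned offsets always refer to positions in the input the caller passed in.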
I'm trying to use URLExtract in a serverless function, but locking the cached TLD file raises an error on this read-only file system. [cachefile.py](https://github.com/lipoja/URLExtract/blob/master/urlextract/cachefile.py) tries to lock the file https://github.com/lipoja/URLExtract/blob/638c0e2d4d8fec077b13b0eefb2c96ffaee112be/urlextract/cachefile.py#L236 but...
Hi @lipoja, I see this error in my project:
```
  File "/home/zaki/git/blue/eggs/urlextract-1.5.0-py3.6.egg/urlextract/cachefile.py", line 19, in <module>
    import filelock
  File "/home/zaki/git/blue/eggs/filelock-3.4.2-py3.6.egg/filelock/__init__.py", line 8
    from __future__ import annotations
    ^
SyntaxError: future feature annotations...
```
I ran `mypy file.py` (https://github.com/python/mypy) on my repo and got:
```
error: Skipping analyzing "urlextract": module is installed, but missing library stubs or py.typed marker
note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
```
Implemented the ideas discussed in #91. Also moved all DNS checking for `find_urls` and `has_urls` so that all found URLs can be checked concurrently if the user needs it. I kept all instances...
`[email protected]:snowplow/snowplow-python-tracker.git` is not found. This can be found at https://pypi.org/project/minimal-snowplow-tracker/
```py
>>> import urlextract
>>> e = urlextract.urlextract_core.URLExtract()
>>> e.find_urls('[email protected]:snowplow/snowplow-python-tracker.git')
[]
```
A good list of sample VCS links can...
Is there any room to check DNS concurrently, as it might be a time-consuming task? I've found some hacky ways to do that, but maybe it could be a...
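As an illustration of what concurrent DNS checking could look like outside the library, here is a minimal sketch using a thread pool from the standard library. The names `resolves` and `check_hosts`, the `resolver` parameter, and the default of 8 workers are all assumptions made for this example; none of them are part of urlextract's API.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolves(host, resolver=socket.gethostbyname):
    """Return True if resolver(host) succeeds, False on a lookup error."""
    try:
        resolver(host)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def check_hosts(hosts, resolver=socket.gethostbyname, max_workers=8):
    """Resolve all hosts in parallel threads and map each host to True/False."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each host with its result.
        results = pool.map(lambda h: resolves(h, resolver), hosts)
        return dict(zip(hosts, results))
```

Threads suit this case because a DNS lookup spends its time waiting on the network. Taking the resolver as a parameter also lets the function be exercised with a fake resolver, without real lookups.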