
Valid URL does not pass trafaret.URL

Open FedirAlifirenko opened this issue 6 years ago • 4 comments

Hi @Deepwalker. I got the following validation error for a working URL, 'https://www.dior.com/fr_fr/maquillage/adoptez-le-look-du-defile-croisiere\xa02020'. Is this expected behavior? What do you think?

(3_7_2) MacBook-Pro-2:test fedir$ python -c "import trafaret as t; t.URL.check('https://www.dior.com/fr_fr/maquillage/adoptez-le-look-du-defile-croisiere\xa02020')"
Traceback (most recent call last):
  File "/Users/fedir/env/3_7_2/lib/python3.7/site-packages/trafaret/base.py", line 166, in transform
    return self.trafaret(value, context=context)
  File "/Users/fedir/env/3_7_2/lib/python3.7/site-packages/trafaret/base.py", line 156, in __call__
    return self.check(val, context=context)
  File "/Users/fedir/env/3_7_2/lib/python3.7/site-packages/trafaret/base.py", line 118, in check
    return self.transform(value, context=context)
  File "/Users/fedir/env/3_7_2/lib/python3.7/site-packages/trafaret/base.py", line 286, in transform
    raise DataError(dict(enumerate(errors)), trafaret=self)
trafaret.dataerror.DataError: {0: DataError(does not match pattern ^(?:http|ftp)s?://(?:\S+(?::\S*)?@)?(?:(?:[A-Z0-9](?:[A-Z0-9-_]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|localhost|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?::\d+)?(?:/?|[/?]\S+)$), 1: DataError(does not match pattern ^(?:http|ftp)s?://(?:\S+(?::\S*)?@)?(?:(?:[A-Z0-9](?:[A-Z0-9-_]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|localhost|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?::\d+)?(?:/?|[/?]\S+)$)}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/fedir/env/3_7_2/lib/python3.7/site-packages/trafaret/base.py", line 118, in check
    return self.transform(value, context=context)
  File "/Users/fedir/env/3_7_2/lib/python3.7/site-packages/trafaret/base.py", line 168, in transform
    raise DataError(self.message, value=value)
trafaret.dataerror.DataError: value is not URL

FedirAlifirenko, May 28 '19 16:05

Hi! Can you check whether this link works with this PR? https://github.com/Deepwalker/trafaret/pull/36

Deepwalker, May 28 '19 19:05

Actually, I'm not sure what the right behavior is. It may be that the link itself is incorrect. I will need to reread the RFC.
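A minimal sketch of what seems to be going on, using only the standard library. The pattern is copied from the error message above; compiling it with `re.IGNORECASE` is an assumption (the pattern only lists `A-Z`), and `is_url` is a hypothetical helper, not part of trafaret. In Python 3, `\S` does not match `\xa0` (a no-break space counts as whitespace), so the path part `[/?]\S+$` cannot reach the end of the string. Percent-encoding the path first gives a string that the same pattern accepts.

```python
import re
from urllib.parse import quote

# Pattern copied from the DataError message; IGNORECASE is an assumption.
URL_PATTERN = re.compile(
    r'^(?:http|ftp)s?://'
    r'(?:\S+(?::\S*)?@)?'
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-_]{0,61}[A-Z0-9])?\.)+'
    r'(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)'
    r'|localhost|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$',
    re.IGNORECASE,
)


def is_url(value: str) -> bool:
    """Hypothetical helper: True if value matches the pattern above."""
    return URL_PATTERN.match(value) is not None


raw_url = ('https://www.dior.com/fr_fr/maquillage/'
           'adoptez-le-look-du-defile-croisiere\xa02020')

# Percent-encode everything except RFC 3986 reserved characters and '%',
# so the no-break space becomes its UTF-8 escape sequence.
encoded_url = quote(raw_url, safe=":/?#[]@!$&'()*+,;=%")

print(is_url(raw_url))      # raw no-break space in the path
print(is_url(encoded_url))  # same URL, percent-encoded
print(encoded_url)
```

Whether trafaret.URL should do this encoding itself or leave it to the caller is the open question here; the sketch only shows where the match breaks.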

Deepwalker, May 28 '19 20:05

Proving URL correctness is hard. I suspect that a regex-based solution is bound to produce false positives by design :( Even the much more complicated yarl is not free of such problems. Well, yarl.URL() works pretty well, but yarl.URL.build() cannot parse valid args now :(

asvetlov, May 28 '19 21:05

@asvetlov

but yarl.URL.build() cannot parse valid args now

What do you mean? It seems everything works:

fedor@ubuntu:~$ python -c "import yarl; print(yarl.URL.build(host='example.com', scheme='https', path='/path\xa0abc'))"
https://example.com/path%C2%A0abc
fedor@ubuntu:~$ python -V
Python 3.7.3
fedor@ubuntu:~$ pip list | grep yarl
yarl       1.3.0

FedirAlifirenko, May 29 '19 16:05