requests icon indicating copy to clipboard operation
requests copied to clipboard

HTTPDigestAuth fails on non-latin credentials

Open ondratu opened this issue 3 years ago • 3 comments

There was issue reported, which is closed with bad results.

https://github.com/psf/requests/blob/4f6c0187150af09d085c03096504934eb91c7a9e/requests/auth.py#L59-L63

Don't pass unicode strings in the arguments, but use UTF8 bytes instead.

self.session.get(main_url, auth=requests.auth.HTTPDigestAuth("Сергей_Ласточкин".encode('UTF-8'), '1234'))

Originally posted by @D-stefaang in https://github.com/psf/requests/issues/5089#issuecomment-763569911

But this is wrong! When i try to set user 'Ondřej' with this advice, requests send bad string:

HTTPDigestAuth('Ondřej'.encode('utf-8'), 'heslíčko')

creates header starts with wrong username!

Digest username="b'Ond\xc5\x99ej'"

ondratu avatar Apr 04 '22 06:04 ondratu

Hi @ondratu,

Could you please clarify what you believe is wrong in this case? ř is the byte-sequence \xc5\x99 in UTF-8, so we'd expect the bytes object to be Ond\xc5\x99ej. We can quickly verify this by checking:

'Ondřej'.encode('utf-8') == b'Ond\xc5\x99ej'
>>> True

It's not clear what other value you'd be expecting.

nateprewitt avatar Apr 04 '22 07:04 nateprewitt

Hmm, on closer inspection this does appear to be a bug. We're using the bytes username as an argument to format our string for the header. This causes the full literal "b'Ond\xc5\x99ej'" to be used which I agree doesn't look correct. This header should be encoded as bytes during creation but we currently defer that to be urllib3's problem.

When not encoding the auth/password, we get this:

Traceback (most recent call last):
  File "/Users/nateprewitt/Work/OpenSource/requests/test.py", line 3, in <module>
    r = requests.get('https://httpbin.org/digest-auth/auth/Ondřej/heslíčko', auth=h)
  File "/Users/nateprewitt/Work/OpenSource/requests/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  [...]
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1323, in _send_request
    self.putheader(hdr, value)
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/site-packages/urllib3/connection.py", line 224, in putheader
    _HTTPConnection.putheader(self, header, *values)
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1255, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0159' in position 20: ordinal not in range(256)

Fixing this is unfortunately somewhat complicated for a couple reasons:

1.) Users expect the output of HTTPDigestAuth to be a str in Python 3, changing that is likely breaking. 2.) If we were to encode the string, the libraries current convention would be latin-1 not utf-8. That wouldn't solve this issue.

I don't believe we can ever format this correctly in Python 3 with the current behavior though. I'll need to look more tomorrow, but we may consider a behavior change if self.username/self.password is passed in as bytes.

nateprewitt avatar Apr 04 '22 07:04 nateprewitt