gh-135768: fix allowed/blocked IPv6 domains in `http.cookiejar`
PR for #135768, Thanks!
- Issue: gh-135768
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.
Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.
Please add tests.
Done.
I have made the requested changes; please review again
Thanks for making the requested changes!
@picnixz: please review the changes made to this pull request.
In a follow-up PR I think we can refactor the tests because there's lot of duplicated code but not in this one yet.
I'll work on this since this PR is done.
I'll work on this since this PR is done.
No, wait until this is merged. We may need a new issue but this is not pressing right now. Focus on this one first.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.
Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.
Please, answer to this. Why should we avoid adding '.local'? until this is answered, I won't review it more.
because using .local in IPv6 doesn't make any sense. It will change the input IPv6 to something like [::1].local.
The additional .local is designed for HDNs not IP addresses. Before this PR. We assumed that there are only two types of input: IPv4 addr and HDN. IPv4 indeed contains dots. Some abnormal HDN do not contain dot, than we are adding a .local at the end of it. This is the expected case.
But, IPv6 is a edge case. It do not contains dot. And it is not a HDN. Adding .local after a IPv6 address is meaningless and, the most important thing, will cause error in logic when we are blocking hosts. Since the IPv6 is [::1].local and the input is [::1], every IPv6 is not blocked.
It is the .local at the end of the address that cause the bug.
My question was more: why did we add .local for IPv4 in the first place. There is a bunch of logic involving local domains in set_ok_domain and I believe this also needs an upgrade if we're handling IPv6 now. FQDN and IPv4 look the same but IPv6 is entirely different. Therefore, we should be also careful with the changes that could affect the logic of set_ok_domain().
My question was more: why did we add .local for IPv4 in the first place. There is a bunch of logic involving local domains in
set_ok_domainand I believe this also needs an upgrade if we're handling IPv6 now. FQDN and IPv4 look the same but IPv6 is entirely different. Therefore, we should be also careful with the changes that could affect the logic of set_ok_domain().
I might be stupid here, but I think we are actually not adding .local for IPv4. RFC 2965 quotes:
If a host name contains no dots, the effective host name is that name with the string .local appended to it.
And the implementation is correct. IPv4 contains dots so .local will not be added. The .local will only be added to host name without dots. like http://acme, the requested host name will be changed to acme.local, but http://127.0.0.1 will not
so sorry in advanced if I misunderstand the code
you can locally test this by adding breakpoints in func eff_request_host to see what is returned of these two code:
import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy
class FakeResponse:
def __init__(self, headers=[], url=None):
"""
headers: list of RFC822-style 'Key: value' strings
"""
import email
self._headers = email.message_from_string("\n".join(headers))
self._url = url
def info(self): return self._headers
pol = DefaultCookiePolicy(
rfc2965=True, blocked_domains=[])
c = CookieJar(policy=pol)
headers = ["Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/"]
pol.set_blocked_domains(['acme'])
req = urllib.request.Request("http://acme")
res = FakeResponse(headers, "http://acme")
c.extract_cookies(res, req)
# And
c.clear()
pol.set_blocked_domains(['127.0.0.1'])
req = urllib.request.Request("http://127.0.0.1")
res = FakeResponse(headers, "http://127.0.0.1")
c.extract_cookies(res, req)
And for the issue of [::1] and ::1. I quote this from RFC 2396 3.2.2
The host is a domain name of a network host, or its IPv4 address as a set of four decimal digit groups separated by ".". Literal IPv6 addresses are not supported.
I chose to use only [::1] because of this statement, hum..... I'll find some clearer statements now
I might be stupid here, but I think we are actually not adding .local for IPv4. RFC 2965 quotes:
Oups my bad. Ok, so this is an RFC thing. Now, is there a similar RFC for IPv6?
What I meant with set_ok_domain is that we have some 'strict' policy and under the non-strict policy parsing, I don't know whether [::1] should be treated the same as ::1 (and whether strict policy parsing means that [::1] can only be blocked if we blocked it explicitly, not just ::1) (see https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.DefaultCookiePolicy.DomainStrictNonDomain).
(It appears that there is a lot of code that is IPv-4 only but if we're adding support to IPv6, we need to make it work everywhere actually).
so sorry in advanced if I misunderstand the code
no worries!
you can locally test this by adding breakpoints in func eff_request_host to see what is returned of these two code:
Sorry but I don't have time for that.
(It appears that there is a lot of code that is IPv-4 only but if we're adding support to IPv6, we need to make it work everywhere actually).
I think we could treat them in seperate issues and PRs, I think we could just focus on fixing allowed/blocked IPv6 domains in http.cookiejar now, and in the future we need to change a lot on that(I think I can handle this), including tests, docs, and logic everywhere. But yeah, I think we could just make sure the blocking and allowing work for IPv6 in this PR. I will make some changes based on what you've commented now.
Sorry but I don't have time for that.
I really thank you so much for your time. My coding skills and experiences have improved by all this.
No worries of set_ok_domain. This PR do not change any logic of it. We can change the logic in the future PRs. I think we could just focus on allowed/blocked issues in this PR now.
The only related change of other funcs is eff_request_host, which is used in set_ok_domain. But the only thing it did is just to treat IPv6 as a IP address and not adding .local on it. Since the original one don't support IPv6, this PR indeed also don't supprt IPv6 in the logic of set_ok_domain. We could take care of it later.
(There are indeed a hole bunch of things to change if we are going to support IPv6 entirely)
What I meant with set_ok_domain is that we have some 'strict' policy and under the non-strict policy parsing, I don't know whether [::1] should be treated the same as ::1 (and whether strict policy parsing means that [::1] can only be blocked if we blocked it explicitly, not just ::1) (see https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.DefaultCookiePolicy.DomainStrictNonDomain).
I'll take a look at this after this PR is merged.
I'm now convinced to only use [::1] strictly. RFC states that Literal IPv6 addresses are not supported as host name. So to be strict, only [::1] is a legel hostname while ::1 is a legel IPv6 address.
And ppl are more likely to use [::1] instead of ::1 when blocking domains. I think we just support [::1] and add warnings in the Docs or something. IMO we don't need to change any thing here, except of changing the is_ip_like func into is_ip_like_hostname func.
::1 is not a legel hostname, it's barely a IPv6 address. Instead, [::1] is a legel hostname, but not a legel IPv6 address. And what we're blocking are hostnames not IPs.
Note that IPv4 could be both a address and hostname. But IPv6 can't.
Thanks for making the requested changes!
@picnixz: please review the changes made to this pull request.
changes were made
basically all done and ready now. Thank you for your time and effort!
The helper functions will help a lot in the future PRs I think.
last nits and we're done
Thanks! This is my first time ever to create a PR for Cpython with nearly a hundred lines of code. Thank you for your patience and guidence!
I'm mostly good but I'll ask another expert just in case @vadmium.
May I close the issue? I think its mostly solved.
No, we should never close an issue until the corresponding PR has been merged or we abandoned the proposal. In this case, closing the issue means that both the PR and its backports have been merged.
No, we should never close an issue until the corresponding PR has been merged or we abandoned the proposal. In this case, closing the issue means that both the PR and its backports have been merged.
Got it!
I will also add warnings in docs indicating ::1 in not valid when they are blocking or allowing hostname after this is merged, also, remove the duplicated tests in another PR. IPv6 appears to be an edge case in many functions of http.cookiejar, I'll try to fix them one by one in the future.
Hi @picnixz, friendly pinging here. Shell we get this PR merged? caz I can't contribute to further following PRs and issues if this is not merged. TiA ;) (I think it is mostly solved, and tiny fixes can be coped later, if any.)
No. I want to hear from Martin first. PRs don't need to be rushed and it's common to have PRs landing after a few weeks.
You can still contribute to other parts of the library (which I would advise you to do so).