cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-135768: fix allowed/blocked IPv6 domains in `http.cookiejar`

Open LamentXU123 opened this issue 6 months ago • 26 comments

PR for #135768, Thanks!

  • Issue: gh-135768

LamentXU123 avatar Jun 20 '25 17:06 LamentXU123

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

bedevere-app[bot] avatar Jun 20 '25 17:06 bedevere-app[bot]

Please add tests.

Done.

LamentXU123 avatar Jun 20 '25 18:06 LamentXU123

I have made the requested changes; please review again

LamentXU123 avatar Jun 21 '25 06:06 LamentXU123

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

bedevere-app[bot] avatar Jun 21 '25 06:06 bedevere-app[bot]

In a follow-up PR I think we can refactor the tests because there's lot of duplicated code but not in this one yet.

I'll work on this since this PR is done.

LamentXU123 avatar Jun 21 '25 11:06 LamentXU123

I'll work on this since this PR is done.

No, wait until this is merged. We may need a new issue but this is not pressing right now. Focus on this one first.

picnixz avatar Jun 21 '25 12:06 picnixz

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

bedevere-app[bot] avatar Jun 21 '25 12:06 bedevere-app[bot]

Please, answer to this. Why should we avoid adding '.local'? until this is answered, I won't review it more.

because using .local in IPv6 doesn't make any sense. It will change the input IPv6 to something like [::1].local.

The additional .local is designed for HDNs not IP addresses. Before this PR. We assumed that there are only two types of input: IPv4 addr and HDN. IPv4 indeed contains dots. Some abnormal HDN do not contain dot, than we are adding a .local at the end of it. This is the expected case.

But, IPv6 is a edge case. It do not contains dot. And it is not a HDN. Adding .local after a IPv6 address is meaningless and, the most important thing, will cause error in logic when we are blocking hosts. Since the IPv6 is [::1].local and the input is [::1], every IPv6 is not blocked.

It is the .local at the end of the address that cause the bug.

LamentXU123 avatar Jun 21 '25 12:06 LamentXU123

My question was more: why did we add .local for IPv4 in the first place. There is a bunch of logic involving local domains in set_ok_domain and I believe this also needs an upgrade if we're handling IPv6 now. FQDN and IPv4 look the same but IPv6 is entirely different. Therefore, we should be also careful with the changes that could affect the logic of set_ok_domain().

picnixz avatar Jun 21 '25 12:06 picnixz

My question was more: why did we add .local for IPv4 in the first place. There is a bunch of logic involving local domains in set_ok_domain and I believe this also needs an upgrade if we're handling IPv6 now. FQDN and IPv4 look the same but IPv6 is entirely different. Therefore, we should be also careful with the changes that could affect the logic of set_ok_domain().

I might be stupid here, but I think we are actually not adding .local for IPv4. RFC 2965 quotes:

If a host name contains no dots, the effective host name is that name with the string .local appended to it.

And the implementation is correct. IPv4 contains dots so .local will not be added. The .local will only be added to host name without dots. like http://acme, the requested host name will be changed to acme.local, but http://127.0.0.1 will not

so sorry in advanced if I misunderstand the code

LamentXU123 avatar Jun 21 '25 13:06 LamentXU123

you can locally test this by adding breakpoints in func eff_request_host to see what is returned of these two code:

import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy
class FakeResponse:
    def __init__(self, headers=[], url=None):
        """
        headers: list of RFC822-style 'Key: value' strings
        """
        import email
        self._headers = email.message_from_string("\n".join(headers))
        self._url = url
    def info(self): return self._headers
pol = DefaultCookiePolicy(
            rfc2965=True, blocked_domains=[])
c = CookieJar(policy=pol)
headers = ["Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/"]
pol.set_blocked_domains(['acme'])
req = urllib.request.Request("http://acme")
res = FakeResponse(headers, "http://acme")
c.extract_cookies(res, req)

# And

c.clear()
pol.set_blocked_domains(['127.0.0.1'])
req = urllib.request.Request("http://127.0.0.1")
res = FakeResponse(headers, "http://127.0.0.1")
c.extract_cookies(res, req)

LamentXU123 avatar Jun 21 '25 13:06 LamentXU123

And for the issue of [::1] and ::1. I quote this from RFC 2396 3.2.2

The host is a domain name of a network host, or its IPv4 address as a set of four decimal digit groups separated by ".". Literal IPv6 addresses are not supported.

I chose to use only [::1] because of this statement, hum..... I'll find some clearer statements now

LamentXU123 avatar Jun 21 '25 13:06 LamentXU123

I might be stupid here, but I think we are actually not adding .local for IPv4. RFC 2965 quotes:

Oups my bad. Ok, so this is an RFC thing. Now, is there a similar RFC for IPv6? What I meant with set_ok_domain is that we have some 'strict' policy and under the non-strict policy parsing, I don't know whether [::1] should be treated the same as ::1 (and whether strict policy parsing means that [::1] can only be blocked if we blocked it explicitly, not just ::1) (see https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.DefaultCookiePolicy.DomainStrictNonDomain).

(It appears that there is a lot of code that is IPv-4 only but if we're adding support to IPv6, we need to make it work everywhere actually).

so sorry in advanced if I misunderstand the code

no worries!

you can locally test this by adding breakpoints in func eff_request_host to see what is returned of these two code:

Sorry but I don't have time for that.

picnixz avatar Jun 21 '25 13:06 picnixz

(It appears that there is a lot of code that is IPv-4 only but if we're adding support to IPv6, we need to make it work everywhere actually).

I think we could treat them in seperate issues and PRs, I think we could just focus on fixing allowed/blocked IPv6 domains in http.cookiejar now, and in the future we need to change a lot on that(I think I can handle this), including tests, docs, and logic everywhere. But yeah, I think we could just make sure the blocking and allowing work for IPv6 in this PR. I will make some changes based on what you've commented now.

Sorry but I don't have time for that.

I really thank you so much for your time. My coding skills and experiences have improved by all this.

LamentXU123 avatar Jun 21 '25 13:06 LamentXU123

No worries of set_ok_domain. This PR do not change any logic of it. We can change the logic in the future PRs. I think we could just focus on allowed/blocked issues in this PR now.

The only related change of other funcs is eff_request_host, which is used in set_ok_domain. But the only thing it did is just to treat IPv6 as a IP address and not adding .local on it. Since the original one don't support IPv6, this PR indeed also don't supprt IPv6 in the logic of set_ok_domain. We could take care of it later.

(There are indeed a hole bunch of things to change if we are going to support IPv6 entirely)

What I meant with set_ok_domain is that we have some 'strict' policy and under the non-strict policy parsing, I don't know whether [::1] should be treated the same as ::1 (and whether strict policy parsing means that [::1] can only be blocked if we blocked it explicitly, not just ::1) (see https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.DefaultCookiePolicy.DomainStrictNonDomain).

I'll take a look at this after this PR is merged.

LamentXU123 avatar Jun 21 '25 13:06 LamentXU123

I'm now convinced to only use [::1] strictly. RFC states that Literal IPv6 addresses are not supported as host name. So to be strict, only [::1] is a legel hostname while ::1 is a legel IPv6 address.

And ppl are more likely to use [::1] instead of ::1 when blocking domains. I think we just support [::1] and add warnings in the Docs or something. IMO we don't need to change any thing here, except of changing the is_ip_like func into is_ip_like_hostname func.

::1 is not a legel hostname, it's barely a IPv6 address. Instead, [::1] is a legel hostname, but not a legel IPv6 address. And what we're blocking are hostnames not IPs.

Note that IPv4 could be both a address and hostname. But IPv6 can't.

LamentXU123 avatar Jun 22 '25 08:06 LamentXU123

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

bedevere-app[bot] avatar Jun 22 '25 10:06 bedevere-app[bot]

changes were made

basically all done and ready now. Thank you for your time and effort!

LamentXU123 avatar Jun 22 '25 14:06 LamentXU123

The helper functions will help a lot in the future PRs I think.

LamentXU123 avatar Jun 22 '25 15:06 LamentXU123

last nits and we're done

Thanks! This is my first time ever to create a PR for Cpython with nearly a hundred lines of code. Thank you for your patience and guidence!

LamentXU123 avatar Jun 22 '25 15:06 LamentXU123

I'm mostly good but I'll ask another expert just in case @vadmium.

May I close the issue? I think its mostly solved.

LamentXU123 avatar Jun 22 '25 15:06 LamentXU123

No, we should never close an issue until the corresponding PR has been merged or we abandoned the proposal. In this case, closing the issue means that both the PR and its backports have been merged.

picnixz avatar Jun 22 '25 15:06 picnixz

No, we should never close an issue until the corresponding PR has been merged or we abandoned the proposal. In this case, closing the issue means that both the PR and its backports have been merged.

Got it!

LamentXU123 avatar Jun 22 '25 15:06 LamentXU123

I will also add warnings in docs indicating ::1 in not valid when they are blocking or allowing hostname after this is merged, also, remove the duplicated tests in another PR. IPv6 appears to be an edge case in many functions of http.cookiejar, I'll try to fix them one by one in the future.

LamentXU123 avatar Jun 23 '25 14:06 LamentXU123

Hi @picnixz, friendly pinging here. Shell we get this PR merged? caz I can't contribute to further following PRs and issues if this is not merged. TiA ;) (I think it is mostly solved, and tiny fixes can be coped later, if any.)

LamentXU123 avatar Jun 24 '25 09:06 LamentXU123

No. I want to hear from Martin first. PRs don't need to be rushed and it's common to have PRs landing after a few weeks.

You can still contribute to other parts of the library (which I would advise you to do so).

picnixz avatar Jun 24 '25 09:06 picnixz