jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8315767: InetAddress: constructing objects from BSD literal addresses

Open sercher opened this issue 10 months ago • 28 comments

There are two distinct approaches to parsing IPv4 literal addresses. One is the Java baseline "strict" syntax (all-decimal d.d.d.d form family), another one is the "loose" syntax of RFC 6943 section 3.1.1 [1] (POSIX inet_addr allowing octal and hexadecimal forms [2]). The goal of this PR is to provide interface to construct InetAddress objects from literal addresses in POSIX form, to applications that need to mimic the behavior of inet_addr used by standard network utilities such as netcat/curl/wget and the majority of web browsers. At present time, there's no way to construct InetAddress object from such literal addresses because the existing APIs such as InetAddress.getByName(), InetAddress#ofLiteral() and Inet4Address#ofLiteral() will consume an address and successfully parse it as decimal, regardless of the octal prefix. Hence, the resulting object will point to a different IP address.

Historically InetAddress.getByName()/.getAllByName() were the only way to convert a literal address into an InetAddress object. getAllByName() API relies on POSIX getaddrinfo / inet_addr which parses IP address segments with strtoul (accepts octal and hexadecimal bases).

The fallback to getaddrinfo is undesirable as it may end up with network queries (blocking mode), if inet_addr rejects the input literal address. The Java standard explicitly says that

"If a literal IP address is supplied, only the validity of the address format is checked."

@AlekseiEfimov contributed JDK-8272215 [3] that adds new factory methods .ofLiteral() to InetAddress classes. Although the new API is not affected by the getaddrinfo fallback issue, it is not sufficient for an application that needs to interoperate with external tooling that follows POSIX standard. In the current state, InetAddress#ofLiteral() and Inet4Address#ofLiteral() will consume the input literal address and (regardless of the octal prefix) parse it as decimal numbers. Hence, it's not possible to reliably construct an InetAddress object from a literal address in POSIX form that would point to the desired host.

It is proposed to extend the factory methods with Inet4Address#ofPosixLiteral() that allows parsing literal IP(v4) addresses in "loose" syntax, compatible with inet_addr POSIX api. The implementation is based on .isBsdParsableV4() method added along with JDK-8277608 [4]. The changes in the original algorithm are as follows:

  • IPAddressUtil#parseBsdLiteralV4() method is extracted from .isBsdParsableV4()
  • an additional check is added, whether an input string is empty
  • null is returned whenever the original algorithm fails
  • a condition was added to the parser loop, that stores the IPv4 address segment value when it fits in 1 byte (0 <= x < 256)
  • an additional check was added to verify that the last field value is non-negative
  • when the last field value is multi-byte (the number of fields is less than 4), it is written to the last (4-(fieldNumber-1)) octets

The new method hasn't been added to InetAddress superclass because the change is only related to IPv4 addressing. This reduces the chance that client code will call the wrong factory method.

test/jdk/java/net/InetAddress/OfLiteralTest.java was updated to include .ofPosixLiteral() tests

Javadocs in Inet4Address were updated accordingly

The new method can be used as follows

import java.net.InetAddress;
import java.net.Inet4Address;

public class Test {
    public static void main(String[] args) throws Throwable {
        if (args.length < 1) {
            System.err.println("USAGE: java Test <host>");
            return;
        }
        InetAddress ia = Inet4Address.ofPosixLiteral(args[0]);
        System.out.println(ia.toString());
    }
}

The output would be

$ ./build/images/jdk/bin/java Test 2130706433
/127.0.0.1
$ ./build/images/jdk/bin/java Test 02130706433
/17.99.141.27
$ ./build/images/jdk/bin/java Test 2130706438
/127.0.0.6
$ ./build/images/jdk/bin/java Test 02130706438
Exception in thread "main" java.lang.IllegalArgumentException: Invalid IP address literal: 02130706438
        at java.base/sun.net.util.IPAddressUtil.invalidIpAddressLiteral(IPAddressUtil.java:169)
        at java.base/java.net.Inet4Address.parseAddressStringPosix(Inet4Address.java:302)
        at java.base/java.net.Inet4Address.ofPosixLiteral(Inet4Address.java:239)
        at Test.main(Test.java:10)

[1] https://www.ietf.org/rfc/rfc6943.html#section-3.1.1 [2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_addr.html [3] https://bugs.openjdk.org/browse/JDK-8272215 [4] https://github.com/openjdk/jdk/commit/cdc1582d1d7629c2077f6cd19786d23323111018


Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [ ] Change requires CSR request JDK-8329876 to be approved
  • [x] Commit message must refer to an issue

Issues

  • JDK-8315767: InetAddress: constructing objects from BSD literal addresses (Enhancement - P3)
  • JDK-8329876: InetAddress: constructing objects from BSD literal addresses (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18493/head:pull/18493
$ git checkout pull/18493

Update a local copy of the PR:
$ git checkout pull/18493
$ git pull https://git.openjdk.org/jdk.git pull/18493/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18493

View PR using the GUI difftool:
$ git pr show -t 18493

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18493.diff

Webrev

Link to Webrev Comment

sercher avatar Mar 26 '24 17:03 sercher

:wave: Welcome back schernyshev! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Mar 26 '24 17:03 bridgekeeper[bot]

@sercher This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8315767: InetAddress: constructing objects from BSD literal addresses

Reviewed-by: dfuchs, aefimov, michaelm, jpai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 726 new commits pushed to the master branch:

  • 2a11e0da026066191e4d4f30b9daca986c484630: 8332743: Update comment related to JDK-8320522
  • 6829d9ac67fb131462d3ef1c4bdfaa07df5d6be6: 8332122: [nmt] Totals for malloc should show total peak
  • 9d332e6591334a71335da65a4dd7b2ed0482b6cb: 8307193: Several Swing jtreg tests use class.forName on L&F classes
  • 98f6a80852383dcbdad7292b7d269a8547d54d45: 8332490: JMH org.openjdk.bench.java.util.zip.InflaterInputStreams.inflaterInputStreamRead OOM
  • 3d4185a9ce482cc655a4c67f39cb2682b02ae4fe: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed
  • c4557a7b0db5b55585b4caa7cdec81e1c1093cbc: 8332463: Byte conditional pattern case element dominates short constant case element
  • d59c12fe1041a1f61f68408241a9aa4d96ac4fd2: 8329718: Incorrect @since tags in elements in jdk.compiler and java.compiler
  • b4d14540851d792b5366a3723abcea1264a5737c: 8332740: [BACKOUT] JDK-8331081 'internal proprietary API' diagnostics if --system is configured to an earlier JDK version
  • 37c477856d543163b60dd2b85a5e6ac35a752211: 8332096: hotspot-ide-project fails with this-escape
  • 2170e99cb49a4ef2086ecec7515a72d56148d0f2: 8331081: 'internal proprietary API' diagnostics if --system is configured to an earlier JDK version
  • ... and 716 more: https://git.openjdk.org/jdk/compare/153410f480c372604e5825bfcd3a63f137e6a013...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dfuch, @AlekseiEfimov, @Michael-Mc-Mahon, @jaikiran) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk[bot] avatar Mar 26 '24 17:03 openjdk[bot]

@sercher The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Mar 26 '24 17:03 openjdk[bot]

/csr needed

AlekseiEfimov avatar Mar 26 '24 17:03 AlekseiEfimov

@AlekseiEfimov has indicated that a compatibility and specification (CSR) request is needed for this pull request.

@sercher please create a CSR request for issue JDK-8315767 with the correct fix version. This pull request cannot be integrated until the CSR request is approved.

openjdk[bot] avatar Mar 26 '24 17:03 openjdk[bot]

Should we be setting any expectations by specifying what InetAddress.getHostAddress() will return for an Inet4Address constructed using this new Inet4Address.ofPosixLiteral() method? In its current form I believe it will continue to return a decimal representation of the IP address. My guess is that we want it to continue behaving that way?

jaikiran avatar Apr 17 '24 14:04 jaikiran

Should we be setting any expectations by specifying what InetAddress.getHostAddress() will return for an Inet4Address constructed using this new Inet4Address.ofPosixLiteral() method? In its current form I believe it will continue to return a decimal representation of the IP address. My guess is that we want it to continue behaving that way?

In the InetAddress class level API documentation we have:

 * <p> For methods that return a textual representation as output
 * value, the first form, i.e. a dotted-quad string, is used.

Do you think it should be reiterated in the @apiNote of the ofPosixLiteral method?

dfuch avatar Apr 17 '24 14:04 dfuch

Should we be setting any expectations by specifying what InetAddress.getHostAddress() will return for an Inet4Address constructed using this new Inet4Address.ofPosixLiteral() method? In its current form I believe it will continue to return a decimal representation of the IP address. My guess is that we want it to continue behaving that way?

In the InetAddress class level API documentation we have:

 * <p> For methods that return a textual representation as output
 * value, the first form, i.e. a dotted-quad string, is used.

Do you think it should be reiterated in the @apiNote of the ofPosixLiteral method?

The changes proposed in this PR introduce a new paragraph in the class level documentation just before the line which states the dotted-quad string. So I think it may not be clear enough whether dotted-quad string implies decimal values (127.0.0.1) or octal dotted-quad string (0177.0.0.1) or hexadecimal dotted-quad string (0x7F.0.0.1). So I think updating that sentence in class level documentation might help.

jaikiran avatar Apr 17 '24 14:04 jaikiran

The changes proposed in this PR introduce a new paragraph in the class level documentation just before the line which states the dotted-quad string. So I think it may not be clear enough whether dotted-quad string implies decimal values (127.0.0.1) or octal dotted-quad string (0177.0.0.1) or hexadecimal dotted-quad string (0x7F.0.0.1). So I think updating that sentence in class level documentation might help.

Good point.

dfuch avatar Apr 17 '24 14:04 dfuch

Should we be setting any expectations by specifying what InetAddress.getHostAddress() will return for an Inet4Address constructed using this new Inet4Address.ofPosixLiteral() method? In its current form I believe it will continue to return a decimal representation of the IP address. My guess is that we want it to continue behaving that way?

That seems to be the case for the native function roundtrip. So, I would it expect it to be the same in Java.

Michael-Mc-Mahon avatar Apr 18 '24 09:04 Michael-Mc-Mahon

which lists the methods that parse as decimal only, and the new method which parses using the "loose" syntax.

That might not be practical: the only method that supports non decimal form is the new ofBSDLiteral. But any other method only support decimal notation. These include URL/URI, permissions, InetSocketAdderess, InetAddress, Socket/DatagramSocket constructors, etc... Building a comprehensive list would be both unpractical and difficult.

dfuch avatar Apr 18 '24 11:04 dfuch

which lists the methods that parse as decimal only, and the new method which parses using the "loose" syntax.

That might not be practical: the only method that supports non decimal form is the new ofBSDLiteral. But any other method only support decimal notation. These include URL/URI, permissions, InetSocketAdderess, InetAddress, Socket/DatagramSocket constructors, etc... Building a comprehensive list would be both unpractical and difficult.

I think we only need to talk about the methods in InetAddress. Though it could be phrased as ofPosixLiteral supports decimal, hex and octal. All other parsing is decimal only.

Michael-Mc-Mahon avatar Apr 18 '24 13:04 Michael-Mc-Mahon

I think we only need to talk about the methods in InetAddress. Though it could be phrased as ofPosixLiteral supports decimal, hex and octal. All other parsing is decimal only.

Yes. that latter formulation might be better.

dfuch avatar Apr 18 '24 14:04 dfuch

Hello Sergey, do you plan to update the PR with the class level doc change suggested by Michael?

jaikiran avatar May 06 '24 10:05 jaikiran

Hello Sergey, do you plan to update the PR with the class level doc change suggested by Michael?

Hi @jaikiran ! Yes I plan to update the PR. Apologies for the delay.

sercher avatar May 06 '24 17:05 sercher

Thanks @Michael-Mc-Mahon, @jaikiran. I updated the class level doc so that the sentence "These forms support parts in decimal format only" is completely removed (it wasn't looking good to me any more because it was repeated by the next sentence "For example..."). I didn't add the sub-section "Parsing of literal addresses" because it already belongs to "Textual representation of IPv4 addresses" which is more or less the same. Please see the updated class level doc.

sercher avatar May 06 '24 18:05 sercher

@AlekseiEfimov @jaikiran @Michael-Mc-Mahon Could you please take a look and review the CSR ?

sercher avatar May 07 '24 09:05 sercher

@AlekseiEfimov @jaikiran @Michael-Mc-Mahon Could you please take a look and review the CSR ?

I will take a look at it. Thanks for the reminder

AlekseiEfimov avatar May 07 '24 10:05 AlekseiEfimov

Thanks @Michael-Mc-Mahon, @jaikiran. I updated the class level doc so that the sentence "These forms support parts in decimal format only" is completely removed (it wasn't looking good to me any more because it was repeated by the next sentence "For example..."). I didn't add the sub-section "Parsing of literal addresses" because it already belongs to "Textual representation of IPv4 addresses" which is more or less the same. Please see the updated class level doc.

Ok - so now we seem to be missing the fact that the only form supported by the existing methods is the decimal form. Can we find a wording that makes this clear?

dfuch avatar May 09 '24 09:05 dfuch

Ok - so now we seem to be missing the fact that the only form supported by the existing methods is the decimal form. Can we find a wording that makes this clear?

Would you think this should be mentioned in the class level doc of Inet4Address class? Alternatively, the method description of ofPosixLiteral could say that "this is the only method accepting non-decimal forms of address literals". There's no such thing as "existing methods", there are "all methods consuming address literals with an exception of ofPosixLiteral" - this could be a phrasing in the class level doc. Please let me know what you think.

sercher avatar May 13 '24 14:05 sercher

missing the fact that the only form supported

I'd also note that, when mentioned in the docs that this is the only method capable of parsing this type of strings, it can be limiting to the future APIs that could introduce other methods that accept this type of strings. Adding such methods in the future will invalidate this statement and will require editing the class specification again. @dfuch Could you please help reviewing the CSR?

sercher avatar May 15 '24 00:05 sercher

Thanks - LGTM now. @AlekseiEfimov, @Michael-Mc-Mahon are you also good with this version of the changes?

dfuch avatar May 16 '24 10:05 dfuch

Thank you @AlekseiEfimov, @dfuch for your reviews!

sercher avatar May 16 '24 15:05 sercher

I'd just like to have one more look at it today, if that's okay.

Michael-Mc-Mahon avatar May 17 '24 09:05 Michael-Mc-Mahon

I read through the API docs in the latest revision (664fccb1) and it looks very good. The new method is not easily discovered, which is okay as it's very much a method for experts, I see wonder if we should put a @see in InetAddress.ofLiteral.

AlanBateman avatar May 17 '24 10:05 AlanBateman

The new method is not easily discovered, which is okay as it's very much a method for experts, I see wonder if we should put a @see in InetAddress.ofLiteral.

Thanks @AlanBateman for your note. There's a link to ofPosixLiteral from the class level doc. I didn't want to touch ofLiteral as it already has the link to the #format subtitle in the class doc

sercher avatar May 17 '24 11:05 sercher

Thanks @AlanBateman for your note. There's a link to ofPosixLiteral from the class level doc. I didn't want to touch ofLiteral as it already has the link to the #format subtitle in the class doc

I think you may have mis-read my comment. I think we need to add @see InetAddress4#ofPosixLiteral(String) to InetAddress.ofLiteral, not Inet4Address.ofLiteral. The reason is that methods defined by Inet4Address are not very discoverable and there is no reference to ofPosixLiteral in InetAddress's class description or anywhere else.

AlanBateman avatar May 19 '24 14:05 AlanBateman

I think you may have mis-read my comment. I think we need to add @see InetAddress4#ofPosixLiteral(String) to InetAddress.ofLiteral, not Inet4Address.ofLiteral. The reason is that methods defined by Inet4Address are not very discoverable and there is no reference to ofPosixLiteral in InetAddress's class description or anywhere else.

Thanks @AlanBateman , apologize for mis-reading your comment. I added @see to InetAddress.ofLiteral, please see the updated doc.

sercher avatar May 20 '24 09:05 sercher

/integrate

sercher avatar May 21 '24 07:05 sercher