Normalize URI
The common case of setting URI on lsp4j message object is URI#toString() call. For example LSP4E sets uri on the TextDocumentIdentifier via URI#toString() call rather than URI#toASCIIString(). Therefore if the file name is ClassWithSpécialCharacter.java then only URI#toASCIIString() would produce the valid URI with escaped non-ASCII char é
It'd be great if TextDocumentIdentifier (and perhaps other classes) taking string URIs would normalize the input URI to guard against the issue described above.
In bsl language server we've created a helper class for handling non-canonical uri: https://github.com/1c-syntax/utils/blob/master/src/main/java/com/github/_1c_syntax/utils/Absolute.java
Yes, we could do this as well (thanks for the sharing your solution!) but this would require a sweep kind of change, ie. URI.create(strUri) vs Absolute.uri(strUri). The point is to try to avoid a code sweep change if possible... sort of to guard against further use of URI#toString() (or URI.create(String) vs Absolute.uri(String))
@BoykoAlex Are you recommending adding a constructor to TextDocumentIdentifier that takes a java.net.URI (in addition to a String)? Or do you want TextDocumentIdentifier to escape the input string? Can the latter part be done reliably?
@jonahgraham I'd add a constructor that takes java.net.URI which would call uri.toASCIIString() to set the string uri field. In addition I'd make the constructor taking String creating a java.net.URI from string and then delegate to the constructor taking java.net.URI. It'd be great to do the same to setUri(...) unless this seems risky...
@jonahgraham I'd add a constructor that takes
java.net.URIwhich would calluri.toASCIIString()to set the stringurifield.
This part seems fine.
In addition I'd make the constructor taking
Stringcreating ajava.net.URIfrom string and then delegate to the constructor takingjava.net.URI.
This part worries me because of double-escaping. But then looking at your above comment that would still need a sweeping change if we didn't adopt this part.
It'd be great to do the same to
setUri(...)unless this seems risky...
I don't see any issue with adding a new setUri(java.net.URI) method.
However having a getUri... that returns a java.net.URI seems problematic because its constructor throws URISyntaxException.
I guess the real question is why not have TextDocumentIdentifier store a java.net.URI all the time? Is there some things that LSP URIs can store (as a string) that can't be represented properly by java.net.URI?
I think having java.net.URI rather than String would be ideal... Somehow, I suspect it is String due to perhaps some JSON serialization/deserialization limitations?