Avoid using non-ASCII Unicode characters outside of comments and literals

Open codefish1 opened this issue 2 years ago • 11 comments

In error-prone 2.11.0 I've started getting the following error when building within IntelliJ

Foo.java:17:2
java: [UnicodeInCode] Avoid using non-ASCII Unicode characters outside of comments and literals, as they can be confusing.
    (see https://errorprone.info/bugpattern/UnicodeInCode)

When I view the file in Vim or with hexdump, I can't see any non-ASCII characters.

Line 17 is the end of the file. I can't supply the whole file due to work constraints, but below is a screenshot of the end of the file from hexedit: [image]

Within IntelliJ, the formatter is doing this: [image]

If I downgrade error-prone to 2.10.0, it works fine on the offending file.

codefish1 avatar Apr 08 '22 17:04 codefish1

I think I've seen this a couple of times and hadn't got to the bottom of it yet.

To make it easier to debug, maybe we should improve the diagnostic to mention which non-ASCII characters it thinks it's seeing.

cushon avatar Apr 08 '22 17:04 cushon

While playing with the existing test to add an assertion on the error, I noticed it already outputs the offending line along with a ^ pointing at the offending character. But I don't get that in these cases: [screenshot from 2022-04-08 19-56-01]

codefish1 avatar Apr 08 '22 18:04 codefish1

AFAICT, because 99.9% of Java code is plain ASCII, the check is rather "dumb" and doesn't try to only flag problematic chars.

tbroyer avatar Apr 08 '22 19:04 tbroyer

I think it's a bug which appears when running in IntelliJ

Using a file which fails in IntelliJ (2021.3.2, Ultimate Edition), a mvn compile on the command line works. The following test, using the command line from the installation docs, also works:

javac \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED \
  -XDcompilePolicy=simple \
  -processorpath error_prone_core-2.11.0-with-dependencies.jar:dataflow-errorprone-3.15.0.jar \
  '-Xplugin:ErrorProne -XepDisableWarningsInGeneratedCode -XepExcludedPaths:.*/target/generated-sources/.*' \
  filename.java

I've also copied the failing file to one side and run a diff to confirm it's the same as the failing one. I then played about with the file a few times (adding and removing the last line) until it worked and ran a diff again. The diff shows no difference between the files.

codefish1 avatar Apr 08 '22 21:04 codefish1

I wonder if IntelliJ is adding a Unicode character to the buffer for some reason.

I'm going to update the diagnostic message to print the character it's seeing, which might help debug this.
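
As a rough illustration of what such a message could contain, here is a minimal sketch. It is not Error Prone's actual diagnostic code; the class `CharDiagnosticSketch` and its methods `describe` and `message` are hypothetical names chosen for this example, and the message wording is an assumption.

```java
import java.util.Locale;

public class CharDiagnosticSketch {
    // Formats a character as a "U+XXXX" label so that invisible
    // characters (such as control characters) show up in the message.
    static String describe(char c) {
        return String.format(Locale.ROOT, "U+%04X", (int) c);
    }

    // Builds a hypothetical diagnostic line for the character at the
    // given offset; the wording is illustrative only.
    static String message(CharSequence source, int offset) {
        return "Unexpected character " + describe(source.charAt(offset))
            + " at offset " + offset;
    }

    public static void main(String[] args) {
        String source = "class Foo {}" + (char) 0x1a;
        System.out.println(message(source, source.length() - 1));
    }
}
```

Printing the code point rather than the raw character matters here because the character in question may not be visible in a terminal or IDE console.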

cushon avatar Apr 13 '22 20:04 cushon

FYI, there is an issue filed on the IntelliJ side, too -- https://youtrack.jetbrains.com/issue/IDEA-288257

elefeint avatar May 20 '22 15:05 elefeint

I've found the cause: Javac modifies the content of the file passed to it as a char[] (see UnicodeReader.java:103) by replacing the last character with 0x1a. If this array is cached (the original Javac implementation also caches it, but the code in IntelliJ does so in a different way to improve performance), Error Prone may get this modified content and report an error. Note that this code in Javac was rewritten as part of JDK-8224225, so the problem shouldn't appear in Java 16 and newer versions.
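
The aliasing effect described here can be reproduced in isolation. The following is a minimal sketch, not Javac's or IntelliJ's actual code: `SharedBufferSketch`, `CachedSource`, and `patchSentinel` are hypothetical names, and the only detail taken from the explanation above is that the last character of a shared, cached buffer gets overwritten with 0x1a.

```java
import java.nio.CharBuffer;

public class SharedBufferSketch {
    // Hypothetical stand-in for a file object that caches its content
    // and hands the same backing array to every caller.
    static final class CachedSource {
        private final char[] content;

        CachedSource(String text) {
            this.content = text.toCharArray();
        }

        CharBuffer getCharContent() {
            // wrap() does not copy: the buffer shares the cached array.
            return CharBuffer.wrap(content);
        }
    }

    // Hypothetical stand-in for a reader that overwrites the last
    // character of its input with 0x1a as an end-of-input sentinel.
    static void patchSentinel(CharBuffer buffer) {
        char[] array = buffer.array();
        if (array.length > 0) {
            array[array.length - 1] = (char) 0x1a;
        }
    }

    public static void main(String[] args) {
        CachedSource source = new CachedSource("class Foo {}\n");
        patchSentinel(source.getCharContent());
        // A later caller receives the cached array, sentinel included.
        CharBuffer second = source.getCharContent();
        char last = second.get(second.limit() - 1);
        System.out.printf("last char: U+%04X%n", (int) last);
    }
}
```

The file on disk never changes in this scenario, which would be consistent with diff and hex editors showing nothing unusual.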

chashnikov avatar Aug 10 '22 16:08 chashnikov

I'm not sure how we can fix this on the IntelliJ side. We implement javax.tools.FileObject#getCharContent and cache the content of the returned CharSequence; it's really unexpected that code in Javac casts the returned value to CharBuffer and modifies its content. Maybe this can be fixed in Error Prone? I think ignoring the 0x1a symbol if it's the last character in the file text is a good workaround; I doubt that any real problems will be masked by such a change.
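
The proposed workaround could look roughly like the sketch below. This is not the Error Prone implementation: `TrailingSubSketch`, `isAcceptable`, and `firstSuspiciousIndex` are hypothetical names, the set of acceptable characters is an assumption made for this example, and the sketch scans raw text without skipping comments or literals as the real check would.

```java
public class TrailingSubSketch {
    private static final char SUB = (char) 0x1a;

    // Assumption for this sketch: printable ASCII plus tab, LF, and CR
    // are acceptable; everything else is treated as suspicious.
    static boolean isAcceptable(char c) {
        return (c >= 0x20 && c <= 0x7e) || c == '\t' || c == '\n' || c == '\r';
    }

    // Returns the index of the first suspicious character, or -1 if
    // there is none. A 0x1a in the very last position is skipped.
    static int firstSuspiciousIndex(CharSequence source) {
        int lastIndex = source.length() - 1;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (isAcceptable(c)) {
                continue;
            }
            if (i == lastIndex && c == SUB) {
                continue; // trailing sentinel, ignored by the workaround
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Trailing 0x1a is ignored.
        System.out.println(firstSuspiciousIndex("class Foo {}" + SUB));
        // 0x1a anywhere else is still reported.
        System.out.println(firstSuspiciousIndex("class F" + SUB + "oo {}"));
    }
}
```

Restricting the exemption to the final position keeps the check strict everywhere else, which matches the reasoning that real problems are unlikely to be masked.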

chashnikov avatar Aug 10 '22 16:08 chashnikov

@chashnikov FWIW, I still have this issue in Java 18 (Zulu) in IntelliJ.

lwhite1 avatar Sep 30 '22 14:09 lwhite1

Since this has been merged but is still open, can someone update this with the version where the fix will appear?

lwhite1 avatar Oct 13 '22 18:10 lwhite1

This should have been included in the recent 2.16.0 release

cushon avatar Oct 13 '22 18:10 cushon

FYI, I still see this on occasion in 2.16. Seems to be less common.

kenfreeman avatar Nov 08 '22 17:11 kenfreeman