Wrong encoding for Java source files with ISO-8859-1

Open thomaszub opened this issue 3 years ago • 5 comments

Hi,

we have a massive problem with source file encoding. A lot of our projects are encoded in ISO-8859-1. It seems that the encoding is not determined from, e.g., a Maven property, but is instead detected by org.openrewrite.internal.EncodingDetectingInputStream. This leads to wrong file changes. As an example, see this project: https://github.com/thomaszub/rewrite-encoding-bug
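To make the failure mode concrete, here is a minimal sketch using only the JDK. It assumes the detection ends up treating an ISO-8859-1 file as UTF-8, which is an assumption about the symptom and not a statement about how EncodingDetectingInputStream works internally. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encodes the text as ISO-8859-1 and then decodes the bytes as UTF-8,
    // which is what happens when the charset of a file is guessed wrong.
    // (Assumption: the wrong guess is UTF-8.)
    static String misdecode(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes with the charset they were written in.
    static String decodeCorrectly(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Umlauts are written as Unicode escapes so this file compiles
        // regardless of its own source encoding.
        String source = "String s = \"\u00e4\u00f6\u00fc\";";
        System.out.println(misdecode(source));       // umlauts are lost
        System.out.println(decodeCorrectly(source)); // umlauts survive
    }
}
```

Once the umlauts have been replaced during decoding, writing the file back cannot restore them, which would explain unrelated changes showing up in the diff.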

I suggest deriving the encoding from the build system, e.g. Maven's project.build.sourceEncoding, and falling back to heuristics only if no encoding can be derived.
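A rough sketch of the precedence I have in mind, using only the JDK. SourceCharsetResolver and its methods are hypothetical names, not existing rewrite API; the `configured` argument stands for whatever value the build tool reports (for Maven, the value of project.build.sourceEncoding), and the supplier stands for the existing detection heuristic.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;
import java.util.function.Supplier;

public class SourceCharsetResolver {

    // Turns the configured encoding name into a Charset.
    // Returns empty if nothing is configured or the name is not usable.
    static Optional<Charset> fromBuildProperty(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(configured.trim()));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return Optional.empty();
        }
    }

    // The build configuration wins; the heuristic is only consulted
    // when no usable encoding was configured.
    static Charset resolve(String configured, Supplier<Charset> heuristic) {
        return fromBuildProperty(configured).orElseGet(heuristic);
    }
}
```

With this ordering, `resolve("ISO-8859-1", detector)` never calls the detector, while `resolve(null, detector)` behaves as today.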

Thanks and kind regards Thomas

thomaszub avatar Sep 15 '22 14:09 thomaszub

Hi @thomaszub,

Thanks for the demo project; I can appreciate the scale of this problem. Looking at the docs, we may need to consider both sources and resources.

pway99 avatar Sep 16 '22 22:09 pway99

Hi @pway99,

I opened a PR that extends Parser, Parser.Input and EncodingDetectingInputStream so that a Charset can be set explicitly. This works with the maven-plugin if it is changed accordingly (I can open a PR for the maven-plugin). As I'm not familiar enough with the gradle-plugin or the non-Java parsers, I would currently consider this PR incomplete, but maybe it helps with fixing the problem.

thomaszub avatar Sep 20 '22 06:09 thomaszub

Hi Thomas, thanks for the PR. Check out JavaParser.Builder#charset; it might simplify things a bit. I'm still unsure how to handle EncodingDetectingInputStream when the charset is intentionally set, and I will be looking at this as well. Perhaps we can team up on this one.

pway99 avatar Sep 20 '22 16:09 pway99

Hi Thomas, I have put up a rewrite-maven PR that sets the charset from Maven's sourceEncoding property, and I am working on a rewrite-java solution that will use the maven-plugin charset when it's specified.

pway99 avatar Sep 21 '22 01:09 pway99

Hi Patrick, if I remember correctly, this is not enough, because JavaParser will not pass the encoding on to the Input and EncodingDetectingInputStream. Maybe we should team up and look at the code together.

thomaszub avatar Sep 21 '22 06:09 thomaszub

Fixed by #2249

thomaszub avatar Sep 26 '22 08:09 thomaszub