asciidoctor-maven-plugin
File encoding problem (umlauts) for external PlantUML diagrams
- [x] Bug report
- [ ] Feature request
- [x] Question
I am not sure whether this is the right place or whether it belongs in the asciidoctor-diagram project, so hopefully this is the right one.
My Maven project's source code is (or should be) completely UTF-8. Now I want to build a Maven site whose pages are Asciidoctor files that include a PlantUML diagram loaded from a file. The diagram is generated, but it always seems to have the wrong encoding, whereas the internal (inline) diagrams are correct.
So how do I tell Asciidoctor that these diagram files should be read as UTF-8?
What I did / tried so far:
- changed file.encoding when starting Maven (-Dfile.encoding=UTF-8)
- defined the project source encoding in Maven
- defined the project reporting encoding in Maven
- tried different Java versions
- tried to configure the default_external parameter, which had no effect
- changed the defined project encodings to other values to see whether anything changes
BTW my environment is Windows 11, Java 8, 11, 17, Maven 3.6, 3.8.
I attached a minimal Maven project (asciidoctor1.zip). Just run site:site or look into the target directory I included.
Look into target/site directory:
- diag-....png is correct. It is defined using UTF-8 in overview.adoc
- test_class_utf8.png is wrong. It is defined using UTF-8 in test_class_utf8.puml
- test_class_cp1252.png is correct. It is defined using CP1252 in test_class_cp1252.puml
So it seems that Asciidoctor (diagram) always tries to use Cp1252 for external PlantUML files, which is strange, since I already set the file encoding to UTF-8.
So what did I do wrong?
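The symptom described above (a UTF-8 .puml file rendered with broken umlauts) is what you would expect if the file's bytes were decoded as Cp1252 somewhere along the way. A minimal sketch of that effect, assuming nothing about how asciidoctor-diagram actually reads the file; the class and method names are hypothetical and only illustrate the mechanism:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode the text as UTF-8, then decode those bytes with the charset a
    // reader wrongly assumes. This simulates reading a UTF-8 file as Cp1252.
    static String misread(String text, Charset assumed) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String label = "Gr\u00f6\u00dfe"; // "Größe", written with escapes so the source encoding does not matter

        // Each umlaut occupies two bytes in UTF-8, so the wrong decoder
        // turns it into two unrelated characters.
        System.out.println("decoded as Cp1252: " + misread(label, cp1252));
        System.out.println("decoded as UTF-8:  " + misread(label, StandardCharsets.UTF_8));
    }
}
```

If the broken labels in test_class_utf8.png show two characters where one umlaut should be, that would be consistent with this kind of wrong decoding, though it does not show where in the toolchain it happens.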
There's something here, but I need to set up a Windows VM, so it may take some extra time to answer.
Files should already be UTF-8; Asciidoctor does not understand other encodings, and on non-Windows OSs the example just crashes when processing the cp1252 file. Why cp1252 works on Windows and UTF-8 does not is what I need to research. We only use project.build.sourceEncoding to copy resources, which you don't do in the example.
I understand that the end goal is to have all files in UTF-8, right? Mixing encodings is never going to work.
Right. All should be UTF-8. I just included this cp1252 file as a test and got lucky. However, using ISO-8859-1 works as well; it is the same encoding, at least for those characters.
If you remove this cp1252 stuff, does a non-Windows machine render the UTF-8 pumls correctly?
Yes.
In fact, non-Windows systems (testing on macOS now) crash outright with org.jruby.exceptions.ArgumentError: (ArgumentError) asciidoctor: FAILED: <stdin>: Failed to load AsciiDoc document - invalid byte sequence in UTF-8. That's a common thing for people to ask about Asciidoctor; you can find several reports by googling for it.
That's why I am puzzled that you get the opposite effect, and I need to do some research. I know Windows does not crash, but using cp1252 as the default 🤔
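The "invalid byte sequence in UTF-8" error above means the input contains bytes that are not well-formed UTF-8, which is what a Cp1252 file with umlauts looks like to a strict decoder. A small sketch for checking files ahead of the build, independent of Asciidoctor; the class name `Utf8Check` is hypothetical, and it takes file paths as command-line arguments:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage (paths are examples): java Utf8Check test_class_utf8.puml test_class_cp1252.puml
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

Note that a pure-ASCII file passes this check regardless of the encoding it was saved in, so the check is only informative for files that actually contain umlauts or other non-ASCII characters.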
Strange. This should be the same as starting Java with -Dfile.encoding=UTF-8. Is another JVM instance started somewhere in the rendering process? At the moment, Cp1252 is Java's default encoding on Windows, while on Linux and macOS it's UTF-8.
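To see which default a given JVM actually ends up with, the effective charset can be printed directly. A minimal sketch (the class name is hypothetical); on Java versions before 18 the default follows the platform unless file.encoding is set at startup, and since Java 18 (JEP 400) it is UTF-8 by default:

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Reports both the system property and the charset the JVM resolved at startup.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Compare the output of:
        //   java DefaultCharsetCheck
        //   java -Dfile.encoding=UTF-8 DefaultCharsetCheck
        System.out.println(describe());
    }
}
```

This only describes the JVM it runs in. Whether the diagram rendering uses a separate process, or reads files through some other encoding setting, is not established in this thread.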