asciidoctor-maven-plugin

Problem with file encoding (umlauts) for external PlantUML diagrams

Open wumpz opened this issue 2 years ago • 4 comments

  • [x] Bug report
  • [ ] Feature request
  • [x] Question

I am not sure whether this belongs here or in the asciidoctor-diagram project, so hopefully this is the right place.

My Maven project's source code is (or should be) entirely UTF-8. Now I want to build a Maven site whose pages are Asciidoctor files that embed a PlantUML diagram loaded from an external file. The diagram is generated, but it always seems to have the wrong encoding, while diagrams defined inline in the document are correct.

So how do I tell Asciidoctor that these diagram files are UTF-8?

What I have tried so far:

  1. setting file.encoding when starting Maven (-Dfile.encoding=UTF-8)
  2. defining the project source encoding in Maven
  3. defining the project reporting encoding in Maven
  4. different Java versions
  5. configuring the default_external parameter, which had no effect
  6. changing the defined project encodings, just to see whether anything changed
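Several of these attempts target the JVM's default encoding, so it may help to check what the JVM doing the rendering actually ends up with. The following is a minimal Java sketch (the class name EncodingReport is hypothetical, not part of any plugin) that prints the relevant system properties and the resulting default charset. Running it with the same options you give Maven's JVM (for example the contents of MAVEN_OPTS) shows whether the setting reached a JVM at all.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class: reports the encoding settings of the JVM it runs in.
class EncodingReport {
    static String describe() {
        // file.encoding: the property set via -Dfile.encoding
        // sun.jnu.encoding: OpenJDK property used for file names and similar (may be null elsewhere)
        // defaultCharset: what readers/writers without an explicit charset actually use
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

On the JDKs mentioned here (8, 11, 17), Charset.defaultCharset() is computed once and then cached, so changing file.encoding from code after startup generally has no effect; it has to be present when that JVM is launched.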

BTW, my environment is Windows 11, with Java 8, 11, and 17 and Maven 3.6 and 3.8.

I attached a minimal Maven project (asciidoctor1.zip). Just run site:site, or look at the target directory I included.

Look in the target/site directory:

  • diag-....png is correct. It is defined in UTF-8 inline in overview.adoc.

  • test_class_utf8.png is wrong. It is defined in UTF-8 in test_class_utf8.puml.

  • test_class_cp1252.png is correct. It is defined in CP1252 in test_class_cp1252.puml.

So it seems that Asciidoctor (Diagram) always reads external PlantUML files as Cp1252, which is strange, since I already set file.encoding to UTF-8.
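The suspected behaviour is easy to reproduce in isolation. The following Java sketch (class and method names are hypothetical) encodes German umlauts as UTF-8 and then decodes those bytes as windows-1252 (Cp1252), which is what any reader relying on the Windows default charset would do. Each two-byte UTF-8 sequence turns into two Latin-1-range characters. That the broken PNG shows exactly this pattern is an assumption, since its contents are not reproduced here. The source uses only ASCII and Unicode escapes, so it compiles the same way regardless of javac's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: what UTF-8 text looks like when decoded as Cp1252.
class MojibakeDemo {
    static String readUtf8AsCp1252(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);   // e.g. a-umlaut -> 0xC3 0xA4
        return new String(utf8Bytes, Charset.forName("windows-1252")); // 0xC3 -> A-tilde, 0xA4 -> currency sign
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00fc";          // a-, o-, u-umlaut
        String garbled = readUtf8AsCp1252(umlauts);
        // garbled is "\u00c3\u00a4\u00c3\u00b6\u00c3\u00bc", i.e. six characters instead of three
        System.out.println(garbled.length());
    }
}
```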

So what did I do wrong?

wumpz, Jul 15 '22 05:07

There's something here, but I need to set up a Windows VM, so it may take some extra time to answer.

Files should already be UTF-8: Asciidoctor does not understand other encodings, and on non-Windows OSs the example simply crashes when processing the cp1252 file. Why cp1252 works on Windows and UTF-8 does not is what I need to research; we only use project.build.sourceEncoding to copy resources, which you don't do in the example.

I understand that the end goal is to have all files in UTF-8, right? Mixing encodings is never going to work.

abelsromero, Jul 15 '22 07:07

Right, everything should be UTF-8. I only included the cp1252 file as a test and got lucky. ISO-8859-1 works as well, since it encodes at least these characters the same way.
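That ISO-8859-1 behaves the same is expected: both charsets give the German umlauts and sharp s the same single-byte values, and they only differ in the 0x80 to 0x9F range, where windows-1252 places extra printable characters such as the euro sign. A small Java check (class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check comparing ISO-8859-1 (Latin-1) and windows-1252 (Cp1252).
class Latin1VsCp1252 {
    static final Charset LATIN1 = StandardCharsets.ISO_8859_1;
    static final Charset CP1252 = Charset.forName("windows-1252");

    // True if the text encodes to identical bytes in both charsets.
    static boolean sameBytes(String text) {
        return Arrays.equals(text.getBytes(LATIN1), text.getBytes(CP1252));
    }

    public static void main(String[] args) {
        // lower- and upper-case a/o/u umlauts plus sharp s
        String umlauts = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";
        System.out.println(sameBytes(umlauts));                      // true
        // The euro sign sits at 0x80 in Cp1252 but has no Latin-1 encoding at all
        System.out.println(CP1252.newEncoder().canEncode('\u20ac')); // true
        System.out.println(LATIN1.newEncoder().canEncode('\u20ac')); // false
    }
}
```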

If you remove the cp1252 file, does a non-Windows machine render the UTF-8 .puml files correctly?

wumpz, Jul 15 '22 09:07

> If you remove the cp1252 file, does a non-Windows machine render the UTF-8 .puml files correctly?

Yes. In fact, on non-Windows systems (testing on macOS now) it crashes outright on the cp1252 file with org.jruby.exceptions.ArgumentError: (ArgumentError) asciidoctor: FAILED: <stdin>: Failed to load AsciiDoc document - invalid byte sequence in UTF-8. That's a common question people ask about Asciidoctor; you can find several reports by searching for it.
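The macOS failure is the strict side of the same problem: a cp1252-encoded umlaut such as 0xFC (u-umlaut) is not a valid UTF-8 byte sequence. A strict decoder can flag such files before a site build. The following is a minimal Java sketch with hypothetical names; it uses Java's CharsetDecoder, not the JRuby code path that raised the error above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: is a byte array valid UTF-8?
class StrictUtf8Check {
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently inserting replacement characters
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // e.g. MalformedInputException for a stray 0xFC byte
        }
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // "Gruesse" written with u-umlaut and sharp s
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));          // true
        System.out.println(isValidUtf8(text.getBytes(Charset.forName("windows-1252")))); // false
    }
}
```

For a real diagram source, the bytes can come from java.nio.file.Files.readAllBytes(path) for each .puml file.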

That's why I am puzzled that you get the opposite effect, and I need to do some research. I know Windows does not crash, but it uses cp1252 as the default 🤔

abelsromero, Jul 15 '22 09:07

Strange. This should be the same as starting Java with -Dfile.encoding=UTF-8. Is another JVM instance started somehow during rendering? At the moment, Cp1252 is Java's default encoding on Windows, while on Linux and macOS it is UTF-8.
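Whether Asciidoctor Diagram actually starts a separate JVM for PlantUML is not established in this thread. If some step does, it matters: a child process does not inherit its parent's system properties, so -Dfile.encoding given to Maven only reaches a child JVM if it is passed on that child's command line (or through an environment variable such as JAVA_TOOL_OPTIONS, which child processes do inherit). The following minimal Java sketch demonstrates the general point. It assumes an OpenJDK-based java launcher, which supports -XshowSettings:properties, and the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: a child JVM only sees the -D options passed to it explicitly.
class ChildJvmEncoding {
    // Starts a child JVM with the given JVM arguments and returns the
    // "file.encoding = ..." line it reports (empty string if none is found).
    static String childFileEncoding(String... jvmArgs) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.addAll(Arrays.asList(jvmArgs));
        cmd.add("-XshowSettings:properties"); // launcher prints system properties (to stderr)
        cmd.add("-version");                  // ...and then exits without running a class
        try {
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            String result = "";
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    if (line.trim().startsWith("file.encoding =")) {
                        result = line.trim();
                    }
                }
            }
            p.waitFor();
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("this JVM:         file.encoding = " + System.getProperty("file.encoding"));
        // No flag: the child uses its own platform default (unless JAVA_TOOL_OPTIONS sets it)
        System.out.println("child, no flag:   " + childFileEncoding());
        // Explicit flag: the child uses UTF-8 regardless of the platform default
        System.out.println("child, with flag: " + childFileEncoding("-Dfile.encoding=UTF-8"));
    }
}
```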

wumpz, Jul 15 '22 10:07