textidote
No warnings found in CI
There seems to be an issue where no warnings are found when using the tool in CI.
Running java -jar textidote.jar --check de --output html PuE.tex > language_report.html locally in the root of the repo works and warnings are found. In the CI log, it does not state that the file was skipped, but no warnings are found either.
Explicitly using the correct file name "PuE.tex" in the CI script instead of the variable has no effect.
Could it be a file encoding problem? I recall this issue from last year:
https://github.com/sylvainhalle/textidote/issues/120#issuecomment-613433539
If the encoding of the file does not match what TeXtidote expects, nothing is being read and that would explain the absence of warnings.
To debug the issue, may I suggest you try this command:
java -jar textidote.jar --clean --check de --output html PuE.tex > cleaned.txt
If cleaned.txt is empty, then we'd have a hint about what is going on.
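The encoding hypothesis can also be tested independently of TeXtidote by checking whether the file's bytes form valid UTF-8. The following is only a minimal sketch in Python: the helper name decodes_as and the sample word are made up for illustration and are not part of TeXtidote.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample text, encoded in two different ways.
utf8_bytes = "Grüße".encode("utf-8")
latin1_bytes = "Grüße".encode("latin-1")

print(decodes_as(utf8_bytes, "utf-8"))    # the UTF-8 bytes decode cleanly
print(decodes_as(latin1_bytes, "utf-8"))  # 0xFC cannot start a UTF-8 sequence
```

For a real file, the bytes could be read with pathlib.Path("PuE.tex").read_bytes() and passed to the same helper.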
I just created a minimal example for this issue in this repository. The pipeline of this repository generated the following artifacts. This artifacts.zip was generated on my local system (Ubuntu in WSL). Notably, in the cleaned.txt generated by GitLab CI, the characters 'ü' and 'ß' have been replaced with '?', whereas this did not happen in the locally generated version.
Thanks for providing these artifacts. I opened the files in a hex editor to see how the characters have been encoded. Here is what I found:
Source file (main.tex):
- ü: C3BC -> UTF-8
- ß: C39F -> UTF-8
CI (cleaned.txt):
- ü: 3F -> "?"
- ß: 3F -> "?"
Local (cleaned.txt):
- ü: FC -> latin-1
- ß: DF -> latin-1
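The byte values listed above can be reproduced with a few lines of Python. This sketch only shows how the two characters look under each encoding; it makes no claim about what TeXtidote or the JVM does internally, and the helper name byte_forms is made up for illustration.

```python
def byte_forms(ch: str) -> dict:
    """Return the hex bytes of one character under three encodings."""
    return {
        "utf-8": ch.encode("utf-8").hex().upper(),
        "latin-1": ch.encode("latin-1").hex().upper(),
        # ASCII cannot represent these characters; with errors="replace"
        # each unencodable character becomes "?" (0x3F).
        "ascii (replace)": ch.encode("ascii", errors="replace").hex().upper(),
    }


for ch in "üß":
    print(ch, byte_forms(ch))
```

The 3F bytes in the CI output are what a lossy conversion to an ASCII-only charset would produce, which is one possible explanation for the "?" characters, not a confirmed one.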
I am a bit puzzled by what I see. The source file is a valid UTF-8 document. When processed locally, it ends up as a file transcoded into latin-1 (visible from the fact that the two characters end up with a different hex value). I don't know how this is possible, as TeXtidote always assumes the default encoding of the OS it runs in. Finally, when it is run in the CI pipeline, the characters are garbled, which again indicates that the program does not assume UTF-8 as the input encoding. However, looking at your CI configuration, I see that you use a Debian OS, so UTF-8 input should not be a problem.
A workaround for your problem would be to explicitly tell TeXtidote to use UTF-8, by adding the --encoding UTF-8 command line switch when you call it. Tell me if this changes something.
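Since the default encoding usually follows the process locale, it may also help to print which locale the CI job actually runs under. The sketch below applies the POSIX precedence rule (LC_ALL, then LC_CTYPE, then LANG). Whether the Java runtime in the CI image derives its default charset from these variables is an assumption here, and the function name effective_ctype_locale is made up for illustration.

```python
import os


def effective_ctype_locale(env) -> str:
    """Return the locale name that governs character encoding.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Unset or empty variables are skipped.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    # Nothing set: fall back to the implementation default,
    # which is usually the "C" (POSIX) locale.
    return "C"


print(effective_ctype_locale(os.environ))
```

If this prints C or POSIX inside the CI container but something like de_DE.UTF-8 locally, the two environments do not share the same default encoding.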
Thanks for the help so far! As suggested, I added the --encoding UTF-8 parameter in the CI script.
However, this did not affect the resulting CI artifacts.
To confirm that main.tex is not altered by Git in some unexpected way when pushing / pulling, I also tried downloading my local main.tex version as part of the pipeline on another branch, which yielded the same results.
This may not be related, but I see that the calls to TeXtidote mix the --clean option with the --check option. These two are mutually exclusive: calling clean only cleans the document and exits before performing any other verification.
@giulianorasper Did you find a solution?
Will close this due to lack of information to fix the issue.