handling of special chars

Open cmrn-rhi opened this issue 3 years ago • 7 comments

Is ROBOT able to handle special characters within template annotations?

We had a case of an input robot.tsv (UTF-8) containing chāt. But the robot.ofn (UTF-8) output after running ROBOT had converted it to chÄ�t

FYI, the ROBOT template command was AL oboInOwl:hasExactSynonym@en SPLIT=|

cmrn-rhi avatar Oct 20 '22 18:10 cmrn-rhi

I would guess the problem is here: https://github.com/ontodev/robot/blob/810cc837fd157e572a1f143e740fce394b64742d/robot-core/src/main/java/org/obolibrary/robot/TemplateHelper.java#L850-L854

We need to use an input constructor that accepts a character set, rather than using the default platform character set.
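
As a rough illustration of that idea (not the actual TemplateHelper code), here is a minimal sketch that reads tab-separated rows through a reader constructed with an explicit UTF-8 charset. The class name Utf8TemplateReader and the method readRows are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of ROBOT.
final class Utf8TemplateReader {

  // Reads tab-separated rows, always decoding the bytes as UTF-8
  // instead of relying on the platform default charset.
  static List<String[]> readRows(InputStream in) {
    List<String[]> rows = new ArrayList<>();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Limit -1 keeps trailing empty cells.
        rows.add(line.split("\t", -1));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return rows;
  }

  public static void main(String[] args) {
    // \u0101 is the letter a with macron, written as an escape so the
    // source file itself does not depend on the compiler's encoding.
    byte[] tsv =
        "ID\tSynonym\nobo:chat\tch\u0101t\n".getBytes(StandardCharsets.UTF_8);
    List<String[]> rows = readRows(new ByteArrayInputStream(tsv));
    System.out.println("ch\u0101t".equals(rows.get(1)[1]));
  }
}
```

For a file on disk, Files.newBufferedReader(path, StandardCharsets.UTF_8) does the same job in one call.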

balhoff avatar Oct 20 '22 18:10 balhoff

Thanks @balhoff. I was not able to replicate on macOS. I made my own little TSV:

ID	Synonym
ID	AL oboInOwl:hasExactSynonym@en SPLIT=|
obo:chat	chāt

and ran robot template -t test.tsv -o test.ofn, and I see chāt in the output file.

@cmrn-rhi What operating system are you using?

jamesaoverton avatar Oct 20 '22 18:10 jamesaoverton

@jamesaoverton Windows 10 (version 21H2). Sorry, should've mentioned it in the first place.

cmrn-rhi avatar Oct 20 '22 19:10 cmrn-rhi

Thanks @cmrn-rhi!

@balhoff Do you see a way to do this in a backwards-compatible way?

jamesaoverton avatar Oct 24 '22 20:10 jamesaoverton

Not offhand, but personally I think we can just call it a bug that needs to be fixed. I know you like to be more conservative than that!

balhoff avatar Oct 24 '22 20:10 balhoff

This article says that UTF-8 is the default on all platforms (including Windows) starting with JDK 18: https://medium.com/@andbin/jdk-18-and-the-utf-8-as-default-charset-8451df737f90. These are the notes about that change for OpenJDK: https://openjdk.org/jeps/400.

It also says that you can use this option to override the default file encoding: java -Dfile.encoding=UTF-8.

@cmrn-rhi Could you please try running that same template with that option set, and see if it resolves this issue? Something like java -Dfile.encoding=UTF-8 -jar robot.jar template ...
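
To check what the JVM is actually using, a small standalone program like the sketch below may help. It prints the default charset and shows what happens when the UTF-8 bytes of chāt are decoded as windows-1252, which is assumed here to be the default on the affected Windows machine (the thread does not confirm the exact code page). DefaultCharsetDemo and decode are hypothetical names for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only; not part of ROBOT.
final class DefaultCharsetDemo {

  // Decodes raw bytes using the given charset.
  static String decode(byte[] bytes, Charset charset) {
    return new String(bytes, charset);
  }

  public static void main(String[] args) {
    // "ch\u0101t" is chat with a macron on the a; in UTF-8 that letter
    // is the two bytes 0xC4 0x81.
    String original = "ch\u0101t";
    byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

    // Reflects -Dfile.encoding (or the JDK 18+ UTF-8 default).
    Charset platform = Charset.defaultCharset();
    System.out.println("default charset: " + platform);
    System.out.println(
        "round-trips with default: " + original.equals(decode(utf8, platform)));

    // Assumption: windows-1252 stands in for the old Windows default.
    Charset cp1252 = Charset.forName("windows-1252");
    String garbled = decode(utf8, cp1252);
    // Each byte becomes its own character, so the string grows from 4 to 5.
    System.out.println("length under windows-1252: " + garbled.length());
    // 0xC4 decodes to U+00C4 (A with diaeresis), matching the report.
    System.out.println("third char is U+00C4: " + (garbled.charAt(2) == '\u00C4'));
  }
}
```

Running it once with plain java and once with java -Dfile.encoding=UTF-8 should show whether the option changes the default on a given machine. The second byte, 0x81, has no assigned character in windows-1252, which is consistent with the replacement character seen in the garbled output.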

jamesaoverton avatar Oct 26 '22 14:10 jamesaoverton

Just gave it a go and that did work!

cmrn-rhi avatar Oct 26 '22 17:10 cmrn-rhi