ISAcreator

Does not ensure that ISA-Tabs are in utf-8

Open drj11 opened this issue 7 years ago • 4 comments

(as discussed face-to-face on 2017-02-05)

What originated as a bug in our data repository, https://github.com/hidelab/genometranslationcommons/issues/14, turns out to have a different cause: the ISA-Tab created by ISAcreator wasn't encoded in utf-8. That in turn is because Java on Windows doesn't guarantee utf-8 output unless the program or its launcher asks for it.

drj11, Feb 09 '18 13:02

Solutions that would improve the matter:

  1. The Java program could override the system locale and just write everything in utf-8.
  2. Warn when saving if the encoding isn't utf-8.
  3. Make the encoding selectable from a menu.

I think I strongly prefer 1 (are there users that want ISA-Tab files that aren't utf-8?).
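As a rough illustration of option 1, here is a minimal sketch of writing a file with an explicit charset so that the system locale no longer matters. The class and method names (`Utf8TabWriter`, `writeUtf8`) are hypothetical and not taken from the ISAcreator source; I haven't checked how ISAcreator actually writes its files.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, NOT part of ISAcreator: shows the general pattern only.
class Utf8TabWriter {

    // Writes content as UTF-8 regardless of the JVM's default charset.
    static void writeUtf8(Path target, String content) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used purely for demonstration.
        Path tmp = Files.createTempFile("isatab-demo", ".txt");
        // \u2264 is the "less than or equal" sign, escaped so the source
        // file itself does not depend on the compiler's encoding.
        writeUtf8(tmp, "Factor Value\t5 \u2264 5\n");
        System.out.println("Wrote " + Files.readAllBytes(tmp).length + " bytes to " + tmp);
    }
}
```

The point of the pattern is that the charset is named at every place a file is opened for writing, rather than inherited from the platform.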

It's possible that we could achieve that by setting an option when Java is invoked. My brief research leads me to https://groups.google.com/forum/#!topic/isaforum/03P91ZQ1mj0 which suggests -Dfile.encoding=utf-8.
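One way to see whether that option took effect, and roughly what option 2 (warning on save) might look like, is to ask the JVM which default charset it ended up with. This is only a sketch: `EncodingCheck` and `isUtf8` are made-up names, and where such a check would belong in ISAcreator is an open question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic, NOT part of ISAcreator.
class EncodingCheck {

    // True only when the given charset is UTF-8.
    static boolean isUtf8(Charset cs) {
        return StandardCharsets.UTF_8.equals(cs);
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        // Report both the raw system property and the charset the JVM resolved.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + def);
        if (!isUtf8(def)) {
            System.err.println("Warning: default charset is " + def
                    + ", so files written without an explicit charset may not be utf-8.");
        }
    }
}
```

Running it once plainly and once with `java -Dfile.encoding=utf-8 EncodingCheck` should show whether the flag changes the default on a given machine.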

But I don't know how Java programs are packaged or launched.

drj11, Feb 09 '18 13:02

On Windows you can write a script that starts ISAcreator with UTF-8 as the file encoding. Here's the one I wrote:

    echo. Starting ISAcreator
    java -Dfile.encoding=utf-8 -Xmx1024m -Xms512m -jar ISAcreator.jar

DanBerrios, Mar 07 '18 22:03

ah thanks @DanBerrios, I don't use Windows myself but this will be useful

drj11, Mar 08 '18 09:03

an example unicode character: ≤

as in 5 ≤ 5
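To make the difference concrete, this small sketch prints the bytes Java produces for that string under utf-8 and under a legacy single-byte encoding. ISO-8859-1 is used here only as a stand-in that every JVM is required to support; which encoding the affected Windows machine actually used is not known from this thread. The class name `LessOrEqualBytes` is made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only; ISO-8859-1 is an assumed stand-in for a legacy code page.
class LessOrEqualBytes {

    // Encodes text with the given charset and renders the bytes as hex.
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String sample = "5 \u2264 5"; // "5 ≤ 5", escaped to keep the source ASCII
        System.out.println("UTF-8:      " + hex(sample, StandardCharsets.UTF_8));
        // prints "UTF-8:      35 20 E2 89 A4 20 35"
        System.out.println("ISO-8859-1: " + hex(sample, StandardCharsets.ISO_8859_1));
        // prints "ISO-8859-1: 35 20 3F 20 35" (the sign is replaced by '?')
    }
}
```

In utf-8 the character becomes the three bytes E2 89 A4; in an encoding that has no such character it is silently replaced with `?` (3F), which is one way data can be lost without any error being raised.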

drj11, Mar 13 '18 16:03