
A J-Pilot bug causes data loss if you enter certain character sequences into a memo. Here's a suggested workaround.

Status: Open • unforgettableid opened this issue 8 years ago • 0 comments

Background

Years ago, @hleskien reported a J-Pilot bug to the Debian issue tracker: http://bugs.debian.org/520135.

The bug was marked as fixed around the time that commit 3d0287a by @LudovicRousseau made it into J-Pilot.

The problem

The current J-Pilot mainline shows some interesting behavior.

  1. If you write "the paper by P‐O Stotzer" in a memo and click Apply, J-Pilot converts the Unicode hyphen to question marks, as expected. (The hyphen in "P‐O Stotzer" is U+2010, a narrow Unicode hyphen, not an ASCII hyphen. I copied and pasted it from a scholarly article; I guess the publishers like narrow hyphens.)
  2. If you write "the paper by M Simren and P‐O Stotzer" in a memo and click Apply, J-Pilot converts the Unicode hyphen to question marks, as expected.
  3. If you write "the paper by M Simrén and P‐O Stotzer" in a memo and click Apply, J-Pilot truncates everything in the memo that comes after the Unicode hyphen.

In general, any memo that contains the accented letter 'é', followed anywhere later by the narrow Unicode hyphen '‐', gets truncated, and data is lost.

Even if there are thousands of words of other text between the 'é' and the '‐', the truncation and data loss will still happen.

This data loss may be permanent and irreversible. It's a problem, and I don't know why it happens.

The truncation cost me a chunk of a long memo, maybe hundreds of words that I had copied from various sources online. Maybe I'll go through my browser history and collect the information again.

There are other ways to reproduce the problem. Writing "the paper by M Simrén and P‐O Stotzer" is just one of them. And I bet there are other possible character combinations which can cause the same problem.

The cause of the bug

I suspect the cause is a bug somewhere in otherconv.c, perhaps around line 315 or later.
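For context, here's a sketch of what a bare conversion does with the third sample sentence. It uses plain iconv(3) instead of GLib's g_iconv wrappers, and ISO-8859-1 as a stand-in for the Palm's character set; both are assumptions for illustration, and convert_prefix is a hypothetical name, not J-Pilot code. Without any suffix, iconv converts the 'é' and then stops with EILSEQ at the first byte of the U+2010 hyphen. Everything after that point is left to J-Pilot's own error handling (presumably the code that substitutes question marks), which is where I'd look for the truncation.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper for illustration; not part of J-Pilot.
 * Converts the UTF-8 string `in` to `tocode` into `out` (capacity `outcap`),
 * always NUL-terminating `out`. Returns the number of input bytes consumed.
 * *err receives errno from a failed iconv_open()/iconv() call, or 0. */
static size_t convert_prefix(const char *tocode, const char *in,
                             char *out, size_t outcap, int *err)
{
    assert(tocode != NULL && in != NULL && out != NULL && err != NULL);
    assert(outcap > 0);
    *err = 0;
    out[0] = '\0';

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1) {
        *err = errno;
        return 0;
    }

    char *inp = (char *)in;        /* iconv() takes char **, not const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outcap - 1;   /* leave room for the terminating NUL */

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        *err = errno;              /* EILSEQ: inp points at the unconvertible char */

    *outp = '\0';
    iconv_close(cd);
    return (size_t)(inp - in);
}
```

Called with "the paper by M Simr\xC3\xA9n and P\xE2\x80\x90O Stotzer" and "ISO-8859-1", it should report EILSEQ, return the byte offset of the hyphen, and leave "the paper by M Simrén and P" (in Latin-1) in the buffer.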

You could try to fix the bug directly, but writing bug-free C code is hard, and any fix might introduce new bugs. :) Since I bet nobody is willing to do lots of testing (including fuzz testing) of a fix, I think it'd be best to work around the bug instead.

Below, I shall propose a glibc-only workaround. Even though it's glibc-only, I think it's the best fix, because it relies on well-tested glibc code to solve the problem. I bet most J-Pilot users use glibc anyway.

The workaround

otherconv.c includes a function named otherconv_init(). The function calls GLib's g_iconv_open function twice. And g_iconv_open, in turn, calls the local libc's iconv_open function.

The workaround is to append "//TRANSLIT//IGNORE" to the end of the second argument of each g_iconv_open call.
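Here's a sketch of what the suffixes do, again using plain iconv(3) and ISO-8859-1 as assumed stand-ins for J-Pilot's g_iconv calls and Palm charset; to_charset_lossy is a hypothetical name. Note that the iconv_open(3) man page documents //TRANSLIT and //IGNORE as suffixes on the target encoding (tocode, the first argument), so that is where this sketch puts them. With the suffixes, the conversion no longer stops at U+2010: the hyphen is transliterated or dropped, and the text after it survives.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration; not J-Pilot code.
 * Converts a UTF-8 string to `charset` using the GNU "//TRANSLIT//IGNORE"
 * suffixes (glibc). Returns a malloc'd NUL-terminated string, or NULL. */
static char *to_charset_lossy(const char *charset, const char *in)
{
    assert(charset != NULL && in != NULL);

    char tocode[64];
    int n = snprintf(tocode, sizeof tocode, "%s//TRANSLIT//IGNORE", charset);
    if (n < 0 || (size_t)n >= sizeof tocode)
        return NULL;

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t inleft = strlen(in);
    /* Sized generously, since transliteration can expand one character into
       several; if it is still too small, this sketch fails (E2BIG) rather
       than growing the buffer. */
    size_t cap = 4 * inleft + 16;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;
    char *outp = out;
    size_t outleft = cap - 1;      /* leave room for the terminating NUL */
    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);

    /* glibc may still return -1/EILSEQ after //IGNORE has skipped a
       character, so only treat it as a failure if input was left over. */
    int failed = (r == (size_t)-1) && !(errno == EILSEQ && inleft == 0);
    iconv_close(cd);
    if (failed) {
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}
```

For the third sample sentence, the result should begin with "the paper by M Simrén and P" and still end with "O Stotzer"; whether the hyphen becomes "-" or disappears depends on glibc's transliteration tables for the current locale.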

I tested my workaround on Ubuntu 16.04 under the Windows Subsystem for Linux, and it seems to fix the problem perfectly. Once I applied my workaround and recreated a problematic memo, there was no data loss anymore.

Please note: "//TRANSLIT//IGNORE" is a GNU extension which isn't available on all platforms. My workaround is only guaranteed to work if you're using glibc. If you're using any other libc, it may cause problems.

I think it'd be best to test for __GLIBC__, and to apply the workaround only if it's defined.
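A sketch of that compile-time guard. Because J-Pilot picks its charset name from the preferences at run time, the suffix is appended with snprintf rather than string-literal concatenation. ICONV_LOSSY_SUFFIX and lossy_codeset are hypothetical names, not existing J-Pilot identifiers. One detail worth knowing: __GLIBC__ is defined by <features.h>, so the #ifdef has to come after at least one libc header has been included.

```c
#include <assert.h>
#include <stdio.h>   /* pulls in <features.h> on glibc, which defines __GLIBC__ */
#include <string.h>

/* Hypothetical macro: the GNU suffixes on glibc, nothing elsewhere. */
#ifdef __GLIBC__
#define ICONV_LOSSY_SUFFIX "//TRANSLIT//IGNORE"
#else
#define ICONV_LOSSY_SUFFIX ""
#endif

/* Hypothetical helper: writes `charset` followed by ICONV_LOSSY_SUFFIX
 * into buf. Returns 0 on success, -1 if buf is too small. */
static int lossy_codeset(char *buf, size_t size, const char *charset)
{
    assert(buf != NULL && charset != NULL && size > 0);
    int n = snprintf(buf, size, "%s%s", charset, ICONV_LOSSY_SUFFIX);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting name would then be passed to g_iconv_open in place of the bare charset name. On a non-glibc system the macro expands to an empty string, so the call behaves exactly as it does today.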

Postscript

Thank you again for your continued work on J-Pilot! Despite the occasional bug or two, it's been a big help to me overall.

unforgettableid, Aug 10 '17 02:08