RapidCRC-Unicode

bug: v0.3.22: file paths with special characters, like German "umlaut", are marked as Error or File not found

Open kjeasy opened this issue 8 years ago • 4 comments

When verifying files against an existing .md5 file, any file whose path (directory or file name) contains a German umlaut is marked as "Error" or "File not found". Program version: 0.3.22.

I have attached

  • a screenshot of two instances of the program: the upper instance was used to create the md5 file, and the green rectangle marks the correctly displayed umlaut; the lower instance shows the errors, and the red rectangle marks the incorrectly read file names.
  • the directory with the files themselves, and the md5 file

The md5 file itself looks good, but the program seems to have a problem with the umlaut when reading it. Note: there is only one umlaut (German ae = ä) in the file path. Note: not every umlaut seems to cause a problem. I tried a simpler directory name with the same umlaut, and that caused no problem.

RapidCRC_umlaut_bug.zip rapidcrc umlaut bug 2017-02-10_08-44-55

kjeasy avatar Feb 10 '17 07:02 kjeasy

Here is the option page, in case this is caused by an invalid mix of options :-). rapidcrc umlaut bug options page

kjeasy avatar Feb 10 '17 07:02 kjeasy

One more finding: it seems this issue is caused by the program not correctly recognizing its own UTF-8 md5 files. I checked the option "General / Default to codepage when opening / UTF-8", because I also have the option "File creation / Create Unicode Files / UTF-8" activated.

I still think this is a bug, as the generated file header says clearly that this md5 file IS UTF-8. Thanks. Klaus
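For readers following along: the symptom described here is what you would expect when UTF-8 bytes are decoded with a single-byte local codepage. Below is a minimal Python sketch of that effect. It assumes the local codepage is Windows-1252 (the thread does not say which one is in use) and uses a made-up file name, not one from the attached zip.

```python
# Hypothetical file name containing one umlaut.
name = "Bär.txt"

# The bytes a UTF-8 encoded .md5 file would store for this name.
utf8_bytes = name.encode("utf-8")

# What a reader gets if it wrongly assumes Windows-1252 (assumed codepage).
misread = utf8_bytes.decode("cp1252")

print(misread)          # BÃ¤r.txt
print(misread == name)  # False: the name no longer matches the file on disk
```

A lookup for the misread name would fail, which matches the "File not found" result.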

kjeasy avatar Feb 10 '17 08:02 kjeasy

There isn't much RCRC can do in this case. UTF-8 files usually have no byte order mark at the beginning and cannot be distinguished from files in your local codepage. RCRC uses Windows functions to "guess" which encoding your file is in, and here it clearly guesses wrong. This doesn't happen with UTF-16 files, since they start with a byte order mark that can be detected.
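To illustrate the difference, here is a rough Python sketch of byte order mark detection. It is not RCRC's actual code; the function name `detect_bom` and the sample line are made up for the example, and UTF-32 is ignored for brevity.

```python
import codecs

def detect_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None  # no BOM: the reader can only guess

# Hypothetical md5 file entry with an umlaut in the file name.
line = "d41d8cd98f00b204e9800998ecf8427e *Bär.txt\n"

print(detect_bom(line.encode("utf-16")))     # utf-16 (Python writes a BOM)
print(detect_bom(line.encode("utf-8-sig")))  # utf-8-sig (UTF-8 with BOM)
print(detect_bom(line.encode("utf-8")))      # None (plain UTF-8, no BOM)
```

Only the last case forces a guess, and that is the case that applies to a UTF-8 file written without a byte order mark.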

OV2 avatar Feb 10 '17 11:02 OV2

Understood, I overlooked that aspect. To avoid this pitfall for users like me who are less aware of the issue, would it be possible to set the default for file creation and reading to UTF-16? Thanks. Klaus

kjeasy avatar Feb 15 '17 15:02 kjeasy