
Export to mid/wav fails when file/directory tree contains diacritics or Cyrillic characters

Open msjasinski opened this issue 2 years ago • 15 comments

Hydrogen version * : 1.2.3
Operating system + version : Windows 11 64-bit (and earlier versions too)
Audio driver + version : PortAudio


Export to mid or wav files fails if the filename/directory structure (e.g. D:\Hydrogen\emptySong.wav) contains letters/characters such as ążźćśł (Polish) or фыва (Russian). On the other hand, these characters work fine in .h2song files.

msjasinski avatar Mar 25 '24 18:03 msjasinski

Hey @msjasinski ,

Thanks for reporting!

Exporting the song to Lilypond or the current drumkit does not work either. But only on Windows. On Linux everything works fine.

I'll have a look.

theGreatWhiteShark avatar Mar 25 '24 19:03 theGreatWhiteShark

Could you check whether you are able to export MIDI or WAV files using this version of Hydrogen?

Drumkit import and export, however, will still not work with Cyrillic scripts or the Polish additions to the Latin alphabet in either filename or parent folders. That's a limitation of the compression library we use.

theGreatWhiteShark avatar Mar 29 '24 22:03 theGreatWhiteShark

Hello @theGreatWhiteShark ! I tested it thoroughly and it works!!! Thank you very much! Keep up the good work! All the best

msjasinski avatar Apr 02 '24 20:04 msjasinski

A related bug - when trying to open Hydrogen files (.h2song), containing characters as described above, from File Explorer (or similar - but not from Hydrogen open menu), I still get an error message [screenshot]

msjasinski avatar Apr 03 '24 00:04 msjasinski

A related bug - when trying to open Hydrogen files (.h2song), containing characters as described above, from File Explorer (or similar - but not from Hydrogen open menu), I still get an error message

Phew. That's a hard one. I can reproduce it, but I am afraid I cannot fix it.

The encoding bugs within Hydrogen I was able to fix by enforcing UTF-8 encoding and relying on the built-in file-handling functions of Qt, the framework we use. It probably uses the UTF-16 versions of the Windows API and everything works fine.

But arguments passed to the application during startup seem to be more difficult to handle. There, Qt uses the system encoding with seemingly no way to override this behavior. And since both your and my encoding is set wrong and does not allow for Cyrillic characters, Hydrogen only receives a messed-up path with all non-Latin-1 characters lost and without any way to determine the song's original location.

(Why is the encoding off even after installing the language kit and being able to set the keyboard to e.g. Russian and write Cyrillic letters? No idea. I'm not a Windows person. But from the perspective of Hydrogen, Windows is telling us that it does not support these characters.)
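
A minimal sketch of the effect described above, for illustration only. It is not Hydrogen or Qt code: it merely emulates a path being forced through a legacy single-byte code page. The code page `cp1252` and the helper name `through_code_page` are assumptions chosen for the example; the thread does not say which system encoding was actually in use, and Windows itself may substitute characters differently than Python's `errors="replace"` does.

```python
# Sketch only: emulate a command-line path passing through a legacy code page.
# "cp1252" is an ASSUMED system encoding, not a value taken from the thread.
SYSTEM_CODE_PAGE = "cp1252"


def through_code_page(path: str, code_page: str = SYSTEM_CODE_PAGE) -> str:
    """Encode to the code page and back, replacing unencodable characters."""
    return path.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    for original in (r"D:\Hydrogen\test 21ą.h2song", r"D:\Hydrogen\test 21ф.h2song"):
        received = through_code_page(original)
        # Once a character has been replaced, the original path cannot be recovered.
        print(f"{original!r} -> {received!r} (intact: {received == original})")
```

Characters that exist in the code page survive the round trip, which would match the observation that files without diacritics open fine.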

theGreatWhiteShark avatar Apr 03 '24 20:04 theGreatWhiteShark

@msjasinski could you do me a favor and install this version of Hydrogen and attach the log messages?

I patched it to report the system's encoding. Just to be entirely sure we are talking about the same issue here.

theGreatWhiteShark avatar Apr 04 '24 18:04 theGreatWhiteShark

Drumkit import and export, however, will still not work with Cyrillic scripts or the Polish additions to the Latin alphabet in either filename or parent folders. That's a limitation of the compression library we use.

That shortcoming of libarchive might already have been addressed:

https://github.com/libarchive/libarchive/pull/2016

elpescado avatar Apr 05 '24 09:04 elpescado

Alternatively, maybe using archive_read_open_fd with fd obtained from _wopen on Windows instead of archive_read_open_filename would work on Windows?
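
The general idea behind this suggestion (open the file yourself through a Unicode-aware call, then hand the archive code only the open descriptor so it never has to interpret the filename) can be sketched in Python. This is an analogy, not libarchive code: `os.open` stands in for `_wopen`, the standard-library `tarfile` module stands in for the archive reader, and the helper names and the throwaway file name are hypothetical.

```python
import io
import os
import tarfile
import tempfile
from typing import List


def read_member_names_via_fd(path: str) -> List[str]:
    """Open the file ourselves and give the archive reader only the stream."""
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    with os.fdopen(fd, "rb") as stream:
        # The archive reader sees an open stream, never the filename.
        with tarfile.open(fileobj=stream, mode="r:") as archive:
            return archive.getnames()


def demo() -> List[str]:
    """Create a throwaway archive with a non-ASCII name and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "test 21ą ф.tar")  # hypothetical name
        payload = b"example"
        with tarfile.open(archive_path, mode="w") as archive:
            info = tarfile.TarInfo(name="member.txt")
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
        return read_member_names_via_fd(archive_path)


if __name__ == "__main__":
    print(demo())
```

Whether the equivalent C code behaves well alongside an MSYS2-built libarchive is a separate question that this sketch does not answer.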

elpescado avatar Apr 06 '24 11:04 elpescado

That shortcoming of libarchive might already have been addressed:

https://github.com/libarchive/libarchive/pull/2016

Hmm. I'm not sure. Within the PR they stated the patch is only affecting native Windows builds. But we ship a version obtained from the MSYS2 repos. I don't know much about our Windows toolchain or libarchive in particular but I wouldn't be surprised if the library was configured to use the POSIX interface provided by MSYS instead of the underlying Windows API.

Alternatively, maybe using archive_read_open_fd with fd obtained from _wopen on Windows instead of archive_read_open_filename would work on Windows?

I thought about this too but decided not to implement it. I'm just not familiar enough with stability and backward compatibility of the Windows API, possible friction when putting it next to MSYS2 code etc. Handling archives is such a vital part of Hydrogen that I'm a little afraid to break things for Windows users. Especially since I am not using this OS.

I read this document: https://github.com/libarchive/libarchive/wiki/Filenames#the-problem and got the impression UTF-8 support is not yet "solved" in libarchive. But I get that this is an important topic for some users and I will have another look (and come up with at least a workaround).

theGreatWhiteShark avatar Apr 07 '24 14:04 theGreatWhiteShark

A related bug - when trying to open Hydrogen files (.h2song), containing characters as described above, from File Explorer (or similar - but not from Hydrogen open menu), I still get an error message

@msjasinski I added a wiki page on how to fix this issue by tweaking the Windows settings.

theGreatWhiteShark avatar Apr 07 '24 16:04 theGreatWhiteShark

Hmm. I'm not sure. Within the PR they stated the patch is only affecting native Windows builds. But we ship a version obtained from the MSYS2 repos. I don't know much about our Windows toolchain or libarchive in particular but I wouldn't be surprised if the library was configured to use the POSIX interface provided by MSYS instead of the underlying Windows API.

It's been a while since I've used Windows, but I was under the impression that MSYS is a collection of POSIX shell utilities (bash, fileutils etc.), while the actual compiler is MinGW, i.e. the "native" Windows build of GCC that links with msvcrt, as opposed to Cygwin a.k.a. "POSIX-on-Windows GCC". But I might be wrong, GCC on Windows is ultra-confusing.

elpescado avatar Apr 08 '24 07:04 elpescado

I took a look at the source code of libarchive and things are much easier than I thought. They have dedicated methods for UTF-16 Windows API calls, like archive_write_open_filename_w. I wasn't aware of them previously, as I used the man pages linked on their official GitHub page for reference. But it seems these are generated on FreeBSD and all the Windows-specific stuff was removed by #ifdefs. How inconvenient!

I'll rewrite import/export using these functions. Import with Cyrillic characters in drumkit path already works.
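
For readers unfamiliar with the wide-character variants: on Windows, `wchar_t` strings hold UTF-16 code units, so a path handed to a `_w` function keeps every character. A small sketch of that representation follows, again in Python and purely illustrative. The helper names `to_wide` and `from_wide` are hypothetical, and the sample path reuses a filename from this thread.

```python
def to_wide(path: str) -> bytes:
    """Return the UTF-16LE bytes, i.e. the code units of a Windows wchar_t string."""
    return path.encode("utf-16-le")


def from_wide(raw: bytes) -> str:
    """Decode UTF-16LE bytes back into a string."""
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    path = r"D:\Hydrogen\test 21ф.h2song"
    wide = to_wide(path)
    # Unlike a round trip through a single-byte code page, nothing is lost here.
    print(from_wide(wide) == path)
    print(len(path), "characters,", len(wide) // 2, "UTF-16 code units")
```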

theGreatWhiteShark avatar Apr 08 '24 19:04 theGreatWhiteShark

@msjasinski could you do me a favor and install this version of Hydrogen and attach the log messages?

I patched it to report the system's encoding. Just to be entirely sure we are talking about the same issue here.

Here it is: log.txt

Same if the directory has no diacritics but the filename does have them. In case of no diacritics, the file loads OK.

msjasinski avatar Apr 08 '24 22:04 msjasinski

Here it is: log.txt

Same if the directory has no diacritics but the filename does have them. In case of no diacritics, the file loads OK.

👍🏿 Nice. That's exactly how things are on my local machine.

theGreatWhiteShark avatar Apr 12 '24 07:04 theGreatWhiteShark

Hey @msjasinski ,

I had another look, but it seems UTF-8 support for drumkit export is something we cannot guarantee for now (due to limitations of a third-party library we use). But all other save/open/export/import actions should now work properly.

Could you give this version of Hydrogen one more try to double check that everything is working?

theGreatWhiteShark avatar Jun 19 '24 19:06 theGreatWhiteShark

Closed with #1981. If anything does not work yet, please feel free to reopen this issue again.

theGreatWhiteShark avatar Jul 04 '24 16:07 theGreatWhiteShark

Thanks very much! It seems alright. I'll comment if I find anything suspicious.

msjasinski avatar Jul 04 '24 20:07 msjasinski

Export midi files and Export Song work.

I couldn't test properly, because in Windows 11 file associations (with multiple versions of the same program) don't work very well out of the box. Now I tried harder and this still does not work: when trying to open Hydrogen files (.h2song), containing characters as described above, from File Explorer (or similar - but not from Hydrogen open menu), I still get an error message.

E.g. filenames: test 21ą.h2song, test 21ф.h2song

msjasinski avatar Jul 04 '24 21:07 msjasinski

Hmm. Strange. On my machine both files are working.

I couldn't test properly, because in Windows 11 file associations (with multiple versions of the same program) don't work very well out of the box.

Maybe an older version is used while opening. Could you remove all existing versions of Hydrogen, install the latest one, and try again?

You can check the version via the menu in Info > About and it should be "Hydrogen-1.2.3-224-gd4f4da526".

theGreatWhiteShark avatar Jul 05 '24 08:07 theGreatWhiteShark

No, it doesn't open .h2song files from file explorers in the newest version (224) if the filename or pathname contains diacritics or Cyrillic characters.

msjasinski avatar Jul 05 '24 18:07 msjasinski

Also, which is a minor thing, but rather important for me, the program (regardless of how it is opened; this thing is not about the case above) doesn't start maximized (even if I select so in file properties). I prefer maximized program windows, usually one per screen. [screenshot] It looks like this, with the upper part of title partly hidden above the screen.

msjasinski avatar Jul 05 '24 19:07 msjasinski

No, it doesn't open .h2song files from file explorers in the newest version (224) if the filename or pathname contains diacritics or Cyrillic characters.

Could you do so again with this version which is doing some additional logging? Then in "Info > Open log file" you can view all the log messages. Could you post them in here so I can have a look?

It looks like this, with the upper part of title partly hidden above the screen.

Hmm. Is this something new or did that always happen?

theGreatWhiteShark avatar Jul 07 '24 14:07 theGreatWhiteShark

Sorry for my belated reply.

I've recently discovered (using another program) that I do indeed have some encoding issues on Windows 11. I'm relatively new to this OS, and on my old computer with Windows 7 all files open just fine. (Both computers use English as system language and I'm in Poland). I'll get to the bottom of this.

msjasinski avatar Jul 30 '24 12:07 msjasinski

It looks like this, with the upper part of title partly hidden above the screen.

Hmm. Is this something new or did that always happen?

This has always been there.

msjasinski avatar Jul 30 '24 14:07 msjasinski