USDX
USDX copied to clipboard
Song folders with non-ascii characters in their name are ignored without error on Windows
Actual behaviour
Songs with non-ascii characters in their folder name are not picked up by USDX on my Windows install. E.g. usdb ID 28636 "Die Schlümpfe - Kleine grüne Geister" does not show up in-game for me, and also does not leave an error in the error log. Replacing the 'ü's by 'u's in just the folder name now produces the following error in the log:
ERROR: Failed to open D:\karaoke\songs-copy\Die Schlumpfe - Kleine grune Geister\Die Schlumpfe - Kleine grune Geister.txt
ERROR: AnalyseFile failed for "D:\karaoke\songs-copy\Die Schlumpfe - Kleine grune Geister\Die Schlumpfe - Kleine grune Geister.txt".
Note how the paths show even the txt file's names with 'u's in stead of 'ü's - although I haven't renamed these!
In other cases it just displays question marks, i.e. after renaming "Żywiołak - Studzianki" to "Zywiolak - Studzianki" I get:
ERROR: Failed to open D:\karaoke\songs-checked\Zywiolak - Studzianki\?ywio?ak - Studzianki.txt
ERROR: AnalyseFile failed for "D:\karaoke\songs-checked\Zywiolak - Studzianki\?ywio?ak - Studzianki.txt".
I also tried to let the directory traversal step print all the files it finds, here is part of the results:
"The Fox" and "Teenage Dirtbag" show what it looks like when it works, i.e. displaying the folder itself, but also all its contents. For the others, it just displays the folder, i.e. it doesn't find any files inside (before renaming the folders). Also notice how the Wolfgang Petry song is apprently somehow transliterated - the actual folder name is "Wir sind die Größten", not "Wir sind die Grosten".
If I rename both the folders and the files within to remove any non-ascii characters, they show up in-game and can be played without any problems.
This doesn't seem to affect everyone on Windows, other people have reported being completely fine. I can also play the exact same files (shared folder on a dual boot) without trouble on my Linux install.
Expected behaviour
The songs can be played without any folder/file renaming, and even if they can't, an error can be found in the log.
Steps to reproduce
Install USDX, then put any song with non-ascii characters in its name into the song folder. Start the game, see that the song doesn't show up.
Details
Provide some additional information:
- USDX version: 2024.3
- Operating System + version: Windows 11
I can only add that the problem doesn't occur on Mac either, so it's really Windows only.
I can confirm this. I've tested with Let 3 - Mama ŠČ (ESC 2023 Croatia)
on 2024.5, Windows 10.
I cannot reproduce it the way @DeinAlptraum describes it.
I can reproduce it with what @RumovZ commented. Specifically (at least on my system) it's the Č
character that's causing the issue. I can also confirm that it does not cause an issue on Linux with exactly that same character.
The weird Š
, eszet, o-umlaut, various dashes, none of those cause any issues for me, it's that C that's triggering it for me.
You can just take any working txt (+accompanying files), add a Č
somewhere in the txt filename, and tada it gives an error on AnalyseFile (I assume because it can't find the file).
NB: it's possible this character doesn't trigger it on German systems, but should hopefully fail more consistent on English systems.
What I find really weird is that that S and C are in the same Unicode block, but it's totally fine with the S but not the C.
@s09bQ5 If you are on Windows, could you maybe look into this issue?
No Windows here.
I have exactly the same issue, with song "O-Zone - Dragostea din teï" and the last letter in name The problem is with file and folder name, no matter which one, if ï letter appears in any of these names, song is not displayed. Language Windows is Polish
I just git bisect
ed this down to ecca52e9b222e02542a79dea1e24ccb7c1ec6af1, this seems to be the first broken commit.
Edit: the bug disappears when re-adding the LazUTF8
unit in UPath.pas
Edit: the bug disappears when re-adding the LazUTF8 unit in UPath.pas
This is a good find! However, LazUTF8 is not available (at least not last I checked) when the msys/make method is used on Windows. And aren't special characters working properly in a ton of other places like lyrics or from the ini translations?
aren't special characters working properly in a ton of other places like lyrics or from the ini translations?
Didn't test this explicitly, but judging from how I haven't heard of any problems there before, I would say yes. I guess it's just the SysUtils filesystem functions that can't deal with this properly...
I tried looking up what exactly LazUTF8
does, but there doesn't seem to exist any proper documentation for the module at all, and the fact that it seems to do something already when just included makes this sort of intractable
Ideally we can find a solution that doesn't require Laz. I gave up on the Lazarus editor about two seconds after opening it, and I couldn't get any of the Laz stuff to work through the MSYS method anyway. It's entirely possible that in my commit I'm just calling the wrong conversion.
I am however okay with just copy-pasting the required parts of Laz somewhere in USDX. Unfortunately, I have no idea which parts that are.
Unfortunately, I have no idea which parts that are.
Build once with and once without that unit and then compare the list of undefined symbols in the object files?
Not knowing enough about LazUTF8 and filename encoding on Windows, I tried to ask ChatGPT of what's going on and how it could be resolved (using code snippets from the changes in UFilesystem.pas
):
The issue you're encountering is related to handling non-ASCII file names and directories across platforms, particularly with the change from FileExistsUTF8()
(from LazUTF8Classes
) to SysUtils.FileExists()
.
In the original version, FileExistsUTF8()
was being used, which is UTF-8 aware and correctly handles non-ASCII file names. When you switched to SysUtils.FileExists()
, it only works with the system's default encoding (which may not be UTF-8, especially on Windows), leading to issues with non-ASCII characters.
To fix this issue while keeping it cross-platform and avoiding Lazarus-specific dependencies, you need to ensure that the file paths are converted and handled using Unicode-aware functions. Here’s how you can resolve this:
-
Use Unicode-aware file operations: On Windows, you should use the
WideString
versions of the file operations (FileExistsW
,DeleteFileW
,GetFileAttributesW
, etc.) to properly handle non-ASCII paths. These functions are available in theWindows
unit. -
Ensure proper conversion to
WideString
: TheToWide()
method that you're using should return aWideString
that correctly handles UTF-16 encoded strings, which are required for Windows API functions. -
Handle cross-platform support: For non-Windows platforms, continue using the regular
SysUtils.FileExists()
, which should work with UTF-8 paths on platforms like Linux and macOS.
Here’s an updated version of your code that should address the issue:
uses
Classes,
SysUtils,
{$IFDEF MSWINDOWS}
Windows,
{$ENDIF}
UPath;
function TFileSystemImpl.FileExists(const Name: IPath): boolean;
begin
{$IFDEF MSWINDOWS}
// Use WideString version on Windows to handle non-ASCII paths
Result := Windows.FileExistsW(PWideChar(Name.ToWide()));
{$ELSE}
// On other platforms, use SysUtils.FileExists which handles UTF-8
Result := SysUtils.FileExists(Name.ToUTF8());
{$ENDIF}
end;
function TFileSystemImpl.FileGetAttr(const FileName: IPath): Cardinal;
begin
{$IFDEF MSWINDOWS}
// Use WideString version on Windows
Result := Windows.GetFileAttributesW(PWideChar(FileName.ToWide()));
{$ELSE}
// On other platforms, use SysUtils version
Result := FileGetAttrUTF8(FileName.ToUTF8());
{$ENDIF}
end;
Key Points:
-
Windows-Specific Handling: On Windows, the code now uses
FileExistsW()
andGetFileAttributesW()
to handleWideString
paths, ensuring that non-ASCII paths are correctly processed. -
Cross-Platform Compatibility: On non-Windows platforms, it falls back to using
SysUtils.FileExists()
andFileGetAttrUTF8()
, which work with UTF-8. This should resolve your issues with non-ASCII directories and files while keeping the code cross-platform and free from Lazarus-specific dependencies.
https://github.com/UltraStar-Deluxe/USDX/issues/709#issuecomment-1573285727 This worked for me
After an extensive "print a lot of things" session I eventually found that in Line 45 of UFilesystem.pas
https://github.com/UltraStar-Deluxe/USDX/blob/master/src/base/UFilesystem.pas#L45
if you replace
TSytemSearchRec = TSearchRec;
with
TSytemSearchRec = TUnicodeSearchRec;
then on Windows a lot of things suddenly start working. still doesn't print the special characters to the console (didn't check the logfile) but it's at least loading files. Thank you @DeinAlptraum for confirming this.
If nobody beats me to it, I'll figure out tomorrow if Linux also eats this (and if not we just type-guard it) and open a PR. Long-term we should probably start getting rid of some of the RawByteString
stuff, but unless some of these functions are now unused the priority is on getting a release with this fix out.