stella icon indicating copy to clipboard operation
stella copied to clipboard

Stella doesn't load ROM with special characters in name

Open thrust26 opened this issue 3 years ago • 21 comments

Stella seems to have problems with certain special chars (non-UTF8) in ROM names (File not found/readable).

E.g. the 'ż' in the attached file is not liked. Gruniożerca.zip

thrust26 avatar Jan 04 '21 11:01 thrust26

Works for me in Linux at least. The filename is not fully displayed in the launcher, since the character set doesn't contain that character. But Stella is still able to load it. So if this a Windows-only thing, I suspect it's somewhere in src/windows/FSNodeWindows.

sa666666 avatar Jan 04 '21 14:01 sa666666

Thanks, I was already wondering why James hadn't noticed that.

thrust26 avatar Jan 04 '21 14:01 thrust26

I will try to test on some other systems later today.

sa666666 avatar Jan 04 '21 15:01 sa666666

In the project settings "Multi-Byte Character Set" (MBCS) is enabled, not Unicode (which doesn't compile).

Could that be the problem?

thrust26 avatar Jan 04 '21 15:01 thrust26

It's possible, but I've never really played with it. What happens when you force the Unicode #ifdef to be compiled? Is this the part that won't compile?

sa666666 avatar Jan 04 '21 15:01 sa666666

No, lots of stuff in FilesystemNodeWINDOWS is breaking, especially string.c_str() isn't accepted anymore (LPCWSTR is expected). I will try to fix that...

SerialPortWINDOWS has the same problems.

thrust26 avatar Jan 04 '21 15:01 thrust26

Yeah, all that code is very ASCII-centric. I don't know that it's ever been tested with other character sets.

sa666666 avatar Jan 04 '21 15:01 sa666666

Incidentally, this is why I'd hoped to move to the std::filesystem stuff in C++17, to eliminate a big chunk of this platform-specific code. But it seems not all compilers fully support it yet.

sa666666 avatar Jan 04 '21 15:01 sa666666

Grrr, got the code compiled with Unicode, but it still doesn't load the ROM. The 'ż' is converted to a simple 'z'.

thrust26 avatar Jan 04 '21 16:01 thrust26

I think we need to drill down to see exactly what is failing here. Is it on our side, or is it happening somewhere in one of the functions that Windows is using?

Sorry I can't help more with this; currently recording lectures for next semester. Maybe we have to bump this one to post 6.5.

sa666666 avatar Jan 04 '21 17:01 sa666666

No hurry, this can wait. I think Windows does it correct, but Stella partially converts special chars. Either we keep the special chars completely or we ignore them completely. Currently it seems like a mix of it.

thrust26 avatar Jan 04 '21 17:01 thrust26

I wonder what Linux is doing. Since we are using strings, it cannot store the special chars. Which means it must match 'ż' with 'z' somehow.

thrust26 avatar Jan 04 '21 18:01 thrust26

Meanwhile I found a workaround. In addFile() we can use the 8.3 filename for _path and the long filename for _displayName.

@sa666666 What do you think?

thrust26 avatar Jan 04 '21 19:01 thrust26

@sa666666 This fix would allow loading files with specials chars.

BUT when saving a file, the _path name is used if no entry in our DB exists for the ROM. This causes ugly names like 'Grunio~01.pro'. We would

  • either have to override getNameWithExt for Windows and use the _displayName here. But that feels a bit like a hack to me. And maybe there are more side effects.
  • or make sure the code uses getName() (like e.g. the PNGLibrary already does).

Actually I don't understand why the code sometimes uses the _path and sometimes the _displayName variables. Sometimes _displayName is even overwritten by the name derived from _path.

thrust26 avatar Jan 05 '21 09:01 thrust26

I have some ideas, but no time right now. I will come back to this for the next release.

sa666666 avatar Jan 06 '21 02:01 sa666666

I plan to work on this one next. Ideally, there's a way to turn all special characters into their ASCII equivalent. That's the best we can do, since Stella is essentially ASCII-only; the fonts, font renderer, etc all depend on this.

sa666666 avatar May 29 '22 18:05 sa666666

Might become complicated. How about replacing them all e.g. with '_'?

But I suppose both ideas do not help when it comes to file access.

thrust26 avatar May 29 '22 18:05 thrust26

I'm not looking to hand-code a parser for this. I'm hoping to find something in C++11 or above locale stuff to handle it. Basically, any umlauts, graves, etc would be removed and converted to the 'base' ASCII character. I don't know if this is even possible. Barring that, converting to '_' might work too.

sa666666 avatar May 29 '22 18:05 sa666666

Remember that it is not so much of a display problem, but a file access problem.

thrust26 avatar May 29 '22 18:05 thrust26

Right. For me in Linux, it is strictly a display issue. The file load part is fine. So this could get complicated.

sa666666 avatar May 29 '22 18:05 sa666666

Using 8.3 filenames might help, but a user can disable it. And probably Microsoft will eventually do so too.

thrust26 avatar May 29 '22 18:05 thrust26