libchdr icon indicating copy to clipboard operation
libchdr copied to clipboard

`chd_open` and `chd_read_header` don't work with unicode paths on Windows

Open TellowKrinkle opened this issue 3 years ago • 18 comments

Paths that aren't representable with the user's current codepage are unable to be accessed. There needs to be a way to open a chd that eventually calls _wfopen instead of fopen.

Possible solutions:

  • New versions of chd_open and chd_read_header are added that take a FILE*, allowing users to open their own files with the appropriate call. chd_open_file is almost this for chd_open, but it's impossible to set (*chd)->owns_file from outside of the library, complicating closing of files.
  • New versions of chd_open and chd_read_header are added that take a wchar_t* and call _wfopen on windows
  • chd_open and chd_read_header always treat input as utf-8, and internally convert to utf-16 before calling _wfopen on Windows

TellowKrinkle avatar May 14 '21 22:05 TellowKrinkle

Or you could go full custom stream reading by defining function prototypes for read/seek and let the user provide implementations for these while carrying a user-data pointer around which is passed through to the previous functions. This way the user can open the file any way they wish (even from memory) and only provide the read/seek methods.

But yeah some solution to this would be nice as long as I can use this from non C/C++ as well. The last option would be transparent and only requires UTF8 strings instead of ANSI with full compatibility, I like that.

whocares0101 avatar Dec 05 '21 07:12 whocares0101

Option 3. is risky, since it could theoretically break backwards compatibility with existing codebases if one uses a local ANSI codepage consciously with libchdr. Basing on what I've seen around, the best approach would be to either have explicit _utf8 functions that do what is suggested, or an overload passing FILE*. One example of such approach is https://github.com/leethomason/tinyxml2.

CookiePLMonster avatar Dec 11 '21 12:12 CookiePLMonster

When will this be fixed?

Zero3K avatar Mar 31 '22 05:03 Zero3K

Or you could go full custom stream reading by defining function prototypes for read/seek and let the user provide implementations for these while carrying a user-data pointer around which is passed through to the previous functions. This way the user can open the file any way they wish (even from memory) and only provide the read/seek methods.

I coincidentally happen to be working on this in PR #78 so this should be possible if my PR is accepted.

My implementation requires emulators to provide definitions for four functions: read,seek,close, and size (size returns the size of the file in bytes, tell is currently only used for this purpose so it seemed better to have a size function than a tell function). There's a new function called chd_open_core_file which takes in pointers to these four functions in addition to a user data pointer like you described.

snickerbockers avatar May 25 '22 04:05 snickerbockers

Question - some code I'm looking at uses chd_read_header to scan for parent CHDs in the same directory.

Is the correct way to do that with core_files to simply use chd_open_core_file / chd_get_header? will that be as fast?

hrydgard avatar Aug 09 '23 18:08 hrydgard

@hrydgard parent/child has not been tested much on my side, but chd_open_core_file + chd_get_header will work.

However, no, it will not be as fast as chd_open_core_file - if valid parent is passed - will allocate and uncompress the whole CHD hunk map, and prepare codecs for decompression.

Do you need some chd_read_header function fore core_files ? It can be added if you need it.

rtissera avatar Aug 09 '23 20:08 rtissera

Don't have an immediate need, I'm doing a quick investigation to see if anything is missing in order to add CHD support to PPSSPP in the future, so figured I'd report anything missing in advance of that.

Due to restrictions on Android (scoped storage), we can't have libchdr use fopen, but we can get a FILE * handle to pass in, so really, chd_open_core_file is not quite necessary (but, will also work) - the older chd_open_file would work.

So, because of the above, chd_read_header_file that takes a FILE * rather than a filename would be fine, and a chd_read_header_core_file would also be okay.

That said, I have no idea if people are even interested in using the parent/child mechanism for PSP games or if I can just omit it entirely. There was a prototype implementation of CHD support submitted as a PR (before scoped storage), and it simply scanned all the files in the directory, reading their headers to see if the parent hash matched - which seems quite inefficient anyway if you have a lot of files in the same directory. I'm not sure if there's a canonical way to find the filename to a parent CHD.

hrydgard avatar Aug 09 '23 20:08 hrydgard

That said, I have no idea if people are even interested in using the parent/child mechanism for PSP games or if I can just omit it entirely. There was a prototype implementation of CHD support submitted as a PR (before scoped storage), and it simply scanned all the files in the directory, reading their headers to see if the parent hash matched - which seems quite inefficient anyway if you have a lot of files in the same directory. I'm not sure if there's a canonical way to find the filename to a parent CHD.

Thus far, no emulator has implemented support for delta chds. PSP is definitely a good candidate for them, since the most useful usecase is romhacks and translations, which become more of a hassle to manage the larger the input files become. I would wager people would be interested in using them, but you'd be justified in skipping support if it's too complex. The lack of established parent search method is much of why it has gone unimplemented so far, though given that it's a simple header scan and only necessary when a delta chd is loaded, I wouldn't expect a full directory search to be problematic for any but the most extreme collectors.

Sanaki avatar Aug 09 '23 21:08 Sanaki

If I do this, I think I'll initially skip the parent/child mechanism, and then potentially use a text-file based method or something where you list the parent ISO.

As for scanning the whole directory, well, then you don't know how crazy slow it is to open files from user-accessible storage since Android introduced scoped storage. 10 file opens can take a second on a slower device.

hrydgard avatar Aug 09 '23 21:08 hrydgard

That's... painful. I didn't realize they botched the implementation that badly, though I have heard complaints here and there.

Sanaki avatar Aug 09 '23 21:08 Sanaki

Anyway, it seems that implementing parent/child will actually not be that hard, and is done by pcsx2 already. So might be good to just add in a chd_read_header_core_file , for maximum flexibility and symmetry.

hrydgard avatar Aug 10 '23 08:08 hrydgard

@hrydgard I will have a look. I know CHD support in PPSSPP has been discussed a lot, so it's great to see it might be added. A few CHD-related stuff you should be aware for PSP :

  • Most people compress (wrongly) their games as CD, because of various tutorials on the Web. So you need to take that into account when implementing CHD for PPSSPP, even if it sounds bogus.
  • For PSP UMDs, the right move would be either to compress as RAW (as UMDs are mostly MODE1/2048 ISO if I am not mistaken), or to add in libchdr DVD support that MAME added back for 0.255 release (so it's very recent). The DVD is just a metadata tag, and enforcing single-sector-as-a-block compression (hunk_size = 2048) without CD sub-stream.

TL;DR -> if you write some docs regarding PPSSPP upcoming CHD support, make sure people use createraw or createdvd in order to have the best possible from the current format, but you will still need to handle in your code dodgy cd-like encodings.

I will try to add DVD support and the requested function you mentioned quickly.

rtissera avatar Aug 10 '23 08:08 rtissera

I'm not sure what it means to compress as CD vs DVD or RAW. CD Audio is not a factor on the PSP anyway, it's just plain ISO images.

What would I have to do in my code to handle CD encodings?

hrydgard avatar Aug 10 '23 08:08 hrydgard

RAW is intended for hard disk basically, DVD is plain ISO so is the closest to UMD. CD has a hunk_size multiple of 2352 to handle various CD-related stuff like CDDA, MODE2, subchannel. So your code when seeking, doing sector calculations and so on needs to accomodate CHD hunk_sizes of multiple of 2048 and 2352, basically. PCSX2 already handles all that very well, just borrow the logic from there.

rtissera avatar Aug 10 '23 09:08 rtissera

Oh, I see. DVD seems fairly appropriate then, and with a matching sector size should be the best. Do RAW images have a sector size?

hrydgard avatar Aug 10 '23 09:08 hrydgard

Yes, RAW image is the most flexible most where you can specify your own sector size and hunk size (hunk size being a multiple of sector size of course).

rtissera avatar Aug 10 '23 09:08 rtissera

And I assume the "hunk" size is the size of the pieces that will be compressed individually?

hrydgard avatar Aug 10 '23 09:08 hrydgard

Right assumption ;)

rtissera avatar Aug 10 '23 09:08 rtissera