[Request] Support for CHD format

Open Danixu opened this issue 6 years ago • 112 comments

Hello,

I've been using this emulator for about a month, and I've discovered the CHD format. It uses LZMA compression, so it saves a lot of space (much more than the CSO format). Right now I store my games as 7z archives and extract the ISO when I want to play, but CHD offers both benefits (LZMA-compressed and directly playable).

Please add support for CHD to this core.

Thanks!!

Danixu avatar Dec 16 '17 15:12 Danixu

Adding it to the standalone version would also be nice, since not everyone uses RetroArch.

RinMaru avatar Dec 16 '17 16:12 RinMaru

We already support CSO format. While it might compress slightly worse, it's still pretty good.

hrydgard avatar Dec 16 '17 20:12 hrydgard

All such formats will compress files in blocks, which won't compress as well as a 7z of the entire file. Have you actually compared a CHD of an ISO to a CSO?

Note that you can use maxcso to create CSOs with a larger block size, and using 7-zip's deflate algorithm, which compresses better than most CSO programs. Files compressed this way already work fine in PPSSPP.
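To illustrate the point about block-based formats, here is a minimal sketch that compares compressing a whole buffer against compressing it in independent blocks. It uses Python's `zlib` as a stand-in for the deflate used by CSO tools, and the synthetic test data and function names are assumptions made for the example, not anything taken from maxcso or PPSSPP.

```python
import random
import zlib


def compress_whole(data: bytes) -> int:
    """Size in bytes when the whole buffer is compressed in one go."""
    return len(zlib.compress(data, 9))


def compress_blocks(data: bytes, block_size: int) -> int:
    """Total size when every block is compressed independently."""
    total = 0
    for offset in range(0, len(data), block_size):
        total += len(zlib.compress(data[offset:offset + block_size], 9))
    return total


# Synthetic data: one 4 KiB random chunk repeated 64 times (256 KiB).
# The redundancy only shows up across 4 KiB boundaries.
rng = random.Random(0)
chunk = bytes(rng.getrandbits(8) for _ in range(4096))
data = chunk * 64

whole = compress_whole(data)
large_blocks = compress_blocks(data, 32768)
small_blocks = compress_blocks(data, 2048)
print("whole:", whole, "32 KiB blocks:", large_blocks, "2 KiB blocks:", small_blocks)
```

With this data the 2 KiB blocks never see a repeat, the 32 KiB blocks see some, and the whole-buffer pass sees all of them, which is the same reason a larger block size helps. Real disc images will show much smaller differences.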

-[Unknown]

unknownbrackets avatar Dec 17 '17 02:12 unknownbrackets

One 'advantage' of CHD is that chdman and MAME are supposed to support a form of softpatching (parent-clone relationships, where clones store different files/sectors).

Pity it doesn't work anywhere else (including RetroArch, except, of course, the MAME core)... and they nixed the proposal I made on their chdman issue tracker to add 'reversible hardpatching' to the format (basically, do the boring work of integrating a 'patch update' tool that takes a binary patcher, patches the underlying bytes, and creates a reversal patch to store as a header for the next 'patch update', then recompresses the result).

The big advantage of such schemes is that they don't depend on lazy/dead emulators implementing softpatching to work (reinventing the wheel every time), while still being easy to update. Emulators still need to implement CHD reading, but that's much easier. The devs thought that since the format already specifies support for parent/clone relationships, this wasn't needed. I'm more pessimistic: complicated features are the first to be handwaved away.

Anyway, CHD is a bad fit for RetroArch for now because the metadata database and scanner are not ready to accept CHD checksums, internal or external (even the internal checksums differ from those of the raw cue/bin because they're a 'sum' over all the involved files, not only the 'first track' or whatever hack RA is using. They previously used cues and gdi, and of course this failed a lot since many gdi files are indistinguishable due to a dumping group's bad idea, so I'd actually prefer the CHD approach). This may change though, since they seem proud of the CHD support.

i30817 avatar Dec 20 '17 05:12 i30817

@unknownbrackets I know this question is off-topic here, but do you plan to make a GUI for maxcso? What I want to do is compress ISO files while keeping their integrity. I mean:

1st - Take the original ISO file and check its CRC32 against a database (for example, I use renascene).
2nd - Compress the ISO file with maxcso (I'm looking at ZSO because, from what I've read, it's the best format in terms of compression and load times).
3rd - Decompress the ZSO file; the resulting ISO file has the same CRC32 as before.
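The integrity check in the steps above can be sketched as follows. This only covers the CRC32 comparison; compressing and decompressing with maxcso is left to the tool itself, and the function names here are made up for the example.

```python
import zlib


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Compute the CRC32 of a file without loading it into memory at once."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def same_image(original_iso: str, roundtrip_iso: str) -> bool:
    """True when the decompressed image matches the original bit for bit (by CRC32)."""
    return crc32_of_file(original_iso) == crc32_of_file(roundtrip_iso)
```

Formatting the result with `format(crc, "08X")` gives the eight-digit hex form that checksum databases usually list.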

CSOPlus gives the best compression, but only because it deletes the UPDATE partition, so the ISO file will never match the original once you decompress it.

ppmeis avatar Dec 20 '17 12:12 ppmeis

Is ZSO even meant for the best compression? From my tests with the available formats, from best compression to worst: PBP -> CSO2 with bigger block size -> CSO -> ZSO

I used The Leecherman's "ISO~PBP converter" for PBP and maxcso for the other formats. Maybe I just don't understand which settings I should use for ZSO, but in my testing it was the fastest while producing the biggest file.

Anyway, if a few MB are a big deal, PBP seems like the best format to me. It's also supported on the PSP and doesn't modify the file in any way. It's commonly overlooked because of some myths caused by Sony using it for non-PSP games as well, but PPSSPP supports it just fine.

CHD did have slightly better compression even than PBP, but I don't think it's worth storing PSP games in that format. It's not native to the platform and doesn't offer anything that would make it worth using, while coming with all the MAME stuff that we don't care about.

I guess I'll change the title: PPSSPP is not related to RA, so it looks like this was posted in the wrong place.

LunaMoo avatar Dec 20 '17 13:12 LunaMoo

The performance difference for ZSO only really matters on the 222 MHz CPU of the PSP. A CFW implemented it, so I tried experimenting with it, but it really didn't seem all that worthwhile for desktop.

If your device's CPU runs faster than 1000 MHz, it's probably going to make almost no difference in speed (for decompression). And CSO compresses better.

However, LZMA typically has a more significant CPU impact, mostly because it usually uses larger lookbehind buffers and a lot more memory and memory searching. This means using CHD might have a non-negligible performance impact compared to using CSO/ZSO, even on modern devices. Would need to be tested, though.

Not really planning to add a GUI to maxcso, and also not planning to make it do destructive changes.

Note that maxcso is kinda like pngcrush - it tries to compress the same data multiple times to achieve the very best compression ratio. It will usually compress slower than other tools (although can use tons of cores, so not always), but get a better ratio.
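The "try several times, keep the smallest" idea can be sketched like this. It is not maxcso's actual code: the candidate list below only varies zlib's own levels and strategies as a stand-in, and `crush_block` is a hypothetical name.

```python
import zlib
from typing import Tuple


def deflate_raw(block: bytes, level: int, strategy: int) -> bytes:
    """Raw deflate (no zlib header) with the given level and strategy."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15, 9, strategy)
    return compressor.compress(block) + compressor.flush()


def crush_block(block: bytes) -> Tuple[bool, bytes]:
    """Try several settings and keep the smallest result.

    Returns (is_compressed, payload); the block is stored as-is when no
    attempt manages to shrink it.
    """
    attempts = [
        deflate_raw(block, level, strategy)
        for level in (6, 9)
        for strategy in (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED)
    ]
    best = min(attempts, key=len)
    if len(best) < len(block):
        return True, best
    return False, block


def restore_block(is_compressed: bool, payload: bytes) -> bytes:
    """Inverse of crush_block."""
    return zlib.decompress(payload, -15) if is_compressed else payload
```

Because every block is attempted several times, compression time grows with the number of candidates, while decompression cost stays the same.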

-[Unknown]

unknownbrackets avatar Dec 20 '17 15:12 unknownbrackets

Is there any update on this?

RinMaru avatar Jan 18 '20 01:01 RinMaru

No.

hrydgard avatar Jan 18 '20 10:01 hrydgard

Sorry to ask, but it seems this hasn't been looked at for a while now. Do you think it's possible at all?

lamvuong2019 avatar Apr 28 '20 07:04 lamvuong2019

How much of a win is CHD over CSO anyway? CSO already successfully squishes games pretty well, as previously mentioned, and we do support that.

hrydgard avatar Apr 28 '20 07:04 hrydgard

It depends on the data, but if I recall correctly it's maybe around 10 MB per GB. The PBP container, which we also support, falls somewhere between max-compression CSO and CHD, making this pretty irrelevant, though there's only Windows software for PBP, so CSO is more widely used.

LunaMoo avatar Apr 28 '20 07:04 LunaMoo

I will run a test now on Final Fantasy Type-0 and will come back to you with the results for CHD, GZ, and CSO level 9.

lamvuong2019 avatar Apr 28 '20 07:04 lamvuong2019

I did a quick check on two games:

Dissidia 012 - Duodecim Final Fantasy (USA).iso  1674M
Dissidia 012 - Duodecim Final Fantasy (USA).cso  1291M
Dissidia 012 - Duodecim Final Fantasy (USA).chd  1154M
Patapon (USA).iso  326M
Patapon (USA).cso  211M
Patapon (USA).chd  161M

In theory, chd should also have faster I/O, but I don't have any easy way available to benchmark that.
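For anyone who wants the figures above as percentages, here is a small sketch of the arithmetic. The sizes are copied from the listing; the dictionary layout and function name are just for the example.

```python
# Sizes in MB, copied from the listing above.
sizes_mb = {
    "Dissidia 012": {"iso": 1674, "cso": 1291, "chd": 1154},
    "Patapon": {"iso": 326, "cso": 211, "chd": 161},
}


def saving_pct(original: float, compressed: float) -> float:
    """Space saved relative to the original size, in percent."""
    return 100.0 * (original - compressed) / original


for game, sizes in sizes_mb.items():
    cso = saving_pct(sizes["iso"], sizes["cso"])
    chd = saving_pct(sizes["iso"], sizes["chd"])
    print(f"{game}: CSO saves {cso:.1f}%, CHD saves {chd:.1f}%")
```

For these two games that works out to roughly 22.9% vs 31.1% for Dissidia 012 and 35.3% vs 50.6% for Patapon.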

Sanaki avatar Apr 28 '20 08:04 Sanaki

The biggest win with CHD support is being able to use the same tool everywhere, and native fast scan support in dat tools like clrmamepro. The disk space savings are nice too.

Tuxie avatar Apr 28 '20 08:04 Tuxie

Isn't PBP used for PSX ISOs only?

I don't know if this is a good example or not

Final Fantasy Type-0 (English Patched v2)
Original  3084866
CSO       Failed
CHD       2410963 - 22%
GZ        2434175 - 21%
https://ibb.co/TLc98LZ

Dante's Inferno (USA) (v1.01)
Original  1770240
CSO       1468477 - 17%
CHD       1379283 - 22%
GZ        1424640 - 20%
PBP       5637395 - increase 118%
https://ibb.co/w4y1qM9

Summary Saving https://ibb.co/8x71gGW

lamvuong2019 avatar Apr 28 '20 08:04 lamvuong2019

CHD supports parent-child relationships, which are both an interesting way to softpatch and an interesting way to save space in multi-CD games (by making CD 1 the 'parent' of CD 2).

I definitely don't recommend doing the second, of course, since it's kind of a 'semantic' mess, and I'm not sure you can do both at the same time. Or at least you can't do it twice (parent-parent2-finalchild). It works like this: the child stores the SHA-1 hash of the 'parent', and when loading it you must provide both to get the 'complete overlay'.

Notice the complete lack of filenames or paths here. It's up to the application to use a convention to find the 'sets'. Dats to find the filename, path conventions, or a scanning step that collects the SHA-1 of all CHDs in a 'game directory' are all options (I prefer the last one because it doesn't depend on users).
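The scanning option could look roughly like the sketch below. It assumes some earlier step has already read each file's own SHA-1 and parent SHA-1 out of the CHD header; that step is not shown, and `ChdInfo`, `build_index`, and `resolve_chain` are hypothetical names, not part of chdman or any CHD library.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class ChdInfo(NamedTuple):
    path: str
    sha1: str                   # hash identifying this CHD
    parent_sha1: Optional[str]  # None when the CHD has no parent


def build_index(infos: Iterable[ChdInfo]) -> Dict[str, ChdInfo]:
    """Map each SHA-1 found during the directory scan to its file."""
    return {info.sha1: info for info in infos}


def resolve_chain(child: ChdInfo, index: Dict[str, ChdInfo]) -> List[str]:
    """Return the paths needed to open `child`, child first, root parent last."""
    chain = [child.path]
    seen = {child.sha1}
    current = child
    while current.parent_sha1 is not None:
        parent = index.get(current.parent_sha1)
        if parent is None:
            raise FileNotFoundError(f"no CHD with sha1 {current.parent_sha1}")
        if parent.sha1 in seen:
            raise ValueError("parent references form a cycle")
        seen.add(parent.sha1)
        chain.append(parent.path)
        current = parent
    return chain
```

The loop follows more than one level of parents only for generality; whether the format or the tools actually allow that is an open question in this thread.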

This is currently not supported in libchdr (the non-MAME library that handles CHD for other projects).

i30817 avatar Apr 28 '20 08:04 i30817

The method 2 you describe is a form of deduplication. I think it would be a total nightmare, with a massive performance impact if you don't have the necessary cache or CPU power to crunch the data.

I think the compatibility and easy maintenance of the chd is definitely a worthy method of backup.

lamvuong2019 avatar Apr 28 '20 09:04 lamvuong2019

As the person who opened that issue, I agree clone support is wonderful, but it's also not terribly important for PSP. It would be quite nice for minor patching, sure, but it's really not as useful as for systems like the PS1 with tons of multidisc games.

Also, PBP is only used optimally for PSP, either for uncompressed PSN titles or for PSX-PSP (which is only of value on real hardware). Most of us ripped our PSN versions to ISO for ease of use though.

EDIT: Clones don't have any performance impact. CHD splits the image into tens of thousands of "compressed hunks of data" (hence the name), each of which is compressed individually based on which compression method is most efficient. Duplicated hunks are already referencing the same data, clones just reference almost all of the data from the parent, barring the few hunks that have changed.

As an example, this is the hunk type breakdown from a clone I made of a translation for a PCE-CD game:

     Hunks  Percent  Name
----------  -------  ------------------------------------
       424     1.5%  Copy from self                          
    18,319    66.4%  Copy from parent                        
     1,917     6.9%  CD LZMA                                 
       184     0.7%  CD Deflate                              
     6,743    24.4%  CD FLAC
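A toy model of the per-hunk behaviour described above is sketched here: each hunk is compressed with whichever codec gives the smallest result, and a hunk identical to an earlier one is stored as a reference ("copy from self"). This is not the CHD on-disk format; the entry layout, the use of Python's `zlib` and `lzma` modules, and the function names are all assumptions for illustration.

```python
import hashlib
import lzma
import zlib
from typing import Dict, List, Tuple

Entry = Tuple[str, object]  # (kind, payload) where kind is a codec name or "copy"


def pack_hunks(data: bytes, hunk_size: int) -> List[Entry]:
    """Split data into hunks, dedupe repeats, pick the best codec per hunk."""
    first_seen: Dict[bytes, int] = {}
    entries: List[Entry] = []
    for index, offset in enumerate(range(0, len(data), hunk_size)):
        hunk = data[offset:offset + hunk_size]
        digest = hashlib.sha1(hunk).digest()
        if digest in first_seen:
            entries.append(("copy", first_seen[digest]))  # reference, no data stored
            continue
        first_seen[digest] = index
        candidates = {
            "none": hunk,
            "deflate": zlib.compress(hunk, 9),
            "lzma": lzma.compress(hunk),
        }
        codec = min(candidates, key=lambda name: len(candidates[name]))
        entries.append((codec, candidates[codec]))
    return entries


def unpack_hunks(entries: List[Entry]) -> bytes:
    """Rebuild the original data from the entries."""
    hunks: List[bytes] = []
    for kind, payload in entries:
        if kind == "copy":
            hunks.append(hunks[payload])
        elif kind == "deflate":
            hunks.append(zlib.decompress(payload))
        elif kind == "lzma":
            hunks.append(lzma.decompress(payload))
        else:
            hunks.append(payload)
    return b"".join(hunks)
```

A clone would extend the same idea with a second kind of reference that points into the parent's hunks instead of the file's own.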

To be clear, clones are a fantastic feature and were it supported I'd definitely use it, but I feel like worrying about them before they're supported by libchdr would be a bit premature. For now, let's just see about getting the basic format handled.

Sanaki avatar Apr 28 '20 09:04 Sanaki

> The method 2 you describe is a form of deduplication, I think it will be a total nightmare and a massive performance impact if you don't have the necessary cache or cpu to data crunch.

I would expect CHD to be a better format than most for this, and it doesn't use silly half-baked approaches like putting the original file in memory to 'patch it', which is the Achilles' heel of every 'softpatching format extended to CDs' and makes all of them massive failures with CDs. I'm looking at you, BPS.

Libchdr may still screw up of course, but the format was designed for this kind of 'deduplication' use case.

And it's not like native filesystem dedup works for ISOs or cue/bin anyway. FS dedup is file-oriented: it does absolutely nothing for two 99% similar files, and compression dedup often screws up and turns that 99% similar into 20% similar if there are some insignificant but regular offset discrepancies between the two files. In fact, a file compressor is often so busy compressing within a file that it loses the opportunity to recognize that two large files are very similar, since the first file has already fallen out of the compression window.

CHD, on the other hand, knows those two files are related and uses filesystem block matching. Of course, even CHD may not help if those two CDs themselves use compression or cryptography inside their filesystem. Modern games are lame like that, so use it case by case.

i30817 avatar Apr 28 '20 09:04 i30817

OK, so a win, but not an enormous one. Some people might find it worthwhile, and I do understand that having a common tool is nice.

So if anyone is interested in implementing this, what you need to do is:

  • go into Core/FileSystems/BlockDevices.cpp/h, add a new block device class for CHD
  • hook it up in constructBlockDevice
  • fill it out with calls into libchdr
  • set up CMake and Visual Studio projects to build and include libchdr
  • (minor tweaks like showing .chd in the file open dialogs, and stuff like that)
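As a rough illustration of what such a block device has to do, here is a sketch in Python rather than in PPSSPP's C++. It maps 2048-byte sector reads onto larger compressed hunks and keeps the last decompressed hunk cached. Everything in it is hypothetical: the class and method names are not PPSSPP's interface, and `zlib` stands in for the decoding that a real implementation would delegate to libchdr.

```python
import zlib
from typing import List, Optional

SECTOR_SIZE = 2048  # ISO 9660 logical sector size used by PSP disc images


class HunkedBlockDevice:
    """Hypothetical read-only device backed by independently compressed hunks."""

    def __init__(self, hunks: List[bytes], hunk_bytes: int, total_bytes: int) -> None:
        if hunk_bytes % SECTOR_SIZE != 0:
            raise ValueError("hunk size must be a multiple of the sector size")
        self._hunks = hunks
        self._hunk_bytes = hunk_bytes
        self._total_bytes = total_bytes
        self._cached_index: Optional[int] = None
        self._cached_data = b""
        self.decompress_count = 0  # exposed only to show the cache working

    def num_blocks(self) -> int:
        return self._total_bytes // SECTOR_SIZE

    def _load_hunk(self, index: int) -> bytes:
        # Decompress only when the requested hunk is not the cached one.
        if index != self._cached_index:
            self._cached_data = zlib.decompress(self._hunks[index])
            self._cached_index = index
            self.decompress_count += 1
        return self._cached_data

    def read_block(self, block_num: int) -> bytes:
        if not 0 <= block_num < self.num_blocks():
            raise IndexError("block out of range")
        offset = block_num * SECTOR_SIZE
        hunk = self._load_hunk(offset // self._hunk_bytes)
        start = offset % self._hunk_bytes
        return hunk[start:start + SECTOR_SIZE]


def make_device(image: bytes, hunk_bytes: int) -> HunkedBlockDevice:
    """Build a device from an in-memory image (for experimenting only)."""
    hunks = [zlib.compress(image[i:i + hunk_bytes]) for i in range(0, len(image), hunk_bytes)]
    return HunkedBlockDevice(hunks, hunk_bytes, len(image))
```

The one-hunk cache matters because sequential sector reads would otherwise decompress the same hunk once per sector.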

I probably won't get to this in the near future myself, busy with finishing up stuff for 1.10 in the time I have.

hrydgard avatar Apr 28 '20 09:04 hrydgard

I don't have the time or expertise to do it myself, but I did toss a $15 bounty on it in case anyone else does. Though bountysource seems to be having some issues right now, so it may take a bit to show up correctly. https://www.bountysource.com/issues/52791233-request-support-for-chd-format

Sanaki avatar Apr 28 '20 09:04 Sanaki

I am also willing to support the bounty, but it's failing for me too. I tried to top it up with $15 as well and got a massive red error saying something went wrong. I will try again later to see if it works.

It showed an error but went through anyway. Strangely, my pledge went through and yet the bounty still shows 0 dollars... Hopefully someone picks this up, as it would be a big bonus for everyone.

Here is my proof for the pledge and no expiration. https://ibb.co/qkKKqD5

lamvuong2019 avatar Apr 28 '20 09:04 lamvuong2019

If it hasn't shown up by tomorrow I'll contact them about it. Given that the transactions are being recorded, it should catch up eventually. PPSSPP CHD bounty

Sanaki avatar Apr 28 '20 10:04 Sanaki

> Isn't PBP used for psx iso only?
>
> I don't know if this is a good example or not
>
> Final Fantasy Type-0 (English Patched v2)
> Original  3084866
> CSO       Failed
> CHD       2410963 - 22%
> GZ        2434175 - 21%
> https://ibb.co/TLc98LZ
>
> Dante's Inferno (USA) (v1.01)
> Original  1770240
> CSO       1468477 - 17%
> CHD       1379283 - 22%
> GZ        1424640 - 20%
> PBP       5637395 - increase 118%
> https://ibb.co/w4y1qM9
>
> Summary Saving https://ibb.co/8x71gGW

No, PBP is Sony's native container. It's used for all PSP games available from the PS Store, and it can also be used to compress ISOs, although there's only one program for that and it's Windows-only: https://sites.google.com/site/theleecherman/IsoPbpConverter

You probably used PSOne software and ended up with a larger file. That makes no sense, as PBP compresses better than CSO with large chunks, and as far as I recall from testing, the size was comparable to CHD.

Adding to your results of FFT0, converted to megabytes for readability:

original(megabytes)                                   3 012
CSO1 lvl 9                                            2 497
cso2 with 4096 chunk                                  2 451
PBP                                                   2 404
cso2 with --block=32768 --format=cso2 --use-zopfli*   2 388
CHD                                                   2 354

So you save 50 MB on a 3000 MB file (PBP, the highest-compression format currently supported in PPSSPP, vs the unsupported CHD), which is roughly 16 MB per 1000 MB. Also, the tool you used for CSO sucks; most modern CSO tools support file sizes above 2 GB. I'd recommend [Unknown's] maxcso, which also allows compression with larger chunks, producing a smaller file than just using level 9 CSO compression.

Edit: included cso2 with [Unknown's] settings. So that's an 11.33 MB difference per 1004 MB between what we already have and CHD.

* Using --only-zopfli instead of --use-zopfli generated a file that was larger by ~150 KB, but it finished much faster as it skipped the other methods, so that might be worth considering.

In other words, CHD compresses about 1.13% better than the currently supported format with the highest compression.

That's not as impressive as the comparison with standard CSO.
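The arithmetic behind those figures, using the sizes from the table above, can be checked with a short sketch (the variable and function names are only for the example):

```python
# Sizes in MB, copied from the table above.
original = 3012
pbp = 2404
cso2_zopfli = 2388
chd = 2354


def diff_per(reference_mb: float, bigger: float, smaller: float, per_mb: float) -> float:
    """Size difference scaled to `per_mb` of the original image."""
    return (bigger - smaller) * per_mb / reference_mb


pbp_vs_chd = diff_per(original, pbp, chd, 1000)           # about 16.6 MB per 1000 MB
cso2_vs_chd = diff_per(original, cso2_zopfli, chd, 1004)  # about 11.33 MB per 1004 MB
cso2_vs_chd_pct = 100.0 * (cso2_zopfli - chd) / original  # about 1.13 % of the original

print(round(pbp_vs_chd, 2), round(cso2_vs_chd, 2), round(cso2_vs_chd_pct, 2))
```

The 1.13% is measured against the original ISO size, so it is a difference in percentage points of the uncompressed image, not a percentage of the CSO file.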

LunaMoo avatar Apr 28 '20 12:04 LunaMoo

If you want to maximize CSO compression I recommend 16384 block size, you could go with 32768 + zopfli if you want to save every byte. Might get it around PBP size with that. Warning: zopfli might easily take over an hour.

I wonder if PBP would improve via "crushing" (not sure what parameters lzrc has to tune or if there are any annoying patents in the way there...)

-[Unknown]

unknownbrackets avatar Apr 29 '20 03:04 unknownbrackets

I don't know what happened; the bounty is still at 0, and bountysource.com is really slow to load. I will contact support to see if they can fix it.

lamvuong2019 avatar Apr 29 '20 21:04 lamvuong2019

@unknownbrackets

> Warning: zopfli might easily take over an hour.

Over an hour? I'm really surprised: I've just compressed Monster Hunter Freedom Unite's ISO (845.6 MB) with the following command: maxcso --block=32768 --use-zopfli input.iso -o output.cso and this took a bit less than 6 minutes on my laptop with an i7-6700HQ to produce a 763.4 MB CSO.

vnctdj avatar Apr 29 '20 22:04 vnctdj

Right, but you've got 8 threads, and iirc FF Type-0 is closer to 3 GB. It could take a lot longer for someone on a weaker CPU (maxcso speed is basically linear to size of ISO and number of cores, which should hopefully be true up to at least 32 cores.) Zopfli specifically can vary a lot in speed depending on data as well, iirc.
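As a back-of-the-envelope check of that scaling, here is a sketch that assumes time is strictly proportional to ISO size and inversely proportional to thread count. The reference numbers come from the measurement above (a bit under 6 minutes, 845.6 MB, 8 threads); the 3012 MB size is the Type-0 figure posted earlier in the thread, and the 2-thread machine is a made-up example.

```python
def estimate_minutes(ref_minutes: float, ref_size_mb: float, ref_threads: int,
                     size_mb: float, threads: int) -> float:
    """Scale a measured run linearly by ISO size and by thread count."""
    return ref_minutes * (size_mb / ref_size_mb) * (ref_threads / threads)


# Same 8-thread laptop, larger ISO.
same_cpu = estimate_minutes(6, 845.6, 8, 3012, 8)
# Hypothetical 2-thread CPU with the same per-thread speed.
weaker_cpu = estimate_minutes(6, 845.6, 8, 3012, 2)

print(f"{same_cpu:.0f} min on 8 threads, {weaker_cpu:.0f} min on 2 threads")
```

Under those assumptions the larger ISO takes around 21 minutes on the same laptop and well over an hour on the 2-thread machine, before accounting for zopfli's data-dependent variation.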

Does any tool exist to try different LZMA settings at a block level to produce the smallest possible CHD (like maxcso)? Are the provided numbers already this, or might there be lower, more compelling numbers?

-[Unknown]

unknownbrackets avatar Apr 29 '20 23:04 unknownbrackets

I only know of chdman for conversions to CHD, and the only thing you can change is the hunk size, but making it bigger gives a worse compression ratio. I'm working on converting my whole library to CSO using the options --block=32768 --use-zopfli to compare against the best compression. For now, CHD is about 10% smaller than CSO made with just the 7-Zip option (from 67.3 GB down to 61.2 GB).

Danixu avatar Apr 30 '20 00:04 Danixu