
[Hub] Artifact and cache size

Open iTrooz opened this issue 1 year ago • 11 comments

This issue gathers all information about reducing the size of the cache and artifacts.

iTrooz, May 25 '24 11:05

LTO seems to have a large influence on ccache cache sizes: the ArchLinux ccache cache is 150MB with LTO and 24MB without. See https://gist.github.com/iTrooz/740f00f0935e365534f5a76dab0e7738 to measure section sizes in ELF files.
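The gist's script is not reproduced in this thread. As a hypothetical stand-in, here is a minimal Python sketch that reads section sizes straight from the ELF section header table, assuming a little-endian 64-bit ELF file (the usual case for x86_64 Linux builds). The names `elf_section_sizes` and `print_report` are made up for this example. Summing the `.debug*` sections shows how much of a binary is debug info.

```python
import struct


def elf_section_sizes(data: bytes) -> dict[str, int]:
    """Map each section name to its total sh_size in a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles little-endian ELF64")
    # ELF header: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A..0x3F.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def section(index: int) -> tuple[int, int, int]:
        # Elf64_Shdr starts with sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size.
        base = e_shoff + index * e_shentsize
        name, _type, _flags, _addr, offset, size = struct.unpack_from("<IIQQQQ", data, base)
        return name, offset, size

    # The section at index e_shstrndx holds the section-name string table.
    _, strtab_offset, strtab_size = section(e_shstrndx)
    strtab = data[strtab_offset:strtab_offset + strtab_size]

    sizes: dict[str, int] = {}
    for i in range(1, e_shnum):  # index 0 is the reserved null section
        name_offset, _, size = section(i)
        name = strtab[name_offset:strtab.index(b"\0", name_offset)].decode()
        sizes[name] = sizes.get(name, 0) + size  # sum sections that share a name
    return sizes


def print_report(path: str) -> None:
    """Print sections largest-first, plus the combined size of all .debug* sections."""
    with open(path, "rb") as f:
        sizes = elf_section_sizes(f.read())
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 2**20:10.2f} MiB  {name}")
    debug_total = sum(size for name, size in sizes.items() if name.startswith(".debug"))
    print(f"{debug_total / 2**20:10.2f} MiB  (all .debug* sections)")
```

For example, `print_report("libimhex.so")` on a locally built library (the path is illustrative). Note that for sections compressed with `-gz`, `sh_size` is the compressed size, so the report reflects on-disk size.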

iTrooz, May 25 '24 11:05

Building in Release mode instead of RelWithDebInfo greatly reduces artifact size. For example, the Ubuntu 22.04 DEB went from 132MB to 16.2MB, and the Windows installer went from 217MB to 24.2MB. More information: https://github.com/iTrooz/ImHex/actions/runs/9231528139 and https://github.com/iTrooz/ImHex/actions/runs/9231536431
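To put those figures in proportion, a quick calculation on the sizes quoted above (plain arithmetic, no new measurements) shows both artifacts end up roughly 88% smaller, i.e. about 8 to 9 times smaller:

```python
def reduction(before_mb: float, after_mb: float) -> tuple[float, float]:
    """Return (percent saved, shrink factor) when going from before_mb to after_mb."""
    return (1 - after_mb / before_mb) * 100, before_mb / after_mb


# Artifact sizes in MB quoted above: RelWithDebInfo -> Release
for artifact, before, after in [("Ubuntu 22.04 DEB", 132, 16.2), ("Windows installer", 217, 24.2)]:
    saved, factor = reduction(before, after)
    print(f"{artifact}: {saved:.1f}% smaller ({factor:.1f}x)")
```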

iTrooz, May 25 '24 11:05

Using -gz=zlib (or falling back to -gz) doesn't seem to improve cache sizes (checked on the Ubuntu 22.04 and ArchLinux builds).

Artifact sizes do not improve either. In fact, the AppImage seems to have grown from 141MB to 162MB. Windows and macOS do not support this option.

Note that the ELF files produced do shrink drastically (e.g. from 140.6MiB to 56.4MiB for libimhex on Ubuntu 22.04). We don't observe changes in the artifacts because the package formats (e.g. .deb, .rpm, .tar.zst) are already compressed.

NOTE: This means that this optimisation would still be useful once the package is installed.

More information: https://github.com/iTrooz/ImHex/actions/runs/9231528139 https://github.com/iTrooz/ImHex/actions/runs/9235290159
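The "already compressed" argument can be illustrated with a toy Python experiment. It uses synthetic, highly redundant text as a stand-in for uncompressed debug info (an assumption; real DWARF is binary), `zlib` as a rough analogue of `-gz=zlib`, and `lzma` as a stand-in for the package compressor. It is an illustration, not a measurement of ImHex's binaries.

```python
import lzma
import random
import zlib

# Hypothetical stand-in for uncompressed debug info: very redundant data
# built from a small vocabulary of tokens.
rng = random.Random(0)
vocab = [f"symbol_{i}" for i in range(50)]
raw = " ".join(rng.choice(vocab) for _ in range(100_000)).encode()

# Rough analogue of -gz=zlib: the debug data is compressed inside the binary.
precompressed = zlib.compress(raw)

# Rough analogue of the package format: the whole payload is compressed again.
package_from_raw = lzma.compress(raw)
package_from_precompressed = lzma.compress(precompressed)

print(f"raw debug data:         {len(raw):>9,} bytes")
print(f"after zlib (-gz):       {len(precompressed):>9,} bytes")
print(f"package, raw input:     {len(package_from_raw):>9,} bytes")
print(f"package, zlib'd input:  {len(package_from_precompressed):>9,} bytes")
```

With data like this, pre-compressing shrinks the file on disk (the installed-size benefit noted above), but it leaves the package compressor almost no redundancy to exploit, so the final package is not smaller and, in this toy case, ends up larger.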

iTrooz, May 25 '24 17:05

A complex but definitive solution to artifact size would be to store the debug info of release versions ourselves instead of bundling it in artifacts, and have ImHex upload stack traces with code offsets to our server, where we could map them back to files and lines.
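A minimal sketch of the server-side half of that idea, assuming the client uploads each frame as a (module name, offset from the module's load address) pair and the server keeps a table of address ranges extracted from the withheld debug info. `Symbol`, `SymbolTable` and `symbolicate` are hypothetical names, and building the table from real DWARF or PDB data is out of scope here.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symbol:
    start: int  # offset of the range's first byte, relative to the module's load address
    size: int   # length of the address range in bytes
    function: str
    file: str
    line: int


class SymbolTable:
    """Hypothetical per-module table, built server-side from debug info that is not shipped."""

    def __init__(self, symbols: list[Symbol]) -> None:
        self._symbols = sorted(symbols, key=lambda s: s.start)
        self._starts = [s.start for s in self._symbols]

    def resolve(self, offset: int) -> Optional[Symbol]:
        # Candidate: the last range starting at or before `offset`.
        i = bisect.bisect_right(self._starts, offset) - 1
        if i >= 0 and offset < self._symbols[i].start + self._symbols[i].size:
            return self._symbols[i]
        return None  # offset falls outside every known range


def symbolicate(frames: list[tuple[str, int]], tables: dict[str, SymbolTable]) -> list[str]:
    """Turn (module, offset) frames uploaded by a client into readable file:line frames."""
    lines = []
    for module, offset in frames:
        table = tables.get(module)
        sym = table.resolve(offset) if table is not None else None
        if sym is None:
            lines.append(f"{module}+{offset:#x} (unresolved)")
        else:
            lines.append(f"{sym.function} at {sym.file}:{sym.line}")
    return lines
```

Offsets rather than absolute addresses are what make this work: with ASLR the load address changes on every run, while the offset inside a given build of a module does not. In practice the tables would also need to be keyed by build (for example a build ID), not just by module name.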

iTrooz, Jul 09 '24 12:07

Some software provides separate PDB file downloads for debugging. Is this approach possible for ImHex?

Crystal-RainSlide, Jul 10 '24 23:07

Probably, but your approach is missing some details. Who would download and use these separate debugging files?

I offer an answer to this in my last comment.

iTrooz, Jul 11 '24 08:07

Who would download and use these separate debugging files?

AFAIK, WinDbg, "who" keeps downloading symbol files automatically until the disk is full.

Crystal-RainSlide, Jul 14 '24 01:07

If you have a source, please share it, but I doubt it would do that, because that's not its purpose. WinDbg is a debugger: why would it even be installed on a user's machine, and why would it manage storage?

iTrooz, Jul 16 '24 07:07

I think Crystal-RainSlide means that debuggers can have symbol servers configured, and when you try to debug code, the debugger downloads the PDBs for the libraries and other things you may need. Those are Microsoft servers, but you can use any server, such as a folder or an HTTP address. I think the PDBs are needed by the stack tracer implementation used, so the debuggers' servers are not something that can be used here.
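For reference, a symbol server in Microsoft's SymSrv layout is essentially a directory tree keyed by the PDB's GUID and age, which is why either a folder or an HTTP address works. A small Python sketch of how such a lookup path is formed, assuming the GUID is given in its usual textual form; `symsrv_pdb_url`, `imhex.pdb` and `symbols.example.org` are illustrative names, not something ImHex ships.

```python
import uuid


def symsrv_pdb_url(server: str, pdb_name: str, guid: str, age: int) -> str:
    """Build a SymSrv-style PDB location: <server>/<pdb>/<GUID><age>/<pdb>."""
    # uuid.UUID accepts the plain or braced textual GUID; .hex drops braces and dashes.
    # The key directory is the 32 upper-case hex digits followed by the age in hex.
    signature = uuid.UUID(guid).hex.upper() + format(age, "X")
    return f"{server.rstrip('/')}/{pdb_name}/{signature}/{pdb_name}"
```

For example, `symsrv_pdb_url("https://symbols.example.org", "imhex.pdb", guid, age)` would name the one file a debugger or crash handler needs for a given build, so a server of our own would only have to host files at those paths.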

paxcut, Jul 16 '24 12:07

Ohh, I never knew symbol servers existed. That could indeed be a way to solve the problem, but I don't plan to do it right now. If someone wants to build a PoC, please do so. I'm imagining something like a function in ImHex that calls the symbol server when crashing, or an implementation in our API web server for when it receives raw "stacktraces" without symbols from ImHex instances.

Some links that seem useful: https://stackoverflow.com/a/35556262 https://docs.sentry.io/platforms/apple/data-management/debug-files/symbol-servers/ https://wiki.archlinux.org/title/Debuginfod (used by ArchLinux to download debug info for libraries installed with pacman)

I think the pdbs are needed by the stack tracer implementation used, so that the debuggers servers are not something that can be used here.

I'm sure this can be worked around
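The debuginfod approach from the ArchLinux link works similarly for ELF binaries, keyed by the GNU build ID stored in the `.note.gnu.build-id` section. A minimal Python sketch, assuming a little-endian ELF and that the note section's raw bytes have already been read; the function names are hypothetical. It extracts the build ID and forms the `/buildid/<id>/debuginfo` URLs a debuginfod client would query, one per server in a `DEBUGINFOD_URLS`-style space-separated list.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for the GNU build ID


def build_id_from_note(note: bytes) -> str:
    """Extract the build ID (lowercase hex) from the raw contents of .note.gnu.build-id."""
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    if ntype != NT_GNU_BUILD_ID or note[12:12 + namesz] != b"GNU\0":
        raise ValueError("not a GNU build-id note")
    desc_start = 12 + (namesz + 3) // 4 * 4  # the name is padded to a 4-byte boundary
    return note[desc_start:desc_start + descsz].hex()


def debuginfod_urls(build_id: str, servers: str) -> list[str]:
    """Candidate debug-info URLs, one per server in a space-separated server list."""
    return [f"{s.rstrip('/')}/buildid/{build_id}/debuginfo" for s in servers.split()]
```

For example, `debuginfod_urls(build_id, os.environ.get("DEBUGINFOD_URLS", ""))`. A crash handler in ImHex, or the API server, could fetch debug info from such URLs only when a stack trace actually needs resolving.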

iTrooz, Jul 21 '24 11:07

I don't know much about the process of creating useful stack traces, but if symbol servers can be used for them, I suppose they would be the natural choice. Symbol servers are not exclusive to Windows: gdb also supports them, and there may be Linux servers that can be used as well. I'm not 100% sure, but I think it is likely.

paxcut, Jul 21 '24 11:07