rizin icon indicating copy to clipboard operation
rizin copied to clipboard

Update `librz/magic` to the latest from OpenBSD

Open XVilka opened this issue 3 years ago • 21 comments

Current code in librz/magic is from 2009. There were tons of the changes in OpenBSD since, see https://github.com/openbsd/src/tree/master/usr.bin/file

Note, that there are changes that were done in this code after the import, so they should be isolated and reapplied as appropriate. Also, probably it could be made as subproject, see https://github.com/rizinorg/rizin/issues/209

Initially it was imported in this commit: https://github.com/radareorg/radare2/commit/323441c122e0eb093c7aaafde1eed7352430466a

You can see the following changes by browsing the history of libr/magic and librz/magic correspondingly.

XVilka avatar Feb 25 '21 12:02 XVilka

It makes me wonder whether we really need rz_magic module at all...

ret2libc avatar Feb 25 '21 13:02 ret2libc

From my point of view, the magic search feature is important. But we could also consider the possibility of using libmagic as a dependency, one of our subprojects. But it depends on how much it was customized, maybe the mainstream version is incompatible out of the box.

XVilka avatar Feb 25 '21 14:02 XVilka

I agree it is an important feature indeed! I'm just not sure if it makes sense to have a module which essentially just mirrors either BSD implementation or GNU implementation, in case sys_magic is used...

I guess we need to research a bit more and see if there's any relevant change in rz_magic.

ret2libc avatar Feb 25 '21 17:02 ret2libc

I also think the magic search feature is important. Moreover, it would be better to use libmagic, though after diffing with magic.c 1.6 (I can't find magic.c 1.8 anywhere) I see there are some changes that basically only make use of Rizin types:

-private int
-info_from_stat(struct magic_set *ms, mode_t md)
-{
+static int info_from_stat(RzMagic *ms, unsigned short md) {
        /* We cannot open it, but we were able to stat it. */
-       if (md & 0222)
-               if (file_printf(ms, "writable, ") == -1)
+       if (md & 0222) {
+               if (file_printf(ms, "writable, ") == -1) {
                        return -1;
-       if (md & 0111)
-               if (file_printf(ms, "executable, ") == -1)
+               }
+       }
+       if (md & 0111) {
+               if (file_printf(ms, "executable, ") == -1) {
                        return -1;
-       if (S_ISREG(md))
-               if (file_printf(ms, "regular file, ") == -1)
+               }
+       }
+       if (S_ISREG(md)) {
+               if (file_printf(ms, "regular file, ") == -1) {
                        return -1;
-       if (file_printf(ms, "no read permission") == -1)
+               }
+       }
+       if (file_printf(ms, "no read permission") == -1) {
                return -1;

But then, there are some Rizin specific functions defined:

+RZ_API void rz_magic_free(RzMagic *ms) {
+       if (ms) {
+               free_mlist(ms->mlist);
+               free(ms->o.pbuf);
+               free(ms->o.buf);
+               free(ms->c.li);
+               free(ms);
+       }
+}
+
+RZ_API bool rz_magic_load_buffer(RzMagic *ms, const char *magicdata) {
+       if (*magicdata == '#') {
+               struct mlist *ml = file_apprentice(ms, magicdata, FILE_LOAD);
+               if (ml) {
+                       free_mlist(ms->mlist);
+                       ms->mlist = ml;
+                       return true;
+               }
+       } else {
+               eprintf("Magic buffers should start with #\n");
+       }
+       return false;
+}

caribpa avatar Feb 25 '21 17:02 caribpa

There is some content removed in the Rizin version, but I cannot tell if they are rizin specific changes or version changes (1.6 -> 1.8, as magic.c disappears from the OpenBSD history in the next version)

caribpa avatar Feb 25 '21 17:02 caribpa

It shall be noted that all the GNU, BSD, etc versions come from https://www.darwinsys.com/file/

caribpa avatar Feb 25 '21 17:02 caribpa

It shall be noted that all the GNU, BSD, etc versions come from https://www.darwinsys.com/file/

Yes, except the version we ship is from OpenBSD, that as said in the link wrote its own code to implement libmagic :(

ret2libc avatar Mar 03 '21 16:03 ret2libc

@caribpa @ret2libc if the patching cannot be avoided we could employ semantic patching with Coccinelle suite:

  • https://coccinelle.gitlabpages.inria.fr/website/sp.html
  • https://dewaka.com/blog/2020/01/25/semantic-patching/

XVilka avatar Mar 04 '21 03:03 XVilka

@caribpa @ret2libc if the patching cannot be avoided we could employ semantic patching with Coccinelle suite:

* https://coccinelle.gitlabpages.inria.fr/website/sp.html

* https://dewaka.com/blog/2020/01/25/semantic-patching/

I'm not sure what coccinnelle has to do with libmagiz :) Could you elaborate?

ret2libc avatar Mar 04 '21 08:03 ret2libc

From what I understand, @XVilka is suggesting to use the latest libmagiz (still needs to be discussed which one), and smart-patch it (with the Rizin specific changes) using Semantic patching (coccinelle).

caribpa avatar Mar 04 '21 09:03 caribpa

I think we should see first if it is possible to simply use the system's libmagic and replace the unwanted behavior.

caribpa avatar Mar 04 '21 09:03 caribpa

@ret2libc why the thumbs down? 😭

caribpa avatar Mar 04 '21 10:03 caribpa

Btw, what I was suggesting when I said simply use the system's libmagic and replace the unwanted behavior was using the weak attribute (though I realized it is not portable), or dlsym (which is POSIX but apparently not on Windows).

caribpa avatar Mar 04 '21 10:03 caribpa

Windows seems to have a pragma for weak-like, see this answer

caribpa avatar Mar 04 '21 10:03 caribpa

According to this answer, the dlsym equivalent in Windows is GetProcAddress

caribpa avatar Mar 04 '21 10:03 caribpa

Well, apparently there's already rz_lib_dl_sym for solving the portability issues 🙂

caribpa avatar Mar 04 '21 10:03 caribpa

@ret2libc why the thumbs down? sob

Just because I don't like the approach. And, again, I don't see why coccinelle would really help to do any non-trivial adjustments.

ret2libc avatar Mar 04 '21 10:03 ret2libc

@caribpa @ret2libc I extracted all changes for files in librz/magic since the initial import into the separate repository, also combined some commits for clarity, see https://github.com/XVilka/rizin-magic

Please note that it's made through git replace --graft so GitHub does't show the right history, locally it should be fine.

git replace --graft f448ee2682c48d8cc3d2592147f96984ecbc6ced a8e896b7551107dd924e5ce8dc3b2be0ac8fe7ae

XVilka avatar Mar 05 '21 07:03 XVilka

And here is the combined patch of all changes in both radare2 and rizin since the initial import:

magic_radare_rizin_changes.patch.zip

The same difference but without changes by clang-format:

magic_radare_rizin_changes_no_clang_format.patch.zip

XVilka avatar Mar 05 '21 08:03 XVilka

Excellent work @XVilka 👏

I think we should first research if we could make libmagic a dependency (like capstone), compile it, and then use rz_lib_dl_sym for getting the libmagic functions we don't want to modify (and implement our own version/wrap of the things we want to work differently) for future-proofing.

caribpa avatar Mar 05 '21 12:03 caribpa

This issue has been automatically marked as stale because it has not had recent activity. Considering a lot has probably changed since its creation, we kindly ask you to check again if the issue you reported is still relevant in the current version of rizin. If it is, update this issue with a comment, otherwise it will be automatically closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 07 '21 12:09 stale[bot]