guile-gi

Guix, grafts, gee…

Open LordYuuma opened this issue 5 years ago • 57 comments

Part of our design goals seems to be building against one version of GLib/GObject/GI, while allowing users to load any version. Today I am here to tell you, that this is extremely broken.

Why Guix?

Guix is just the messenger, but there is a stronger reason behind why GI appears rather broken in any Guix package, and that is grafts.

What are grafts?

Grafts are Guix' way of not rebuilding the world when an important security update is rolled out. Basically, they allow you to build and link against old versions of a library while running the program against a new one. Traditional distros do that all the time and you don't even notice, but on Guix you actually have two versions of that library still lying around: the ungrafted one and the grafted one.

Why is this an issue?

Because it is possible to get those two mixed up, e.g. in guix environment. I am not sure, which use cases are affected, but that one surely is. To see the difference, run

./configure --without-gir-hacks
make clean check

once inside a guix environment with grafts, and once in one without them. If you want to use Guix environments to prototype your applications, that means you'll have to use --no-grafts to work around these types of issues for now.

What to do from now on?

It is pretty clear to me, that the main culprit here is a different version of GLib being linked to Guile-GI than the one that should be loaded through Guile-GI. To fix that, we'll probably have to overhaul our entire bootstrapping procedure starting at GTypes. And we'll likely have to preload some version of GObject before defining them. Much fun.

Workaround

If you are working on Guile-GI code inside guix environment and do not wish to be haunted by this issue and how to perhaps resolve it, for the time being add --with-gir-hacks to your invocation of ./configure. If you are experiencing similar issues in your own GI-based projects, consider patching your GIRs in a manner similar to what we do.

LordYuuma avatar Oct 25 '20 20:10 LordYuuma

I've only followed this repo on the side and I don't really understand much about it. I'm only interested in building nice GTK apps using GNU Guile at some point. That said, since you already started explaining, let me ask some uninformed, possibly dumb questions : )

  • How long would it take to "rebuild the world" for this project using --no-grafts?
  • If --no-grafts solves the problem already, why would you need to overhaul the entire bootstrapping procedure?
  • Why does one usually not notice grafting, when it is done by traditional distros?
  • Is there no way to tell Guix which version of a library to use, when it grafts?
  • If this is a bug in GNU Guix, why not wait / develop for a bugfix instead of throwing things away? Or is it too unlikely that the behavior will be fixed?

ZelphirKaltstahl avatar Oct 26 '20 01:10 ZelphirKaltstahl

  • --no-grafts does not cause any rebuilds, you're stuck with the old library version.
  • I am really not certain, that --no-grafts is a fix. If development environments are the only thing this bug affects, then fine, but having packaged some GI applications for Guix I am not sure what exactly happens there.
  • Traditional distros don't graft. They merely place an updated shared library in the same location, which Guix can't do.
  • As far as I know it's difficult to predict and not something a library should actually care about.
  • It is not a Guix-specific bug, Guix is just the messenger. The issue comes from having two versions of GLib/GObject side by side – the one we link against vs. the one we load dynamically through typelibs. The only thing special about Guix is that this routinely happens with GObject-2.0 against GObject-2.0, whereas on other distros you'd probably notice it if you tried loading GObject-1.0 or GObject-3.0 (were they to exist) through Guile-GI.

LordYuuma avatar Oct 26 '20 07:10 LordYuuma

Wait, shouldn't Guix grafting also graft guile-gi (it should actually edit the shared objects in the derivation, too)? Then everything would be fine.

Does this problem actually happen when using an installed/dependent guix guile-gi package?

Right now I'm always using a manual git checkout of guile-gi, so that can't be grafted of course (because guix doesn't know that that checkout exists). Also, I recompile it very rarely, even as I update guix (read: it's stale).

I don't want to be the one responsible for a lot of unnecessary rework. Please make sure it's actually required and I've not been making the problem seem worse than it is.

daym avatar Oct 26 '20 09:10 daym

You are not the only one responsible for this. I use Guix myself as a basis for developing this library and am very weirded out by having to resort to such hacks.

There is so far no precedent for Guile-GI packages in Guix, which might also have to do with the fact that the guile-gi recipe on Guix was rather broken for a long time (is it fixed now? I don't remember). I would assume some weird workaround would be required to get them to run, as with all the PyGI and GJS packages.

LordYuuma avatar Oct 26 '20 09:10 LordYuuma

$ cat run
#!/bin/sh
exec ${HOME}/src/guile-gi/guile-gi-dannym/guile-gi/tools/uninstalled-env guix repl -L . "$@"
#exec guix environment -l ${HOME}/src/guile-gi/guile-gi-dannym/guile-gi/guix.scm --ad-hoc guile gdk-pixbuf adwaita-icon-theme shared-mime-info -- "$@" ${HOME}/src/guile-gi/guile-gi-dannym/guile-gi/tools/uninstalled-env guix repl a.scm
 dannym@dayas ~/src/guix-gui$ strace -f ./run main.scm 2>&1 |grep open  |grep glib |grep 'libglib.*\.so' |grep -v -- '-1'
[pid  5665] openat(AT_FDCWD, "/gnu/store/dp5l10lbgh66ap4idqvmkfms1qgjsj4r-profile/lib/libglib-2.0.so.0", O_RDONLY|O_CLOEXEC) = 15
[pid  5665] openat(AT_FDCWD, "/gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libglib-2.0.so.0", O_RDONLY|O_CLOEXEC) = 16

In order to debug this, I'd LD_PRELOAD something that overrides dlopen and prevents one of them from being opened. That way, hopefully the requestor will fail and then we know who it is:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <string.h>
#include <stdio.h>

typedef void *(*dlopen_t)(const char *filename, int flags);

void *dlopen(const char *filename, int flags) {
        void* result;
        dlopen_t dlopen = dlsym(RTLD_NEXT, "dlopen");
        if (filename && strstr(filename, "xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libglib-2.0")) {
                fprintf(stderr, "dlopen %s\n", filename);
                return NULL;
        }
        result = dlopen(filename, flags);
        return result;
}

daym avatar Oct 26 '20 10:10 daym

I'm sorry to say that, but I don't get any meaningful results (or even results at all) from adding this to LD_PRELOAD. It doesn't even appear to execute at all.

LordYuuma avatar Oct 26 '20 11:10 LordYuuma

I do. Result: dlopen is sometimes called without a full path (for example: libcairo-gobject.so.2)! If that is a string literal in some executable file, that is not good--because those references won't be found by the grafter.

New version (inside ~/src/guile-gi/guile-gi-dannym/guile-gi):

Create block-open.c:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef void *(*dlopen_t)(const char *filename, int flags);

void *dlopen(const char *filename, int flags) {
        void* result;
        dlopen_t dlopen = dlsym(RTLD_NEXT, "dlopen");
        fprintf(stderr, "dlopen %s\n", filename);
        if (filename && strstr(filename, "/") == NULL && strstr(filename, "libcairo-gobject.so")) {
                fprintf(stderr, "dlopen blocked %s\n", filename);
                result = dlopen(filename, flags);
                if (result) {
                        fprintf(stderr, "and would have been found! Aborting!\n");
                        int* x = 0;
                        *x = 5;
                        abort();
                }
                return NULL;
        }
        result = dlopen(filename, flags);
        return result;
}

Then compile it via gcc -fPIC -shared -o block-open.so block-open.c.

Then create the run script:

#!/bin/sh -e

LD_PRELOAD="$PWD/block-open.so" guix environment --preserve=LD_PRELOAD -l guix.scm --ad-hoc guile gdk-pixbuf adwaita-icon-theme shared-mime-info -- make tools/uninstalled-env tools/run-guile "$@"

I also edited tools/run-guile bottom to say:

exec ${top_builddir}/tools/uninstalled-env ${top_builddir}/libtool --mode=execute \
     -dlopen ${top_builddir}/libguile-gi.la \
     gdb --args "/gnu/store/0l5a4vx5w8xv6xwq7a6s7hc4r1790lvl-profile/bin/guile" "$@"

Then LD_PRELOAD=$PWD/block-open.so ./run examples/button1.scm.

Then r.

I got this:

[...]
dlopen libcairo-gobject.so.2
dlopen blocked libcairo-gobject.so.2
and would have been found! Aborting!
(gdb) bt
#0  0x00007ffff7fc724c in dlopen () from /home/dannym/src/guile-gi/guile-gi-dannym/guile-gi/block-open.so
#1  0x00007ffff3a4d7e9 in g_module_open () from /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgmodule-2.0.so.0
#2  0x00007ffff3beaac1 in g_typelib_symbol () from /gnu/store/dp5l10lbgh66ap4idqvmkfms1qgjsj4r-profile/lib/libgirepository-1.0.so.1
#3  0x00007ffff3be4475 in g_registered_type_info_get_g_type () from /gnu/store/dp5l10lbgh66ap4idqvmkfms1qgjsj4r-profile/lib/libgirepository-1.0.so.1
#4  0x00007ffff3c1f350 in gig_type_meta_init_from_type_info (meta=meta@entry=0x4ccdf0, type_info=type_info@entry=0x4c7540) at src/gig_data_type.c:231
#5  0x00007ffff3c1f72a in gig_type_meta_init_from_arg_info (meta=0x4ccdf0, ai=0x4c6cf0) at src/gig_data_type.c:33
#6  0x00007ffff3c23375 in arg_map_apply_function_info (func_info=0x4cb370, amap=<optimized out>) at src/gig_arg_map.c:108
#7  gig_amap_new (name=name@entry=0x4cc6c0 "container:propagate-draw", function_info=function_info@entry=0x4cb370) at src/gig_arg_map.c:69
#8  0x00007ffff3c26d83 in create_gsubr (specializers=0x7fffffffcb28, formals=0x7fffffffcb20, optional_input_count=0x7fffffffcb1c, required_input_count=0x7fffffffcb18, self_type=0x7ffff353d580, name=0x4cc6c0 "container:propagate-draw", function_info=0x4cb370) at src/gig_function.c:377

daym avatar Oct 26 '20 11:10 daym

Reading the source code of gobject-introspection, they do _g_typelib_do_dlopen in order to actually dlopen (that was inlined).

Aaaand that was patched by Guix.

          /* 'gobject-introspection' doesn't store the path of shared
             libraries into '.typelib' and '.gir' files.  Shared
             libraries are searched for in the dynamic linker search
             path.  In Guix we patch 'gobject-introspection' such that
             it stores the absolute path of shared libraries in
             '.typelib' and '.gir' files.  Here, in order to minimize
             side effects, we make sure that if the library is not
             found at the indicated path location, we try with just
             the basename and the system dynamic library
             infrastructure, as per default behaviour of the
             library. */
          module = load_one_shared_library (shlibs[i]);
          if (module == NULL && g_path_is_absolute (shlibs[i]))
            {
              module = load_one_shared_library (g_basename(shlibs[i]));
            }

I suspect it gets into the if block body.

A kingdom for a g_debug in there...
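For illustration, the fallback in that patch can be sketched in isolation with a log line added where the g_debug is missing. This is only a rough model, not the Guix patch itself: load_with_fallback, path_basename and search_path_only_loader are hypothetical names, and the loader is passed in as a function pointer so the logic can be exercised without any real libraries.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical loader signature: returns a module handle or NULL. */
typedef void *(*loader_fn)(const char *name);

/* Stand-in for g_basename(): the part after the last '/'. */
const char *path_basename(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Try SHLIB as given; if that fails and SHLIB is absolute, retry with
   just the basename.  *tried_fallback is set to 1 when the basename
   retry was attempted, 0 otherwise. */
void *load_with_fallback(const char *shlib, loader_fn load, int *tried_fallback)
{
    void *module = load(shlib);
    *tried_fallback = 0;
    if (module == NULL && shlib[0] == '/') {
        fprintf(stderr, "fallback: %s not found, retrying as %s\n",
                shlib, path_basename(shlib));
        *tried_fallback = 1;
        module = load(path_basename(shlib));
    }
    return module;
}

/* Fake loader for demonstration: pretends that only bare names (found
   via the dynamic linker search path) resolve, absolute paths never do. */
void *search_path_only_loader(const char *name)
{
    static int dummy_module;
    return strchr(name, '/') ? NULL : &dummy_module;
}
```

With the fake loader, an absolute store path "succeeds" only through the basename retry, which is the silent behaviour suspected above.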

daym avatar Oct 26 '20 11:10 daym

I think I have something simpler:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef void *(*dlopen_t) (const char *filename, int flags);
#ifndef DLOPEN_BREAK_KEY
#define DLOPEN_BREAK_KEY "gobject"
#endif

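void *dlopen(const char *filename, int flags) {
  /* filename may be NULL (a handle for the main program), so guard it. */
  fprintf (stderr, "dlopen %s\n", filename ? filename : "(null)");
  dlopen_t real_dlopen = dlsym (RTLD_NEXT, "dlopen");
  /* Trap into the debugger (x86 breakpoint instruction) on a match. */
  if (filename && strstr (filename, DLOPEN_BREAK_KEY)) asm volatile ("int $03");
  return real_dlopen (filename, flags);
}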

That allows you to GDB into the dlopens of any particular shared library simply by defining DLOPEN_BREAK_KEY. And as expected, it gets called via g_module_open in the first load-info of test/insanity.scm.

LordYuuma avatar Oct 26 '20 12:10 LordYuuma

Good idea!

I've read through gobject-introspection source code by now and it seems that gobject-introspection upstream take it upon themselves to provide gir files for a few other libraries (like cairo--see gir/cairo-1.0.gir.in in gobject-introspection).

But if I understand the patch that Guix did to gobject-introspection's build system (not pasted above) correctly, then they only pick up libraries in the output of the package currently built. Well, currently we are building gobject-introspection, not cairo. So it won't pick up cairo. That's why it's in there with a relative path (a fallback of the guix patch).

The reason why it then doesn't fail nicely at startup of guile-gi as it should is because I have cairo in my profile (it was probably propagated by something--I can't remove it).

Other gir files provided in gobject-introspection are: DBus DBusGLib fontconfig freetype2 gio GL libxml2 Vulkan win32 xfixes xft xlib xrandr--so those will cause trouble eventually if they refer to glib or any other gobjects (some of those definitely do). If such a package is additionally propagated into a profile, you are gonna have a hell of a time finding the problem--as we did.

It's possible to manually specify --fallback-library-path= (presumably on g-ir-scanner), so a workaround could be to make gobject-introspection depend on a union of the respective packages (see above), and then specify --fallback-library-path manually.

In any case, I think this is a Guix bug (at least additionally).

daym avatar Oct 26 '20 12:10 daym

This is somewhat off-topic, but I think the issue becomes clear if you do the following:

$ ls -l $GUIX_ENVIRONMENT/lib/libgobject-2.0.so
$ grep libgobject $GUIX_ENVIRONMENT/share/gir-1.0/GObject-2.0.gir

Perhaps Guix fails to graft typelibs at all? In that case, we might as well file an upstream bug, but the fundamental issue remains, that we can't load any GObject other than the one we're linking against.

Interestingly guix build glib and guix build glib --no-grafts also return different results from the two paths listed above. There is probably a glib-minimal for building packages, so even if we were to fix the grafting issue, the general issue still remains.
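What the grep above finds is the shared-library attribute on the gir's namespace element. If you'd rather pull that value out programmatically, a minimal sketch could look like this. It is naive string matching on a single line, not an XML parser, and gir_shared_library is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of shared-library="..." found in LINE into OUT.
   Returns 1 on success, 0 if the attribute is absent, unterminated,
   or does not fit into OUT (including the trailing NUL). */
int gir_shared_library(const char *line, char *out, size_t out_size)
{
    static const char key[] = "shared-library=\"";
    const char *start = strstr(line, key);
    if (start == NULL)
        return 0;
    start += sizeof key - 1;              /* skip past the opening quote */
    const char *end = strchr(start, '"'); /* closing quote */
    if (end == NULL || (size_t)(end - start) >= out_size)
        return 0;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 1;
}
```

The extracted path can then be compared against whatever $GUIX_ENVIRONMENT/lib/libgobject-2.0.so resolves to.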

LordYuuma avatar Oct 26 '20 12:10 LordYuuma

What does

readlink $GUIX_ENVIRONMENT/share/gir-1.0/GObject-2.0.gir

say?

FWIW, for me, there seem to be references to absolute paths of libgobject in the typelibs (and the girs):

~/x/gobject-introspection-1.62.0/compile/gir$ strings  GObject-2.0.typelib  |grep libgob
/gnu/store/xkfc1275h55ynpgfr3wwmzy9707nblwc-glib-2.62.6/lib/libgobject-2.0.so.0
$ strings `guix build gobject-introspection`/lib/girepository-1.0/GObject-2.0.typelib |grep libgobject
/gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgobject-2.0.so.0
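Those two paths have the same package name and version and differ only in the hash, which is easy to overlook by eye. A small sketch of a check, assuming store paths have the shape /gnu/store/<hash>-<name>/... (store_item_length and same_store_item are hypothetical helper names, and symlinks such as profiles are not resolved here):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STORE_PREFIX "/gnu/store/"

/* Length of the "/gnu/store/<hash>-<name>" prefix of PATH,
   or 0 if PATH does not live in the store. */
size_t store_item_length(const char *path)
{
    size_t prefix_len = strlen(STORE_PREFIX);
    if (path == NULL || strncmp(path, STORE_PREFIX, prefix_len) != 0)
        return 0;
    /* The item name runs up to the next '/' or the end of the string. */
    return prefix_len + strcspn(path + prefix_len, "/");
}

/* True only if both paths are inside the very same store item. */
bool same_store_item(const char *a, const char *b)
{
    size_t la = store_item_length(a);
    size_t lb = store_item_length(b);
    return la != 0 && la == lb && strncmp(a, b, la) == 0;
}
```

Fed with the two libgobject paths above, this reports them as different store items even though both are glib-2.62.6.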

I'm 100% sure I found how the weird reference got in. The problem is making Guix find the problem automatically.

It's because:

(1) I don't update my user profile often.
(2) guix environment "updates" a lot.
(3) gobject-introspection refers to cairo by relative name (to dlopen); reason: cairo is not a dependency of gobject-introspection; and even if it was, the Guix patch to gobject-introspection as it is now wouldn't pick it up anyway.
(4) cairo uses gobject too nowadays (!!!!!), and thus refers to (another) glib.
(5) cairo was propagated into my user profile ages ago, and is stale.
(6) There's a cairo reference and thus gobject-introspection loads the cairo library (with the RELATIVE name, see above). It gets the one from (5). That refers to ANOTHER glib. See above.

Next I'm trying to fix Guix's gobject-introspection to FAIL instead of embedding relative references if built inside a guix build container.

daym avatar Oct 26 '20 12:10 daym

Hmm, it appears the gir inside $GUIX_ENVIRONMENT is already grafted and grafting does actually change stuff, but it's still a different gobject shlib.

Your venture into the depths of cairo is nice and all, but keep things simple and use test/insanity.scm.

LordYuuma avatar Oct 26 '20 13:10 LordYuuma

The point is that the "depth of cairo" loaded another version of glib. So that's where another version of glib comes from.

I'm trying test/insanity.scm now--I only now saw the new file. Thanks.

daym avatar Oct 26 '20 14:10 daym

The thing is, you don't need cairo to load a different version of GLib. It already happens with GObject alone.

LordYuuma avatar Oct 26 '20 14:10 LordYuuma

I agree.

But what guix's gobject-introspection does is a wrong thing in general, and libcairo-gobject is being loaded by guile-gi right now (note: I did not directly use cairo anywhere). That is not going to end well.

Fixing it might fix this problem and other potential problems, too.

(I tried making gobject-introspection depend on cairo by now. That causes a circular reference. Sigh.)

In any case, trying insanity.scm now.

daym avatar Oct 26 '20 14:10 daym

Ohhh, libguile-gi links to libgobject at compile time, too. Well, that's gonna be a problem if that library is different to the one dlopen'ed...

daym avatar Oct 26 '20 15:10 daym

You'll get cairo through GTK, Pango, and many other graphical stuff and the recipe seems to contain gtk+ even though it is afaik not strictly necessary.

I highly doubt you'll find a fix to this in Guix. It is likelier that whatever change you make breaks something in PyGI or GJS instead, so please be cautious. A patch to dynamically link libguile-gi against GObject etc. would be very welcome on the other hand.

LordYuuma avatar Oct 26 '20 15:10 LordYuuma

I highly doubt you'll find a fix to this in Guix.

Right now I'll settle for a procedure to flag this problem in Guix--nevermind fixing it.

A patch to dynamically link libguile-gi against GObject etc. would be very welcome on the other hand.

I don't know at all how something like that would look.

gobject-introspection seems to require glib (see below). Things it loads then also depend on glib, but not necessarily on the same version. That is not going to end well.

ldd /gnu/store/64xq4j8b181s6yz7gpg4w8ny3i6r6irk-gobject-introspection-1.62.0/lib/libgirepository-1.0.so |grep glib-
libglib-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libglib-2.0.so.0 (0x00007fb5b5c8f000)
libgobject-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgobject-2.0.so.0 (0x00007fb5b5c30000)
libgmodule-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgmodule-2.0.so.0 (0x00007fb5b5c29000)
libgio-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgio-2.0.so.0 (0x00007fb5b5a5b000)

I don't think that this can be fixed in guile-gi. It can be fixed in gobject-introspection (by changing its architecture), I guess.

daym avatar Oct 26 '20 15:10 daym

ldd shows dynamic links, or is this a static link? In that case, having libguile-gi linked statically against GLib is a problem. Though either way dynamic linking would just be a crutch. The real issue is that we as libguile-gi mix our GLib into the GLib that the user actually wants to load and that's wrong.

LordYuuma avatar Oct 26 '20 16:10 LordYuuma

ldd shows what would be loaded when loading this so. It can only show things that are known without actually executing user code of that so. That means it shows things that are in the header of that so. But the loader ld.so will load those using dlopen when loading that so, in the course of running an executable. Guix uses ld's rpath option in order to make sure those headers always contain full paths.

The real issue is that we as libguile-gi mix our GLib into the GLib that the user actually wants to load and that's wrong.

So does gobject-introspection, and that's just as wrong.

daym avatar Oct 26 '20 16:10 daym

In the end this is the usual recursive definition problem. I don't get why people always do that--that has to lead to problems sooner or later.

For example having a compiler written in the language of that compiler, or xslt (which is a language for transforming xml) specs itself written in XML etc.

In this case, gobject-introspection is supposed to make glib usable in target languages other than C.

In a sane world that would mean that a binding generator doesn't use glib as a non-native input. Or if it absolutely has to (it really shouldn't [1]), then at least it wouldn't expose that glib or any of its contents to target language users (because philosophically, that's just wrong--even if it happens to work sometimes), i.e. it should be a native-input.

But no, gobject-introspection has glib as a regular input. (Trying to move gobject-introspection to native-inputs, I get build system meson does not support cross-compilation--see https://issues.guix.gnu.org/44244 )

If it has to do stuff like that, you'd think at least it would have two levels: a meta-level where it mangles definitions of glib, and that just happens to use glib for the mangling internally (but not ever expose that glib or any of its objects directly to the user), and another normal level where it actually provides glib to the target language. But no :(

Personally I'd write a replacement for libgirepository that just parses the girs or typelibs on its own for guile-gi (not using glib or girepository.so to do it). Then it's much simpler--both practically and philosophically.

Because what gir files do is describe glib at a C level. The interface of the glib is gobjects, but the implementation of glib is C (Pascal got this right in 1970, only to be ignored by almost everyone). So there's no reason for a binding generator to depend on glib at all--it would have to use the gobject interface if it did.

Of course there'll be some "syntactic sugar" provided for the target language depending on the library when it is loaded--but that's pretty much it. I remember pygtk doing this right ("override" files) a long long time ago.

An easy way to catch those problems early-on is to try to cross-compile your program. If it tries the self-reference outlined above, that would cause a failure (which is what you want) because the architectures of the parts don't match.

[1] because eventually it will become part of glib, and what then?

daym avatar Oct 26 '20 16:10 daym

I'm not sure if this adds any useful information to this discussion, but, on Linux, you can pull the results of dlopening in the following fashion. Compile this with -ldl

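#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdlib.h>
#include <stdio.h>

static int
callback(struct dl_phdr_info *info, size_t size, void *data)
{
    printf("Name: \"%s\" (%d segments)\n", info->dlpi_name,
           info->dlpi_phnum);
    return 0;
}

int
main(int argc, char *argv[])
{
    void *handle;
    handle = dlopen("libgtk-3.so", RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }

    /* Lists every object loaded into the process; the last argument is
       handed to the callback as `data` and is unused here. */
    dl_iterate_phdr(callback, NULL);

    exit(EXIT_SUCCESS);
}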

spk121 avatar Oct 28 '20 20:10 spk121

Personally I'd write a replacement for libgirepository that just parses the girs or typelibs on its own for guile-gi (not using glib or girepository.so to do it). Then it's much simpler--both practically and philosophically.

To guess the level of effort, I built libgirepository without linking to glib. I get 250 glib procedures that would need stubs. But guile-gi doesn't use the whole of libgirepository, so that's an upper bound. But say we circumvented linking to glib, would the underlying libffi dependency cause similar problems?

My rough estimate was done this way, after removing glib from meson.build

ninja 2>&1 | awk -F '`' '{print $2}' | awk -F "'" '{print $1}' | sort | uniq | grep g_ | wc

spk121 avatar Oct 28 '20 22:10 spk121

Not as far as I know, but I'm personally not convinced that this is the right move here. Philosophically speaking, it wouldn't be much of an introspection if one was doing it from the outside, would it?

I've had a short look at G-Golf and they seem to be doing stuff similar to us in that they partly export the base GLib that they get, but I assume this is fine for them, since they use dynamic links for GObject. That being said, I still don't have any experience with using G-Golf as a library, but there seem to be projects built on it in Guix, so it appears likely to work out that way.

TL;DR: Dynamic linking would probably be at least a short-term solution. For the long term we should think about what "introspecting your twin" really means.

@daym Do you count GIBaseInfo being a registered struct type as "exposing GLib to the user"? Because that's pretty much the only thing I can think of that fits your description here.

LordYuuma avatar Oct 28 '20 23:10 LordYuuma

Philosophically speaking, it wouldn't be much of an introspection if one was doing it from the outside, would it?

It's doing it from the outside anyway because C has no introspection (and neither does glib, generally--except for small islands). They can be faking it, but that doesn't change this fundamental fact.

But I know what you mean: gobject-introspection itself wants to be a gobject.

But if gobject-introspection itself wants to be a gobject, then it should be made impossible to load another glib using gobject-introspection (making the rest of glib similar to what compiler or shell builtins would be).

@daym Do you count GIBaseInfo being a registered struct type as "exposing GLib to the user"? Because that's pretty much the only thing I can think of that fits your description here.

Yeah.

In the end I don't see how gobject-introspection can work reliably on Guix like that--nevermind guile-gi for the time being.

The easiest way to find out in detail what is what would be to remove glib from the dependencies of gobject-introspection (and their header files) entirely and stub the things listed below (also remove #include <glib.h> and #include <glib-object.h> from the gobject-introspection public interface). Ideally, it should still build and install. Does it?

If not, add glib back to the package dependencies but remove #include <glib.h> and #include <glib-object.h> from the gobject-introspection public interface. That should definitely work (but would still be pretty bad). If not, that's definitely very bad.

Object-like glib interface types that even gitypelib-internal.h uses:

typedef struct _GMappedFile GMappedFile;
typedef struct _GList GList;
typedef struct _GITypelib GITypelib;
typedef int GQuark; // this one is even returned as a NON-pointer
typedef struct _GError GError;
typedef struct _GIBaseInfo GIBaseInfo;

And the public interface of gobject-introspection has:

GType                  g_base_info_gtype_get_type   (void) G_GNUC_CONST; // sigh...

Primitive glib interface types which are used and fine to use since they have obvious definitions and shouldn't change (and could just be defined manually):

#define G_BEGIN_DECLS
#define G_END_DECLS
#define GI_AVAILABLE_IN_ALL

typedef char gchar;
typedef unsigned char guchar;
typedef unsigned char guint8;
typedef unsigned short guint16;
typedef unsigned int guint32;
typedef unsigned int guint;
typedef int gint; // for gboolean
typedef signed char gint8;
typedef int gint32;
typedef unsigned long gsize;
typedef gint gboolean;
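If one did define these by hand, it might be safer to pin the widths instead of relying on what int, short and long happen to be on a given platform (unsigned long is not pointer-sized everywhere, for instance). A sketch of that variant, using <stdint.h> and compile-time checks; this is my own assumption of how such a stub header could look, not anything glib or gobject-introspection ships:

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-written stand-ins for the primitive glib types. */
typedef char gchar;
typedef unsigned char guchar;
typedef int8_t gint8;
typedef uint8_t guint8;
typedef uint16_t guint16;
typedef int32_t gint32;
typedef uint32_t guint32;
typedef int gint;               /* also the base type of gboolean */
typedef unsigned int guint;
typedef size_t gsize;
typedef gint gboolean;

/* Fail the build early if an assumption about widths does not hold. */
_Static_assert(sizeof(guint8) == 1, "guint8 must be 8 bits");
_Static_assert(sizeof(guint16) == 2, "guint16 must be 16 bits");
_Static_assert(sizeof(guint32) == 4, "guint32 must be 32 bits");
_Static_assert(sizeof(gboolean) == sizeof(gint), "gboolean is a gint");
```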

I'm all for dynamically loading stuff from glib in guile-gi, but I just want to make sure first that this actually fixes the entire problem. Otherwise the problem will be back, one library over there.

(Also, the "cairo" problem won't vanish: gobject-introspection totally can load yet another glib version when traversing through to cairo, even after all those fixes. And it does traverse there. That could also happen with other libraries like gnome highlevel libs etc. I don't want to single glib and cairo out--it's just an example)

All in all, I wonder if GNOME can be made to work reliably like Guix wants it to at all. After all, gobject.so has a central type registry and thus having two gobject.so loaded (even indirectly) in the same executable is going to make things weird...
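To make the "weird" part concrete, here is a toy model of what goes wrong when two copies of a type registry live in one process. It has nothing to do with how GObject actually implements GType; type_registry, registry_register and registry_name are made-up names. The only point is that an ID handed out by one copy means something else (or nothing) in the other.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 16

/* Toy registry: one of these per loaded copy of the library. */
struct type_registry {
    const char *names[MAX_TYPES];
    int count;
};

/* Register NAME and return its 1-based ID.  Registering the same name
   twice returns the existing ID; 0 means the registry is full. */
int registry_register(struct type_registry *r, const char *name)
{
    for (int i = 0; i < r->count; i++)
        if (strcmp(r->names[i], name) == 0)
            return i + 1;
    if (r->count == MAX_TYPES)
        return 0;
    r->names[r->count] = name;
    return ++r->count;
}

/* Look up the name behind ID, or NULL if this registry never issued it. */
const char *registry_name(const struct type_registry *r, int id)
{
    return (id >= 1 && id <= r->count) ? r->names[id - 1] : NULL;
}
```

If the "linked" copy registers GObject first while the "loaded" copy registers something else first, the ID for GObject from the linked copy names a different type in the loaded one.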

daym avatar Oct 29 '20 00:10 daym

Would requiring guile-gi to statically link to a static version of libglib help?

spk121 avatar Oct 30 '20 16:10 spk121

I think it wouldn't help.

If you did do it (and also did it in gobject-introspection), care would have to be taken not to pass any gobject-introspection-internal things (like GITypeInfo) as gobjects to the scheme user or to any of the dlopened libraries at runtime (it would always have to be wrapped and unwrapped so it doesn't scare them :P).

But guile-gi then still shouldn't use any static glib type conversion functions to resolve dynamically-loaded objects (for example to convert to gobject interfaces)--so it wouldn't help much.

If anything it would just hide the problem and now make it impossible to debug even with a LD_PRELOAD.

I think that the goal should be to either

(a) remove the explicit glib dependency from both gobject-introspection and guile-gi, or
(b) make the glib dependency built-in to both gobject-introspection and guile-gi (i.e. don't dynamically-load another glib even if instructed to do so)

I think that we need help from Guix on how this is supposed to be fixed.

daym avatar Oct 30 '20 18:10 daym

Manual require and load depends on being able to export GIBaseInfos and dealing with them in Scheme code. And since typelib->module is implemented on top of them… you get the idea. Also statically linking more stuff is ethically wrong, as Guix would require us to statically link less. Not to mention, that static linking also has security implications on other distros.

But if gobject-introspection itself wants to be a gobject, then it should be made impossible to load another glib using gobject-introspection (making the rest of glib similar to what compiler or shell builtins would be).

Just because we aren't doing it right doesn't mean that it's impossible. We just have to stop mixing our (not their) internal GObject into the stuff we return. If anything is missing in order to do that, it is probably some macro that hasn't been introspected.

In the end I don't see how gobject-introspection can work reliably on Guix like that--nevermind guile-gi for the time being.

You load GObject.Object et al. through require and load instead of referring to G_TYPE_OBJECT. The same applies to GIRepository itself, but we can just load that internally from C code without batting an eye.

All in all, I wonder if GNOME can be made to work reliably like Guix wants it to at all. After all, gobject.so has a central type registry and thus having two gobject.so loaded (even indirectly) in the same executable is going to make things weird...

Weird maybe, but definitely manageable. Especially in Guile, where you could use e.g. (@@ (gi Typelib-Version) <GValue>). Also GI does not allow instantiating different versions of the same typelib. So if you ever end up with glib-x.y and glib-a.b both loaded through GI (not once through GI and once through the wrapper as we have it here), then either someone maliciously renamed one of them forming "TotallyNotGLib-2.0.gir" or you've found a pretty major bug in GI.

(b) make the glib dependency built-in to both gobject-introspection and guile-gi (i.e. don't dynamically-load another glib even if instructed to do so)

That's a rather interesting twist on my suggestion to dynamically link them to ensure, that they are the same under most circumstances. (Dynamic linkage against 2.0 and requiring 2.0 through typelib should yield the same result unless your system is broken beyond repair, the problem would then still manifest in 2.0 vs 3.0).

LordYuuma avatar Oct 30 '20 18:10 LordYuuma

Weird maybe, but definitely manageable. Especially in Guile, where you could use e.g. (@@ (gi Typelib-Version) <GValue>). Also GI does not allow instantiating different versions of the same typelib. So if you ever end up with glib-x.y and glib-a.b both loaded through GI (not once through GI and once through the wrapper as we have it here), then either someone maliciously renamed one of them forming "TotallyNotGLib-2.0.gir" or you've found a pretty major bug in GI.

(b) make the glib dependency built-in to both gobject-introspection and guile-gi (i.e. don't dynamically-load another glib even if instructed to do so)

That's a rather interesting twist on my suggestion to dynamically link them to ensure, that they are the same under most circumstances. (Dynamic linkage against 2.0 and requiring 2.0 through typelib should yield the same result unless your system is broken beyond repair, the problem would then still manifest in 2.0 vs 3.0).

So here's another half-baked idea. ;-) If dynamic linking and requiring through typelib yields the same results, another brute force approach could be to actually replace any GLib/ GObject calls found in guile-gi itself with function pointers found (lazily) at runtime using dlopen and dlsym calls. And then allow a guile-gi client the option to pass in a path to the version of the GLib/GObject library that is desired as a run-time initialization parameter for dlopen. In essence each function call would become

GSList *(*_g_slist_append)(GSList *lst, void *data) = NULL;
GSList *g_slist_append(GSList *lst, void *data)
{
  if (_g_slist_append == NULL)
    _g_slist_append = dlsym(libglib, "g_slist_append");
  return _g_slist_append(lst, data);
}
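To try the pattern without GLib installed, here is the same shape against libm instead. It's only a sketch under assumptions: a glibc-style Linux system where libm.so.6 can be found by dlopen (older glibc also needs -ldl at link time), and lazy_init / lazy_cos are hypothetical names standing in for the run-time initialization parameter and a wrapped function.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Handle of the library chosen at run time by the client. */
static void *lib_handle = NULL;

/* Select the library that wrapped calls resolve against.
   Returns 0 on success, -1 if the library cannot be opened. */
int lazy_init(const char *library_path)
{
    lib_handle = dlopen(library_path, RTLD_LAZY | RTLD_GLOBAL);
    return lib_handle ? 0 : -1;
}

/* Function pointer resolved on first use. */
static double (*real_cos)(double) = NULL;

/* Wrapper: looks up "cos" in the chosen library the first time it is
   called, then forwards.  Returns 0.0 if the symbol is unavailable. */
double lazy_cos(double x)
{
    if (real_cos == NULL && lib_handle != NULL)
        real_cos = (double (*)(double)) dlsym(lib_handle, "cos");
    return real_cos ? real_cos(x) : 0.0;
}
```

The open question from above still applies: every GLib/GObject call in guile-gi would need such a wrapper, and macros that expand to direct symbol references would not go through it.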

spk121 avatar Oct 30 '20 20:10 spk121