
lablgtk fails on multicore due to use of naked pointers

Open talex5 opened this issue 4 years ago • 10 comments

Here is a simplified version of lablgtk's gpointer.ml file, showing the problem:

module Gpointer = struct
  let raw_null = snd (Obj.magic Nativeint.zero)
end

let () =
  Gc.full_major ()

$ ocaml test.ml
fish: “ocaml test.ml” terminated by signal SIGSEGV (Address boundary error)

I think this is the cause of https://github.com/ocaml-multicore/ocaml-multicore/issues/609. There is some more information about naked pointers at https://discuss.ocaml.org/t/ann-a-dynamic-checker-for-detecting-naked-pointers/5805.
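
For illustration, here is a minimal sketch of one way to keep a null C pointer on the OCaml side without exposing it as a naked pointer. As I understand it, the multicore GC treats any scanned field whose low bit is 0 as a pointer to a block with a header, so the raw 0 extracted by `snd` above gets dereferenced during marking. Keeping the pointer boxed as a `nativeint` avoids that, because a boxed `nativeint` is a custom block whose payload the GC does not scan. The names `raw_pointer` and `is_null` are hypothetical, and this is not necessarily how lablgtk fixes it.

```ocaml
(* Hypothetical sketch: represent a C pointer as a boxed nativeint
   instead of pulling the raw machine word out with Obj.magic. *)
type raw_pointer = nativeint

(* The null pointer stays inside its custom block. *)
let raw_null : raw_pointer = Nativeint.zero

(* Compare by value; the raw word is never exposed to the GC. *)
let is_null (p : raw_pointer) : bool = Nativeint.equal p Nativeint.zero

let () =
  (* A boxed nativeint is a custom block, so its contents are opaque to the GC. *)
  assert (Obj.tag (Obj.repr raw_null) = Obj.custom_tag);
  (* A full major collection is safe here, unlike in the example above. *)
  Gc.full_major ();
  assert (is_null raw_null)
```

The trade-off is that C stubs receiving such a value have to unbox it (e.g. with `Nativeint_val`) rather than use the word directly.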

talex5 avatar Jul 29 '21 09:07 talex5

This is a known problem in lablgtk, and I plan to address it. The null pointer case is just an instance, and it is relatively easy to solve. The main problem is with the generated translation tables, which are static C data, and where it is a bit difficult to add headers. Of course, I would welcome a compact patch :-)

garrigue avatar Jul 30 '21 04:07 garrigue

@kit-ty-kate raised this issue again, so I proposed we coordinate a possible effort here.

ejgallego avatar Dec 03 '21 18:12 ejgallego

OK, I need to do something about that. IIRC, most pointers are already properly wrapped, but translation tables generated by varcc do not contain the required headers. This is a bit painful to do, as the size of the header differs from that of the other contents...

garrigue avatar Dec 04 '21 04:12 garrigue

See #144 and #145 for fixes for lablgtk3 and lablgtk2 respectively.

garrigue avatar Dec 10 '21 03:12 garrigue

We tried testing #145 (i.e. the lablgtk2 version) on multicore/5.00, but it doesn't seem to work, and we don't know why. It would be nice if somebody could have a look at it.

garrigue avatar Dec 10 '21 07:12 garrigue

I tried the lablgtk3 version on 4.12+domains. I used this code:

let () = print_endline @@ GMain.init ()

But it fails for me:

$ opam pin add lablgtk3 "git+https://github.com/garrigue/lablgtk.git#0ae631f3a0dd153c2d8e05e9ee3cc906c8503bb1"
$ ocamlfind ocamlopt -thread -package lablgtk3 -linkpkg -o test.exe test.ml
$ ./test.exe 
Fatal error: exception Failure("Obj.truncate not supported")

I also tried building it with dune, with the same result.
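
For context, `Obj.truncate` shrinks a block in place, which the multicore runtime does not support. A generic replacement is to allocate a fresh, shorter value and copy the prefix into it. The sketch below shows that pattern for `Bytes`; the helper name `shrink_bytes` is hypothetical, and this is not taken from lablgtk, whose actual use of `Obj.truncate` may differ.

```ocaml
(* Hypothetical helper: return a copy of the first [len] bytes of [buf]
   instead of truncating [buf] in place with Obj.truncate. *)
let shrink_bytes (buf : Bytes.t) (len : int) : Bytes.t =
  if len < 0 || len > Bytes.length buf then
    invalid_arg "shrink_bytes: length out of range";
  Bytes.sub buf 0 len

let () =
  let buf = Bytes.of_string "hello world" in
  let short = shrink_bytes buf 5 in
  (* prints "hello" *)
  print_endline (Bytes.to_string short)
```

This costs one extra allocation and copy, but it relies only on the standard library and leaves the original buffer untouched.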

talex5 avatar Dec 14 '21 10:12 talex5

Can you try with the lablgtk2 version? The call to Obj.truncate is removed there. It is easier for us to test.

garrigue avatar Dec 15 '21 01:12 garrigue

I have cherry-picked the changes to the lablgtk3 version in #144. Please test; I would like to release.

garrigue avatar Dec 17 '21 09:12 garrigue

Thanks, some of the examples now work for me, e.g. dune exec -- ./examples/entry.exe. But others don't, e.g.:

$ dune exec -- ./examples/hello.exe
fish: “dune exec -- ./examples/hello.e…” terminated by signal SIGSEGV (Address boundary error)
(rr) bt
#0  caml_darken (v=0, ignored=0x0, state=0x0) at major_gc.c:761
#1  0x000055eb1018cc60 in caml_darken (state=state@entry=0x0, v=<optimized out>, ignored=ignored@entry=0x0) at major_gc.c:759
#2  0x000055eb1018fed0 in write_barrier (new_val=94468103054592, old_val=<optimized out>, field=0, obj=obj@entry=140443383598920)
    at memory.c:140
#3  caml_initialize (fp=fp@entry=0x7fbb85fd8f48, val=val@entry=94468103054592) at memory.c:212
#4  0x000055eb10174fe9 in Val_GObject_new (p=0x55eb11b9a500) at ml_gobject.c:62
#5  0x000055eb101ae26f in <signal handler called> ()
#6  0x000055eb100c654b in camlGobject__unsafe_create_362 () at src/gobject.ml:208

talex5 avatar Dec 18 '21 14:12 talex5

Thanks for the feedback. Then I think I will merge the PR. Even if it doesn't work properly on multicore, it at least becomes possible to debug.

garrigue avatar Dec 21 '21 00:12 garrigue