Dynamic loading of a ctypes library inside an OCaml main program breaks Pervasives.compare.

Open braibant opened this issue 10 years ago • 11 comments

We have a situation in which we produce a main executable that dynamically loads a shared library using the ctypes Dl mechanism. The shared library is defined using reverse bindings and exposes a simple interface.

In the shared library, a simple invocation of the polymorphic comparison on two identical strings returns incorrect results. E.g., Pervasives.compare "t" "t" <> 0.

A test case is available here: https://github.com/cryptosense/ctypes_issue. It requires OCaml 4.02.1+PIC, and at least ctypes.0.3.3.

braibant avatar Jan 14 '15 17:01 braibant

cc @hhugo (myself)

hhugo avatar Jan 14 '15 17:01 hhugo

Ok, we have investigated it a bit more and here is a possible explanation:

In the code of the dll, if we do

let _ = (7,8) in
let res = compare (5,6) (7,8) in 
res

the result is 1.

If we do

(* let _ = (7,8) in *)
let res = compare (5,6) (7,8) in 
res

the result is -1.

This and other similar experiments point to the line https://github.com/ocaml/ocaml/blob/4.02/byterun/compare.c#L149 being a possible culprit. Is it really possible that the DLL is running the compare function of the main program, and thinks that the values are out of range and ought to be compared by address?

braibant avatar Jan 14 '15 17:01 braibant

I can see two different caml_compare functions with the debugger, and both work if the data is on the regular OCaml heap:

let x = Bytes.create 1 in
let y = Bytes.create 1 in
Bytes.set x 0 't';
Bytes.set y 0 't';

instead of

let x = "t"
let y = "t"

I think the problem occurs during caml_main. Look at what happens in caml_main in the main program and during the initialization of the shared library (caml_page_table!).

If you compile main.native with the regular compiler (not '-fpic'), the problem doesn't seem to happen.
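
The contrast above (static literals versus values allocated on the OCaml heap) can be turned into a small self-check that the shared library runs after it is loaded. This is only a minimal sketch, not part of the linked test case; the names `checks` and `compare_self_check` are made up for illustration. It returns the labels of the comparisons that gave an unexpected result, so an empty list means polymorphic comparison behaves normally.

```ocaml
(* Hypothetical self-check, not taken from the original test case.
   Each entry pairs a label with a thunk that returns true when
   polymorphic comparison gives the expected answer. *)
let checks : (string * (unit -> bool)) list = [
  (* Literals: statically allocated in native code. *)
  "literal strings", (fun () -> compare "t" "t" = 0);
  "literal tuples", (fun () -> compare (5, 6) (7, 8) < 0);
  (* Fresh values: allocated on the regular OCaml heap. *)
  "heap-allocated bytes", (fun () ->
    let x = Bytes.make 1 't' and y = Bytes.make 1 't' in
    compare x y = 0);
]

(* Labels of the checks that failed; [] when everything is consistent. *)
let compare_self_check () : string list =
  List.map fst (List.filter (fun (_, ok) -> not (ok ())) checks)
```

In a healthy program `compare_self_check ()` is empty. If the behaviour reported in this issue is present, one would expect the literal cases to show up in the list while the heap-allocated case passes.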

fdopen avatar Jan 15 '15 12:01 fdopen

It seems to be a dlopen related bug. In this case, RTLD_DEEPBIND is mandatory.

braibant avatar Jan 21 '15 08:01 braibant

Interesting. Is this a known issue with dlopen?

yallop avatar Jan 21 '15 08:01 yallop

Also, RTLD_DEEPBIND is not available on OSX. I've managed to make it work on OSX by either

  • hiding the OCaml symbols in the dll: passing the option '--export-symbol' to the linker
  • hiding the OCaml symbols in the exe: using strip

hhugo avatar Jan 21 '15 09:01 hhugo

I'm wondering whether using RTLD_LOCAL on OSX could make it work? https://github.com/ocamllabs/ocaml-ctypes/issues/255

hhugo avatar Jan 21 '15 09:01 hhugo

Was RTLD_LOCAL any help on OSX?

yallop avatar Jan 26 '15 13:01 yallop

I'll try it later today

hhugo avatar Jan 26 '15 14:01 hhugo

It does NOT solve the issue. However, I've managed to solve it using the -exported_symbols_list FILE option at link time.

hhugo avatar Jan 26 '15 20:01 hhugo

Hijacking this thread back to the original subject (kind of), I have some issues with how reverse bindings might end up working in practice. From our experiments, it's quite hard to manage which symbols are exported by the shared library that we produce, and they may well interfere with the third-party application that loads them. In the case above, we shot ourselves in the foot because we missed RTLD_DEEPBIND, but this is one particular instance of all the things that could go wrong.

I have tried to apply the techniques described at https://gcc.gnu.org/wiki/Visibility to reduce the visibility of the OCaml runtime, to no avail. I have had mildly more success by following the recipe at http://stackoverflow.com/questions/435352/limiting-visibility-of-symbols-when-linking-shared-libraries. @hhugo managed to solve the issue on OS X using the -exported_symbols_list FILE option at link time, but AFAIK, this option is not standard.
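
For readers who want to try the symbol-hiding route, here is a rough sketch of how the two linker inputs could be generated from one list of entry points. It is an illustration under assumptions, not what was done in this thread: the symbol names are hypothetical, and the GNU ld side assumes the recipe in question is a version script that marks everything else `local`.

```ocaml
(* Symbols the shared library should keep visible. These names are
   hypothetical; use the entry points of your own reverse bindings. *)
let exported = ["mylib_init"; "mylib_compare"]

(* macOS: -exported_symbols_list expects one symbol per line.
   C symbols carry a leading underscore in Mach-O objects. *)
let macos_symbols_list syms =
  String.concat "" (List.map (fun s -> "_" ^ s ^ "\n") syms)

(* GNU ld: a version script that keeps the listed symbols global
   and makes every other symbol local to the shared object. *)
let gnu_version_script syms =
  let globals =
    String.concat "" (List.map (fun s -> "    " ^ s ^ ";\n") syms) in
  "{\n  global:\n" ^ globals ^ "  local:\n    *;\n};\n"

(* Write the generated text to a file for the link step. *)
let write_file path contents =
  let oc = open_out path in
  output_string oc contents;
  close_out oc
```

The output of `macos_symbols_list exported` would be passed with `-Wl,-exported_symbols_list,FILE` on OS X, and the output of `gnu_version_script exported` with `-Wl,--version-script=FILE` on Linux. Whether this hides enough of the OCaml runtime for a given build still has to be checked, for example with `nm` on the resulting library.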

braibant avatar Jan 29 '15 15:01 braibant