ocaml-ctypes
Dynamic loading of a ctypes library inside an OCaml main program breaks Pervasives.compare.
We produce a main executable that dynamically loads a shared library through the ctypes Dl mechanism. The shared library is defined using reverse bindings and exposes a simple interface.
In the shared library, a simple invocation of the polymorphic comparison on two identical strings returns an incorrect result, e.g. Pervasives.compare "t" "t" <> 0.
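To make the symptom easy to detect from inside the shared library, here is a minimal sketch of a self-check. The name compare_self_check is hypothetical (it is not part of ctypes or of the test case); it uses only the standard library and reports which kinds of values polymorphic comparison gets wrong.

```ocaml
(* Hypothetical helper, not part of ctypes or of the test case.
   Returns the labels of the checks that failed; an empty list means
   polymorphic comparison behaves structurally in this runtime. *)
let compare_self_check () =
  let checks = [
    (* Literals: typically emitted as static data by the native compiler. *)
    "string literals", compare "t" "t" = 0;
    "tuple literals", compare (5, 6) (5, 6) = 0;
    (* Values built at run time: allocated on the regular OCaml heap. *)
    "fresh strings", compare (String.make 1 't') (String.make 1 't') = 0;
  ] in
  checks
  |> List.filter (fun (_, ok) -> not ok)
  |> List.map fst

(* Report any failure on stderr when the module is initialised. *)
let () =
  match compare_self_check () with
  | [] -> ()
  | failed ->
    prerr_endline ("compare is broken for: " ^ String.concat ", " failed)
```

In a healthy runtime the list is empty. In the situation described here one would expect the literal cases to be reported, although that depends on how the compiler lays out the constants.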
A test case is available here: https://github.com/cryptosense/ctypes_issue
It requires OCaml 4.02.1+PIC and at least ctypes.0.3.3.
cc @hhugo (myself)
Ok, we have investigated it a bit more and here is a possible explanation:
In the code of the dll, if we do
let _ = (7,8) in
let res = compare (5,6) (7,8) in
res
the result is 1.
If we do
(* let _ = (7,8) in *)
let res = compare (5,6) (7,8) in
res
the result is -1.
This and other similar experiments point to the line https://github.com/ocaml/ocaml/blob/4.02/byterun/compare.c#L149 being a possible culprit. Is it really possible that the DLL is running the compare function of the main program, and thinks that the values are out of range and ought to be compared by address?
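To illustrate the hypothesis, here is a toy model in plain OCaml. It is not the runtime's code: the types value and runtime, the function model_compare, and all addresses are made up for the example. The only assumption taken from the discussion is that a comparison falls back to addresses when a value lies outside the memory the runtime knows about.

```ocaml
(* Toy model of the hypothesis; every name and number here is hypothetical. *)

(* A value is modelled as an address plus its structural contents. *)
type value = { addr : int; contents : int * int }

(* A runtime only knows the address ranges it registered itself. *)
type runtime = { known_ranges : (int * int) list }

let is_known rt v =
  List.exists (fun (lo, hi) -> lo <= v.addr && v.addr < hi) rt.known_ranges

(* Structural comparison when both values are known to the runtime,
   address comparison otherwise. *)
let model_compare rt v1 v2 =
  if is_known rt v1 && is_known rt v2
  then compare v1.contents v2.contents
  else compare v1.addr v2.addr

(* The DLL's static data lives in a range only the DLL's runtime knows. *)
let main_runtime = { known_ranges = [ (0, 1000) ] }
let dll_runtime = { known_ranges = [ (1000, 2000) ] }

(* (7,8) is placed at a lower address than (5,6), as might happen when
   it is emitted first. *)
let a = { addr = 1200; contents = (5, 6) }
let b = { addr = 1100; contents = (7, 8) }

let () =
  Printf.printf "dll runtime: %d, main runtime: %d\n"
    (model_compare dll_runtime a b)
    (model_compare main_runtime a b)
```

With these made-up addresses the DLL's own runtime gives a negative result (structural order), while the main program's runtime gives a positive one (address order), which would be consistent with the sign flip observed above.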
I can see two different caml_compare functions with the debugger, and both work if the data is on the regular OCaml heap:
let x = Bytes.create 1 in
let y = Bytes.create 1 in
Bytes.set x 0 't';
Bytes.set y 0 't';
instead of
let x = "t"
let y = "t"
I think the problem occurs during caml_main. Look at what happens in caml_main in the main program and during the initialization of the shared library (caml_page_table!).
If you compile main.native with the regular compiler (not '-fpic'), the problem doesn't seem to happen.
It seems to be a dlopen related bug. In this case, RTLD_DEEPBIND is mandatory.
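For reference, a minimal sketch of loading the library with that flag through the ctypes Dl module. It assumes the ctypes-foreign library is linked in; load_isolated is a hypothetical wrapper name, and the path is whatever the caller supplies. RTLD_DEEPBIND is a glibc extension, so this is only expected to have an effect on Linux.

```ocaml
(* Requires the ctypes-foreign library, which provides the Dl module.
   load_isolated is a hypothetical name for this example. *)

(* RTLD_DEEPBIND asks the dynamic linker to resolve the library's symbols
   against the library itself before the global scope, so the DLL should
   use its own copy of the OCaml runtime rather than the main program's. *)
let load_isolated path =
  Dl.dlopen ~filename:path ~flags:[Dl.RTLD_NOW; Dl.RTLD_DEEPBIND]
```

A call would look like load_isolated "./libexample.so", where the file name is only a placeholder. Dl.dlopen raises Dl.DL_error if the library cannot be loaded.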
Interesting. Is this a known issue with dlopen?
Also, RTLD_DEEPBIND is not available on OSX. I've managed to make it work on OSX by either
- hiding the OCaml symbols in the dll: passing the option '--export-symbol' to the linker
- hiding the OCaml symbols in the exe: using strip
I'm wondering if using RTLD_LOCAL on OSX could make it work? https://github.com/ocamllabs/ocaml-ctypes/issues/255
Was RTLD_LOCAL any help on OSX?
I'll try it later today
It does NOT solve the issue.
However, I've managed to solve it using the -exported_symbols_list FILE option at link time.
Hijacking this thread back to the original subject (kind of), I have some concerns about how reverse bindings might end up working in practice. From our experiments, it's quite hard to control which symbols are exported by the shared library that we produce, and they may well interfere with the third-party applications that load it. In the case above, we shot ourselves in the foot because we missed RTLD_DEEPBIND, but this is just one instance of all the things that could go wrong.
I have tried to apply the techniques described at https://gcc.gnu.org/wiki/Visibility to reduce the visibility of the OCaml runtime, to no avail. I have had mildly more success by following this recipe: http://stackoverflow.com/questions/435352/limiting-visibility-of-symbols-when-linking-shared-libraries
@hhugo managed to solve the issue on OS X using the -exported_symbols_list FILE option at link time but, AFAIK, this option is not standard.
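Since the two linkers want the list of public symbols in different formats, one option is to generate both files from a single list. The sketch below is only an illustration: the function names are hypothetical, and it assumes the GNU ld version-script syntax on Linux and the one-symbol-per-line format (with the leading underscore that C symbols get on that platform) for the OS X linker.

```ocaml
(* Hypothetical helpers that render the list of symbols to keep visible.
   They only build strings; writing the files and invoking the linker is
   left to the build system. *)

(* GNU ld version script: the listed symbols stay global, everything
   else (including the OCaml runtime) becomes local. *)
let version_script symbols =
  let globals =
    String.concat "" (List.map (fun s -> "    " ^ s ^ ";\n") symbols)
  in
  "{\n  global:\n" ^ globals ^ "  local:\n    *;\n};\n"

(* OS X -exported_symbols_list file: one symbol per line, each prefixed
   with an underscore. *)
let exported_symbols_list symbols =
  String.concat "" (List.map (fun s -> "_" ^ s ^ "\n") symbols)

(* Example with placeholder symbol names. *)
let () =
  let api = [ "example_init"; "example_run" ] in
  print_string (version_script api);
  print_string (exported_symbols_list api)
```

The first output would typically be passed to the compiler driver as -Wl,--version-script=FILE and the second as -Wl,-exported_symbols_list,FILE. Whether this hides enough of the runtime in a given setup still needs to be checked, for instance with nm on the resulting library.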