ocaml-ctypes icon indicating copy to clipboard operation
ocaml-ctypes copied to clipboard

[wip] Port to Dune

Open avsm opened this issue 6 years ago • 40 comments

This is a followon from #574 with the commit history cleaned up for easier intermediate examination of the approach. The current tree builds the libraries and examples with a single dune build, including all the libffi and other configuration logic. This is still a work-in-progress PR and will be rebased a few more times.

Motivation

With a port to dune, the ctypes library can be embedded in larger dune projects simply by including it in the directory tree of the bigger project. This in turn will allow MirageOS unikernels to use Ctypes seamlessly as part of embedded compilation, using the standard cross-compilation and library variants support built into Dune. The goal is to make Ctypes the default FFI interface to MirageOS, and have it work out of the box on all of the backends. All existing OS platforms should be supported as well before merge, including Windows.

Porting Approach

This PR still installs ocamlfind libraries in the same way as the previous Makefile infrastructure, so backwards compat is preserved. However, the findlib schema has been slightly modified to separate out the foreign library from the core ctypes library. We now install:

  • ctypes that contains ctypes.top and ctypes.stubs as before. There are aliases to ctypes.foreign that redirect existing uses of those.
  • ctypes-foreign contains the base, unthreaded and threaded libraries.

All of the configuration logic for libffi is now in src/ctypes-foreign-base, so deleting these directories will remove the ffi build logic without touching the core library.

Since dune by default has stricter warnings enabled (the default --profile=dev mode), the PR currently sprinkles files with [@@@warning tags. The warnings can be fixed as well if desired, but that would muddy this PR so not done yet.

Build Time

Some crude benchmarks on the clean build time show a modest improvement with dune:

$ cd ctypes.orig && time (make -j4 all; make -j4 examples)
real	0m16.243s
user	0m18.603s
sys	0m11.589s
$ cd ../ctypes.new && time dune build @all 
real	0m12.603s
user	0m19.178s
sys	0m11.091s

(The dune build isn't fully optimised yet -- in particular, the C symbol probing can be sped up with a single invocation to cc instead of 20).

TODO

  • [x] port core ctypes and cstubs and foreign libraries
  • [x] port configuration logic for libffi to dune configurator (needs https://github.com/ocaml/dune/pull/1726 but otherwise done)
  • [x] add dependency on dune dev/1.7 in CI, as soon as https://github.com/ocaml/dune/pull/1726 is merged
  • [x] add ppx_bisect support to dune files (partially added, needs Makefile support now)
  • [x] determine the proper way to handle threads for foreign, see https://github.com/ocaml/dune/issues/1724 (fixed by #651)
  • [x] is the as-needed test for cc still needed? It is, so re-add it (and tests with ubuntu 12.04 might trigger it)
  • [x] port tests/ to dune (partially done, and not blocked on https://github.com/ocaml/dune/issues/1730 due to workaround via tests/config)
  • [x] port CI to use the dune infra (am adding support to opam2 avsm/ppa for xenial so that it works on travis for this)
  • [x] delete Makefiles (before merge, will make the patch so that both Makefiles and dune build work simultaneously, so we can check for equivalence more easily)
  • [x] check that dune build @doc is reasonable and that the Foreign variants are generated correctly despite the module clash (see https://github.com/ocaml/dune/issues/1645)
  • [x] get configure rules for windows/android over in new configure script
  • [x] add library variants for the cstubs for xen/freestanding (no need any more due to mirage/mirage#1153)
  • [x] try a ctypes build in a mirageos solo5 unikernel to confirm that the approach works
  • [x] test revdeps in opam
  • [x] test failures in Lwt tests
  • [x] test failure in arm32 (memory corruption?). Need to see if its specific to this PR.
  • [x] multiply defined symbols in fedora 32; see why this happens in this PR only and not the mainline branch.
  • [x] restore 4.02.3-4.05 support somehow, as dune-configurator depends on dune-private-libs which is >4.07. This is only a build time dependency...

avsm avatar Dec 31 '18 18:12 avsm

25ec72aa312fb759f6e1ad19d979c31b5d3dc827 switches over to using virtual_modules for ctypes-foreign, which is inline with the current Makefiles. It still doesnt do automatic selection of threaded/unthreaded, which is covered in ocaml/dune#1724

avsm avatar Jan 01 '19 17:01 avsm

is the as-needed test for cc still needed?

Hi! I just want to notice that trouble with default ubuntu ld behavior described in https://github.com/ocamllabs/ocaml-ctypes/issues/49 still reproduces for Ubuntu 18.04, while MacOS Mojave still uses clang and does not support flag --no-as-needed, so dune config:

(library
    (name my_c_bindings)
  (libraries ctypes ctypes.foreign)
(c_library_flags (-Wl,--no-as-needed -l<some_c_lib>)))

works for ubuntu and fails for mac (removing -Wl,--no-as-needed gives inverse behavior)

NightBlues avatar Apr 24 '19 09:04 NightBlues

I've worked around the ocaml/dune#1730 limitation via a configure pass in 78cad0bcac29843c00384295b3e3b9acdabad219, which is working well for test-enums (d72e533) for example

avsm avatar May 18 '19 11:05 avsm

All tests now ported; time to hook in CI

avsm avatar May 22 '19 22:05 avsm

I'm working on rebasing this to a set of clean commits. @talex5 is helping with deploying a CI to run through the foreign architectures without needing qemu. We'll try that on this branch as a test.

avsm avatar Jul 17 '19 10:07 avsm

The latest patchset on this branch improves by:

  • added github actions, to test on macos and windows. Windows now builds on mingw64
  • packaging fixes for ctypes vs ctypes-foreign, particularly for tests
  • port to dune 2.0 language, which allows for the old ctypes.foreign ocamlfind name to be deprecated by dune itself, rather than META file hacks

avsm avatar Mar 18 '20 10:03 avsm

It looks like this is using library_variants, which is currently under consideration for removal from dune. Is there a suitable alternative if the removal goes ahead?

yallop avatar Apr 23 '20 13:04 yallop

We could simply drop the unthreaded variant. We could do this in stages: merge the existing port, and subsequently remove it (and the use of variants). We could also propose specific support in dune for supporting unthreaded libraries, since they’re a bit special in terms of compiler options.

avsm avatar Apr 23 '20 15:04 avsm

@avsm:

We could simply drop the unthreaded variant.

See #651.

yallop avatar Jul 11 '20 10:07 yallop

Merged against #651 and activated ocaml-ci on my fork. I'm seeing some failures on arm32 I need to chase down:

https://ci.ocamllabs.io/github/avsm/ocaml-ctypes/commit/96700a9886dfea4551192b2cc75de9d1e845b859/variant/debian-10-ocaml-4.10_arm32

#18 7.805 test_returning_errno alias tests/test-returning-errno/runtest (exit 1)
#18 7.805 (cd _build/default/tests/test-returning-errno && ./test_returning_errno.exe)
#18 7.805 munmap_chunk(): invalid pointer
#18 7.805 E
#18 7.805 ==============================================================================
#18 7.805 Error: Errno tests:0:calling stat.
#18 7.805 
#18 7.805 File "/src/_build/default/tests/test-returning-errno/oUnit-Errno tests-buildkitsandbox#00.log", line 49, characters 1-1:
#18 7.805 Error: Errno tests:0:calling stat (in the log).
#18 7.805 
#18 7.805 Worker stops running: Killed by signal -1
#18 7.805 ------------------------------------------------------------------------------
#18 7.805 Ran: 1 tests in: 1.00 seconds.
#18 7.805 FAILED: Cases: 1 Tried: 1 Errors: 1 Failures: 0 Skip:  0 Todo: 0 Timeouts: 0.
#18 7.857 test_threads alias tests/test-threads/runtest
#18 7.857 .....
#18 7.857 Ran: 5 tests in: 1.05 seconds.
#18 7.857 OK
#18 8.093 test_returning_errno alias tests/test-returning-errno-lwt-preemptive/runtest (exit 1)
#18 8.093 (cd _build/default/tests/test-returning-errno-lwt-preemptive && ./test_returning_errno.exe)
#18 8.093 ...E
#18 8.093 ==============================================================================
#18 8.093 Error: Errno tests:0:calling stat.
#18 8.093 
#18 8.093 File "/src/_build/default/tests/test-returning-errno-lwt-preemptive/oUnit-Errno tests-buildkitsandbox#00.log", line 75, characters 1-1:
#18 8.093 Error: Errno tests:0:calling stat (in the log).
#18 8.093 
#18 8.093 Worker stops running: Killed by signal -10
#18 8.093 ------------------------------------------------------------------------------
#18 8.093 Ran: 4 tests in: 1.20 seconds.
#18 8.093 FAILED: Cases: 4 Tried: 4 Errors: 1 Failures: 0 Skip:  0 Todo: 0 Timeouts: 0.
#18 8.125 test_returning_errno alias tests/test-returning-errno-lwt-jobs/runtest (exit 1)
#18 8.125 (cd _build/default/tests/test-returning-errno-lwt-jobs && ./test_returning_errno.exe)
#18 8.125 double free or corruption (out)
#18 8.125 ...E

(I need to check if this also reproduces on the mainline ctypes, or if it's a bug specific to the build rules in this PR)

And oddly, a bug with a multiply defined symbols in Fedora 32 worked around by 96700a9886dfea4551192b2cc75de9d1e845b859. This doesn't happen in the mainline branch, so it might be some over-eager linkage in the test_functions/clib area in this PR.

avsm avatar Jul 26 '20 10:07 avsm

To broaden that, all the Lwt-based tests seem to fail in the PR at the moment, but only on Linux but not macOS.

avsm avatar Jul 26 '20 10:07 avsm

There's another build change just gone into master: #656.

yallop avatar Aug 05 '20 22:08 yallop

The dune-port branch breaks consumers of ctypes because the ctypes.cmxa lacks the -I $(ocamlc -where)/integers include path for C stubs. See ocamlobjinfo ctypes.cmxa|head from a dune build and from a make build. dune may just lack the required knobs to add the required info.

This breaks at least luv.

olafhering avatar Apr 21 '21 17:04 olafhering

Is the set of remaining points up-to-date?

bobot avatar Jan 26 '22 15:01 bobot

Thanks for looking @bobot! The main worrying thing about this PR is a test case failure (on the existing branch) on Linux for the Lwt tests, but only when running under dune. Something has changed in the compilation flags causing Lwt_preemptive to fail, and if that's figured out then the rest of the rebase to current ctypes should be straightforward. I never did get to the bottom of why that test case was failing (I spent some time at the Dune retreat looking at that!)

avsm avatar Jan 26 '22 18:01 avsm

Hi! I'm willing to give a hand too to give this a final push. I could repro that this test (lwt_preemptive + returning errno) hangs.

Initial debugging notes:

  • there are 4 ounit test cases in this file
  • none of them is individually responsible for the failure: passing -only-test ...:0 ,...:1, ...:2, ...:3 all succeed
  • passing -runner sequential to ounit make the test pass

So there's some kind of global state that's getting corrupted. I'll have a look later at what the hang is caused by.

emillon avatar Jan 31 '22 14:01 emillon

The stuck process is waiting on a read in ounit's IPC code:

#0  __read_chk (fd=fd@entry=8, buf=buf@entry=0x7ffe2a595900, nbytes=nbytes@entry=20,
    buflen=buflen@entry=65536) at read_chk.c:33
#1  0x0000562bb60f93cc in read (__nbytes=20, __buf=0x7ffe2a595900, __fd=8)
    at /usr/include/x86_64-linux-gnu/bits/unistd.h:39
#2  unix_read (fd=17, buf=<optimized out>, ofs=1, len=<optimized out>) at read.c:32
#3  0x0000562bb608bddf in camlUnix__read_478 () at unix.ml:258
#4  0x0000562bb608788a in camlOUnitRunnerProcesses__really_read_144 ()
    at src/lib/ounit2/advanced/oUnitRunnerProcesses.ml:71
#5  0x0000562bb6087959 in camlOUnitRunnerProcesses__receive_data_252 ()
    at src/lib/ounit2/advanced/oUnitRunnerProcesses.ml:90
#6  0x0000562bb608632a in camlOUnitRunner__loop_1367 () at src/lib/ounit2/advanced/oUnitRunner.ml:313
#7  0x0000562bb6087d59 in camlOUnitRunnerProcesses__create_worker_348 ()
    at src/lib/ounit2/advanced/oUnitRunnerProcesses.ml:160
#8  0x0000562bb6087014 in camlOUnitRunner__iter_1502 () at src/lib/ounit2/advanced/oUnitRunner.ml:526
#9  0x0000562bb6077a92 in camlOUnitUtils__time_fun_590 ()
    at src/lib/ounit2/advanced/oUnitUtils.ml:175
#10 0x0000562bb608a415 in camlOUnitCore__run_test_tt_192 ()
    at src/lib/ounit2/advanced/oUnitCore.ml:83
#11 0x0000562bb608a9b1 in camlOUnitCore__run_test_tt_main_inner_849 ()
    at src/lib/ounit2/advanced/oUnitCore.ml:173
#12 0x0000562bb603c081 in camlDune__exe__Test_returning_errno__entry ()
    at tests/test-returning-errno-lwt-preemptive/test_returning_errno.ml:85
#13 0x0000562bb6038b19 in caml_program ()
#14 0x0000562bb611fb0c in caml_start_program ()
#15 0x0000562bb60fd794 in caml_startup_common (argv=0x7ffe2a5a5ca8, pooling=<optimized out>,
    pooling@entry=0) at startup_nat.c:158
#16 0x0000562bb60fd7df in caml_startup_exn (argv=<optimized out>) at startup_nat.c:168
#17 caml_startup (argv=<optimized out>) at startup_nat.c:168
#18 0x0000562bb6038302 in main (argc=<optimized out>, argv=<optimized out>) at main.c:41

The generated code (stubs.c, struct_stubs.c, bindings.ml) look similar but the link flags might be different.

emillon avatar Jan 31 '22 15:01 emillon

ah I spoke a bit too quickly there are other processes stuck in some Lwt_preemptive code.

emillon avatar Jan 31 '22 15:01 emillon

OK so I have a workaround: adding the following line in test-returning-errno-lwt-preemptive/test_returning_errno.ml makes the deadlock go away :tada:

let () = OUnitRunnerProcesses.unix_fork := Lwt_unix.fork

(Alternatively, using ounit2-lwt instead of calling Lwt_main.run in the test)

I suppose that it's because lwt is then confused about the new process that it does not know about. But of course, that does not tell us why the original code was working...

emillon avatar Jan 31 '22 15:01 emillon

Great thanks, I haven't checked at all but my first hunch was that Dune use the mtpredicate (thread predicate). Perhaps that has an impact on a library that link differently based on that. It could explain the difference with fork since the code is perhaps different with threads.

bobot avatar Jan 31 '22 16:01 bobot

Yes, I think it's something like that at work. We'd have to inspect the linking commands to be sure. This is what the Lwt_unix.fork docs say:

fork () does the same as Unix.fork. You must use this function instead of Unix.fork when you want to use Lwt in the child process.

So it's always "wrong" (from lwt's point of view) to use ounit's process-based runner (the default one) when lwt is involved.

emillon avatar Feb 01 '22 09:02 emillon

You are right it is a real fix.

bobot avatar Feb 01 '22 09:02 bobot

The remaining question I guess is do we consider that this fix is enough or do we need to investigate more about why it worked before?

emillon avatar Feb 02 '22 13:02 emillon

It would be great to root-cause the underlying reason why behaviour changes. The ctypes Makefiles are rather precise at present, and I'm nervous about proposing a large-scale change to the build system that we don't fully understand.

I suspect this is down to the provision of the -thread flag somehow -- a problem that will go away in OCaml 5.0+ as it is mandatory, but ctypes still supports older versions of OCaml.

Thanks for all the very good detective work so far!

avsm avatar Feb 02 '22 15:02 avsm

OK, I'll keep the :male_detective: hat for a bit.

emillon avatar Feb 02 '22 15:02 emillon

Found it! The command that runs the test binaries actually pass -runner sequential:

https://github.com/ocamllabs/ocaml-ctypes/blob/57f069897b36f784ff0296c40f726e3baf5d8a1d/Makefile.tests#L1395

This is confirmed by execsnoop, and removing it makes some lwt-preemptive tests hang. So a more precise port of the test stanza would be (and similarly for other tests):

 (test
  (name test_returning_errno)
  (modules test_returning_errno)
  (package ctypes-foreign)
+ (action (run %{test} -runner sequential))
  (libraries oUnit ctypes ctypes.stubs ctypes-foreign
    test_returning_errno_lwt_preemptive_stubs test_functions
    test_returning_errno_lwt_preemptive_bindings tests_common))

emillon avatar Feb 02 '22 16:02 emillon

Fantastic!! Great detective work @emillon.

avsm avatar Feb 02 '22 16:02 avsm

Hi. I'm seeing some test hang in the CI, I think it's just a matter of putting -runner sequential in all tests.

emillon avatar Mar 02 '22 14:03 emillon

Sorry to jump in but what is the status of this PR? Is there anything I can do to help unblock this as merging this PR and releasing a new ctypes with this would immensely help opam-monorepo users?

NathanReb avatar Apr 04 '22 09:04 NathanReb

Hi, I worked on that to bring the port to completion. I pushed my branch to https://github.com/emillon/ocaml-ctypes/tree/dune-port-2022-04 My litmus test has been that ocaml-ci should work on the fork.

Here's a list of fixes:

  • this merges master so that it gets up to date (we'll probably want to linearize the history somehow) - in particular this merges the CI matrix from master.
  • format dune files
  • fix Stream deprecation for 4.14 support
  • fix flags passing - some places were missing :standard, and generally setting (use_standard_c_and_cxx_flags) makes us both future proof and fix the build on 32 bit switches
  • there was an issue with dynamic loading that prevented the test suite from running on ocaml < 4.06
  • add FTS tests to @runtest so that they're run in CI and in opam tests
  • use dune instrumentation support for bisect_ppx to avoid a dependency at the opam level (see details in commit messages)

I think that this takes care of most of your TODO list in the original message. It would be interesting to do a revdeps run. Let me know what next steps we can do.

emillon avatar Apr 14 '22 10:04 emillon