flexlink produces an invalid dll when building lablgtk-2.18.3 on mingw64
Hi all,
I have recently been building lablgtk (a GTK2 wrapper for OCaml) using the mingw64 toolchain provided by msys2. The package uses ocamlmklib (and thus flexlink) to create a DLL called dlllablgtk2.dll. Here are the versions of the tools in my environment:
flexdll 0.34 (from http://alain.frisch.fr/flexdll.html; built from source)
ocaml 4.02.1 (built from source)
Flexlink generates the library without error, but the library is considered invalid by LoadLibraryEx:
Error: Error on dynamically loaded library: .\dlllablgtk2.dll: %1 is not a valid win32 application
The following toy program gives the same result.
$ cat testdll.c
#include <flexdll.h>
#include <stdio.h>
#include <windows.h>
int main(int argc, char *argv[]) {
void *handle;
printf("Try open: %s\n", argv[1]);
handle = flexdll_dlopen(argv[1], FLEXDLL_RTLD_GLOBAL);
printf("Handle: %p\n", handle);
if (handle == NULL) {
printf("Error code: %lu\n", (unsigned long)GetLastError());
printf("Error message: %s\n", flexdll_dlerror());
}
return 0;
}
$ flexlink -chain mingw64 -exe -o testdll testdll.c
$ testdll.exe dlllablgtk2.dll
Try open: dlllablgtk2.dll
Handle: 0000000000000000
Error code: 193
Error message: %1 is not a valid win32 application
The library is created using 24 object files in addition to some system libraries. The command is:
flexlink -v -v -chain mingw64 -LD:/msys64/mingw64/x86_64-w64-mingw32/lib \
-o dlllablgtk2.dll -lpthread -LD:/msys64/mingw64/lib -lgtk-win32-2.0 \
-limm32 -lshell32 -lole32 -lpangocairo-1.0 -lpangoft2-1.0 -lpangowin32-1.0 -lgdi32 \
-lpango-1.0 -lm -latk-1.0 -lcairo -lpixman-1 -lfontconfig -lexpat -lfreetype -lexpat -lfreetype \
-lbz2 -lharfbuzz -lgdk_pixbuf-2.0 -lpng16 -lgio-2.0 -lz -lgmodule-2.0 -lgobject-2.0 -lffi \
-lglib-2.0 -lws2_32 -lole32 -lwinmm -lshlwapi -lintl \
ml_gobject.o ml_gpointer.o ml_gtk.o ml_gtkaction.o ml_gtkbin.o ml_gtkbroken.o ml_gtkbutton.o \
ml_gtkassistant.o ml_gtkedit.o ml_gtkfile.o ml_gtklist.o ml_gtkmenu.o ml_gtkmisc.o ml_gtkpack.o \
ml_gtkrange.o ml_gtkstock.o ml_gtktext.o ml_gtktree.o ml_gdkpixbuf.o ml_gdk.o ml_glib.o \
ml_pango.o ml_gvaluecaml.o wrappers.o
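When I remove some of the objects (e.g. ml_gtktree.o), the generated library becomes valid: loading gets past the image format check and fails only because the toy program does not provide caml_failwith.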
$ testdll.exe dlllablgtk2.dll # ml_gtktree.o removed from the command
Try open: dlllablgtk2.dll
Handle: 0000000000000000
Error code: 1114
Error message: Cannot resolve caml_failwith
It seems the issue is not caused by a single object. The library built without ml_gtktext.o (but with ml_gtktree.o) is also valid.
The binaries from https://github.com/shadinger/flexdll-win64 (version 0.26) do not suffer from this issue.
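For readers comparing the two failures above, here is a minimal sketch of how the two error codes differ. The numeric values are the standard ones from winerror.h (193 is ERROR_BAD_EXE_FORMAT, 1114 is ERROR_DLL_INIT_FAILED); the helper name image_rejected_by_loader is hypothetical and is not part of flexdll. The reading that 1114 comes from the DLL entry point failing after the image was mapped is an assumption based on the "Cannot resolve caml_failwith" message.

```c
#include <assert.h>

/* Win32 error codes seen in this report (values from winerror.h). */
enum {
    ERR_BAD_EXE_FORMAT  = 193,  /* ERROR_BAD_EXE_FORMAT: "%1 is not a valid Win32 application" */
    ERR_DLL_INIT_FAILED = 1114  /* ERROR_DLL_INIT_FAILED: the DLL entry point reported failure */
};

/* Hypothetical helper: returns 1 if the code means the loader rejected the
   image file itself, 0 if the image was mapped and failed later during
   initialization, and -1 for any other code. */
static int image_rejected_by_loader(unsigned long code)
{
    /* 0 is ERROR_SUCCESS; only call this after a failed load. */
    assert(code != 0);
    switch (code) {
    case ERR_BAD_EXE_FORMAT:
        return 1;
    case ERR_DLL_INIT_FAILED:
        return 0;
    default:
        return -1;
    }
}
```

Under this reading, the DLL built with all 24 objects is malformed as a PE file, while the one built without ml_gtktree.o is well-formed and only lacks a host that exports the OCaml runtime symbols.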
Here is the verbose log during linking.
** Use cygpath: true
** Search path:
D:/msys64/mingw64/lib
D:/msys64/mingw64/x86_64-w64-mingw32/lib
D:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.2
/mingw/lib
/mingw64/x86_64-w64-mingw32/lib
** Default libraries:
dllcrt2.o
-lmingw32
-lgcc
-lmoldname
-lmingwex
-lmsvcrt
-luser32
-lkernel32
-ladvapi32
-lshell32
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\dllcrt2.o
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmingw32.a
** open: D:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.2\libgcc.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmoldname.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmingwex.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmsvcrt.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libuser32.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libkernel32.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libadvapi32.a
+ x86_64-w64-mingw32-gcc -mconsole -shared -Wl,-eFlexDLLiniter -L. -I"D:/msys64/mingw64/lib" -I"D:/msys64/mingw64/x86_64-w64-mingw32/lib" -L"D:/msys64/mingw64/lib" -L"D:/msys64/mingw64/x86_64-w64-mingw32/lib" -o "test.dll" "D:\msys64\tmp\dyndll3ef3ef.o" "D:\msys64\mingw64\bin\flexdll_initer_mingw64.o" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libpthread.dll.a" "D:/msys64/mingw64/lib\libgtk-win32-2.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libimm32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libshell32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libole32.a" "D:/msys64/mingw64/lib\libpangocairo-1.0.dll.a" "D:/msys64/mingw64/lib\libpangoft2-1.0.dll.a" "D:/msys64/mingw64/lib\libpangowin32-1.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libgdi32.a" "D:/msys64/mingw64/lib\libpango-1.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libm.a" "D:/msys64/mingw64/lib\libatk-1.0.dll.a" "D:/msys64/mingw64/lib\libcairo.dll.a" "D:/msys64/mingw64/lib\libpixman-1.dll.a" "D:/msys64/mingw64/lib\libfontconfig.dll.a" "D:/msys64/mingw64/lib\libexpat.dll.a" "D:/msys64/mingw64/lib\libfreetype.dll.a" "D:/msys64/mingw64/lib\libbz2.dll.a" "D:/msys64/mingw64/lib\libharfbuzz.dll.a" "D:/msys64/mingw64/lib\libgdk_pixbuf-2.0.dll.a" "D:/msys64/mingw64/lib\libpng16.dll.a" "D:/msys64/mingw64/lib\libgio-2.0.dll.a" "D:/msys64/mingw64/lib\libz.dll.a" "D:/msys64/mingw64/lib\libgmodule-2.0.dll.a" "D:/msys64/mingw64/lib\libgobject-2.0.dll.a" "D:/msys64/mingw64/lib\libffi.dll.a" "D:/msys64/mingw64/lib\libglib-2.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libws2_32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libwinmm.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libshlwapi.a" "D:/msys64/mingw64/lib\libintl.dll.a" "D:\msys64\tmp\dyndll00be4c.o" "D:\msys64\tmp\dyndlle902c0.o" "D:\msys64\tmp\dyndll54d32d.o" "D:\msys64\tmp\dyndll2e0163.o" "ml_gtkbin.o" "D:\msys64\tmp\dyndll7ac0f6.o" "D:\msys64\tmp\dyndll3f46a1.o" "D:\msys64\tmp\dyndll6e7d00.o" "D:\msys64\tmp\dyndll709dae.o" 
"D:\msys64\tmp\dyndll4b5dee.o" "D:\msys64\tmp\dyndll027612.o" "D:\msys64\tmp\dyndll478b19.o" "D:\msys64\tmp\dyndll0fdffc.o" "D:\msys64\tmp\dyndll533488.o" "D:\msys64\tmp\dyndllc5412c.o" "D:\msys64\tmp\dyndllb81a8b.o" "D:\msys64\tmp\dyndll5f1731.o" "D:\msys64\tmp\dyndll4bc469.o" "D:\msys64\tmp\dyndlleed2db.o" "D:\msys64\tmp\dyndlla2929b.o" "D:\msys64\tmp\dyndll56c73c.o" "D:\msys64\tmp\dyndllde988f.o" "ml_gvaluecaml.o" "D:\msys64\tmp\dyndll049034.o" "D:\msys64\tmp\flexlink250fe6.def"
(call with bash: D:\msys64\tmp\longcmd233aa5)
Is it easy for you to test with OCaml trunk? The win64 backend has been changed to avoid problems when the DLL is loaded too far away in memory from the main process, and this might fix such issues.
I have tried the latest ocaml and camlp4 from the github mirror. The problem remains.
The issue seems to be related to the cygwin64 COMDAT hacks introduced in commit 37e6b5ad904b0d4648cebb09c19ed10e6f8dea28. The library works if the snippets are commented out.
Perhaps the current hacks for Cygwin64 should indeed be restricted to cygwin64. Can you check which parts of the commit you refer to must be disabled (there are two fragments related to COMDAT sections -- do we need to disable both)?
Disabling the following fragment in add_reloc_table works in my case:
if sec.sec_opts &&& 0x1000l <> 0l && has_prefix ".rdata$.refptr." sec.sec_name then
begin
(* under Cygwin64, gcc introduces mergable (link once) COMDAT sections to store
indirection pointers to external darta symbols. Since we don't deal with such section
properly, we turn them into regular data section, thus loosing sharing (but we don't care). *)
sec.sec_opts <- 0xc0500040l;
sec.sec_name <- Printf.sprintf ".flexrefptrsection%i" (Oo.id (object end));
end;
This should be the first fragment mentioning COMDAT in the patch.
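To make the magic numbers in the fragment above easier to read, here is a small sketch that spells them out using the section characteristic bits from the PE/COFF specification. The macro and function names are illustrative only and do not appear in flexdll; the bit values themselves are the documented ones (0x1000 is IMAGE_SCN_LNK_COMDAT).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Section characteristic bits (PE/COFF specification). */
#define SCN_CNT_INITIALIZED_DATA 0x00000040u
#define SCN_LNK_COMDAT           0x00001000u
#define SCN_ALIGN_16BYTES        0x00500000u
#define SCN_MEM_READ             0x40000000u
#define SCN_MEM_WRITE            0x80000000u

/* Mirrors the condition in the quoted fragment: a COMDAT section whose
   name starts with ".rdata$.refptr.". */
static int is_refptr_comdat(uint32_t flags, const char *name)
{
    static const char prefix[] = ".rdata$.refptr.";
    assert(name != NULL);
    return (flags & SCN_LNK_COMDAT) != 0 &&
           strncmp(name, prefix, sizeof prefix - 1) == 0;
}

/* The value the fragment stores in sec_opts: initialized data,
   16-byte aligned, readable and writable, with no COMDAT bit. */
static uint32_t plain_data_flags(void)
{
    return SCN_CNT_INITIALIZED_DATA | SCN_ALIGN_16BYTES |
           SCN_MEM_READ | SCN_MEM_WRITE;
}
```

So the fragment turns each matching read-only COMDAT section into an ordinary writable data section with a unique name, which is why every such section survives into the output as a separate section.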
As reported by Andreas Hauptmann on the caml-list:
It either won't solve the issue or it will introduce new ones (I don't remember details, but I've tried it). As a temporary workaround, you can try to strip your invalid dll files (e.g. 'x86_64-w64-mingw32-strip --strip-unneeded dlllablgtk2.dll') or switch to an older version of the gcc-toolchain (4.8 or 4.7).
I'm having what appears to be a related issue trying to build lablgtk 2.18.5 with flexdll 0.35 and ocaml 4.02.3 on cygwin64:
ocamlmktop -I +lablGL -thread -o lablgtktop unix.cma threads.cma lablgl.cma
-I . lablgtk.cma lablgtkgl.cma lablglade.cma lablgnomecanvas.cma lablgnomeui.cma lablrsvg.cma lablgtkspell.cma lablgtksourceview2.cma gtkThread.cmo
File "none", line 1:
Error: Error on dynamically loaded library: ./dlllablgtk2.so: Exec format error
FWIW, stripping does help on Cygwin; I was able to get a successful and functional build by adding -ldopt -Wl,-s to the ocamlmklib -o lablgtk command.
Would it be possible to eventually fix this? This issue has been hanging around for more than 2 years now. I just tried it with the source and binary distributions of version 0.35 as well as the current git master. This is a major source of unreliability in the Windows builds of INRIA Coq. I currently use an explicit call to strip, which mysteriously fails as well when completely unrelated things in the build script change (such as the file to which messages are redirected). Even procmon couldn't help me understand why. I will now try the method suggested above instead of the explicit call to strip.
But I would really appreciate a fix for this problem. If there is anything I can do to help, please let me know. E.g. I can send a script which sets up a fresh cygwin and reproduces the error with a single call to a batch file.
Best regards,
Michael
I'm afraid I don't understand the problem well enough to fix it, and don't have the time and courage to investigate. If you could create a simple reproduction case that doesn't involve a bunch of external libraries, this would definitely make the problem easier to investigate. But the conclusion could also be that there is no easy fix.
I think my recommendation would be to avoid using flexlink with code not generated by OCaml compilers. For your use case, is it an option to link all native libraries statically in the main program?
Dear Alain,
you are right, maybe the best option is to patch the lablgtk build scripts such that they create just a static library and use this. I think for the whole lablgtk library there is no need to link it dynamically, since the GUI tool always needs it and in Coq there is only one GUI tool, so there wouldn't be DLL sharing either.
Also it is an interesting hint that the issues might come from the C code in lablgtk.
I will let you know how it goes along this path.
Best regards,
Michael
FYI this is the patch I used to work around this:
https://github.com/cygwinports/ocaml-lablgtk2/blob/master/2.18.5-flexlink.patch
I ran into this too, but doing my usual fumbling in the dark noticed that one of the fixes above involved reducing the number of .o files which in turn reduces the number of sections. That got me thinking that the name change removes the rdata$ prefix which is an instruction to the linker to merge the sections. So I tried this:
diff --git a/reloc.ml b/reloc.ml
index 358f6b9..823021d 100644
--- a/reloc.ml
+++ b/reloc.ml
@@ -434,7 +434,7 @@ let add_reloc_table obj obj_name p =
indirection pointers to external darta symbols. Since we don't deal with such section
properly, we turn them into regular data section, thus loosing sharing (but we don't care). *)
sec.sec_opts <- 0xc0500040l;
- sec.sec_name <- Printf.sprintf ".flexrefptrsection%i" (Oo.id (object end));
+ sec.sec_name <- Printf.sprintf ".flex$.flexrefptrsection%i" (Oo.id (object end));
end;
let min = ref Int32.max_int and max = ref Int32.min_int in
which appears to be enough to build a working lablgtk2 without having to strip the DLLs (one slight thing which concerned me with the stripping is that the resulting DLLs also crash Microsoft's objdump, though that may be objdump's fault, and it was on the Windows 7 SDK).
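The effect of the renaming in the diff above can be sketched as follows. In PE/COFF object files, a '$' in a section name marks a grouped section: the linker drops the '$' and everything after it when choosing the output section. The helper names below are hypothetical, and that GNU ld applies this rule to an arbitrary prefix such as ".flex" is an assumption consistent with the result reported here, not something verified independently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the part of a section name that selects the output section:
   everything before the first '$', or the whole name if there is none. */
static size_t group_prefix_len(const char *name)
{
    assert(name != NULL);
    return strcspn(name, "$");
}

/* Returns 1 if two input sections would be merged into the same output
   section under the grouped-section rule, 0 otherwise. */
static int same_output_section(const char *a, const char *b)
{
    size_t la = group_prefix_len(a);
    size_t lb = group_prefix_len(b);
    return la == lb && strncmp(a, b, la) == 0;
}
```

With the original names (".flexrefptrsection1", ".flexrefptrsection2", ...) every section stays separate; with the patched names they all share the ".flex" prefix and collapse into one output section.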
I don't really understand @alainfrisch's comment about not dealing with the section properly - what prompted you to write that comment originally? Is this all related to https://github.com/alainfrisch/flexdll/pull/52 and should we therefore be deleting these sections for symbols which flexdll is going to relocate and simply leaving them alone for any other symbols, which presumably the linker is going to deal with? It appears on a vague inspection that the linker will eliminate these in "normal" linking, so I'm guessing they just get folded into the normal relocation process?
Again, fumbling around trying to diagnose the original problem, I can't find a reference to the idea of a problem about having too many sections in the PE header (there's a reference to a limit of 96 for Windows XP but it's increased to 65536 since Vista, so that doesn't seem a likely candidate). Perhaps it's that these sections appear before one of the others and some offset becomes too big or larger than expected. Either way, stripping removes those sections and, on the basis that merging them also seems to fix the problem then it would appear to be the number of them which is the underlying issue.
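One way to test the "number of sections" hypothesis is to read the section count straight from the PE header of a broken, a stripped, and a merged DLL and compare them. The sketch below is an illustrative helper (pe_section_count is a made-up name) that works on an image already loaded into memory; the offsets it uses (e_lfanew at 0x3c, NumberOfSections at 6 bytes past the "PE\0\0" signature) come from the PE/COFF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers, so the code does not depend on host byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns NumberOfSections from the COFF file header of a PE image held
   in memory, or -1 if the buffer does not look like a PE image. */
static long pe_section_count(const unsigned char *img, size_t len)
{
    uint32_t pe_off;

    assert(img != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;                       /* no DOS header */
    pe_off = rd32(img + 0x3c);           /* e_lfanew */
    if (pe_off > len || len - pe_off < 24)
        return -1;                       /* signature + COFF header do not fit */
    if (img[pe_off] != 'P' || img[pe_off + 1] != 'E' ||
        img[pe_off + 2] != 0 || img[pe_off + 3] != 0)
        return -1;                       /* missing "PE\0\0" signature */
    return rd16(img + pe_off + 6);       /* NumberOfSections */
}
```

Reading the file into a buffer and printing this value for each variant of dlllablgtk2.dll would show whether the failing builds really carry many more sections than the working ones.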
But it still raises the question (for me at least) of what precisely they're for and what should really be done with them...
> I don't really understand @alainfrisch's comment about not dealing with the section properly - what prompted you to write that comment originally?
I don't remember exactly, but keeping the COMDAT section resulted in some problems with cygwin64. (Perhaps because the section could be merged by the linker, and this breaks some assumptions made by the flexdll runtime.) I don't think this was related to #52.