haskell.nix

remote-iserv: Cannot load dll when cross-compiling with mingwW64

Open ramirez7 opened this issue 4 years ago • 13 comments

I've been working to cross-compile some games that use SDL2 and related libraries (sdl-gpu, OpenGL, etc)

It's all been working fine. I've been able to make games that run with Wine and Windows proper!

However, I am running into this issue now that I am using TH, which in turn causes remote-iserv to come into play. TH fails with:

[ 9 of 13] Compiling SDL.Utils        ( src/SDL/Utils.hs, dist/build/SDL/Utils.o )
Could not load wine-gecko. HTML rendering will be disabled.
wine: configuration in '/build' has been updated.
Listening on port 9581
remote-iserv.exe: Could not load `SDL2.dll'. Reason: addDLL: SDL2.dll or dependencies not loaded. (Win32 error 126)

remote-iserv.exe: loadArchive "/nix/store/z3jrsabqrpx2adxygvjd78mvfdfqxlqk-SDL2-2.0.12-x86_64-w64-mingw32/lib/libSDL2.dll.a": failed
iserv-proxy: {handle: <socket: 5>}: GHCi.Message.remoteCall: end of file

<no location info>: error: ghc: ghc-iserv terminated (1)

FWIW, my SDL2 output in the Nix store looks like this. I did tweak the packaging a bit for mingw with an overlay.

$ tree /nix/store/z3jrsabqrpx2adxygvjd78mvfdfqxlqk-SDL2-2.0.12-x86_64-w64-mingw32/
/nix/store/z3jrsabqrpx2adxygvjd78mvfdfqxlqk-SDL2-2.0.12-x86_64-w64-mingw32/
├── bin
│   └── SDL2.dll
└── lib
    ├── libSDL2.dll.a
    ├── libSDL2main.a
    └── libSDL2_test.a

2 directories, 4 files

That said, outside of TH, my tweaks to SDL2 are working fine. Is there something that needs to be done for remote-iserv to get visibility of SDL2.dll? My cabal builds all rely on pkg-config. I looked into it, and Win32 error 126 usually means "a file related to the service is missing or cannot be found."

The code is here. I'm currently doing a weekend game jam, so for now I will try to hack a workaround where I dump splices on Linux and have the Windows build use those instead.

Aside from this issue, the Haskell x-compiling has been a godsend for me. I look forward to using it for bigger and better things! Thanks :)

ramirez7 avatar Apr 25 '21 06:04 ramirez7

https://gitlab.haskell.org/ghc/ghc/-/issues/18556

I found this ticket against GHC proper that reports the same error.

ramirez7 avatar Apr 25 '21 07:04 ramirez7

https://gitlab.com/macaroni.dev/gamedev-scratch/-/blob/c15e324e7c44ac2eb85f209e20851dbae4c32b75/ludum-dare/48/splice-for-windows.sh

My current workaround is to isolate my TH and structure the files in such a way that I can automatically munge the splices back in as valid Haskell. Luckily for me, the stuff I'm generating is simple enough that this transformation is easy. After running this script, I was able to x-compile fine :+1:


I also saw that GHC has some DEBUG_LOGs in LoadArchive.c but I'm not sure how to get them to appear.

ramirez7 avatar Apr 25 '21 15:04 ramirez7

@ramirez7 there is https://github.com/input-output-hk/haskell.nix/blob/ac38a412998e48e2ad5f069548d7a6cd5d822660/overlays/mingw_w64.nix#L62-L69 in haskell.nix, I'm still not sure why ghc doesn't really find the proper libs.

angerman avatar Apr 26 '21 08:04 angerman

@angerman I don't see where haskell.nix would be hooking the SDL stuff to iserv automatically.

Do you mean that I need to add -L flags for SDL2 to setupBuildFlags (-L${SDL2}/bin and -L${SDL2}/lib)? I'm not exactly sure what setupBuildFlags is. Is it just an attribute of any Haskell package that I can add to in an overlay?

I also found this Nix code that seems to use a hard-coded list of dlls.

ramirez7 avatar Sep 15 '21 19:09 ramirez7

Some ideas after sitting with this:

  • I could write an overlay for remote-iserv to do the same thing with SDL2's dlls as the hard-coded list of libraries. This seems like it would result in a lot of rebuilding though.
  • I think on Windows, dlls are also looked up via PATH, in addition to the executable's directory? Maybe I can set the PATH to include SDL? That could be done just in the packages that need it, so there'd be no need to rebuild every reverse dep of remote-iserv.

--

Either way, I'll probably be doing another deep dive on mingw x-compilation in the coming weeks to prepare for LD49 in October, so hopefully I can figure this out then.

ramirez7 avatar Sep 16 '21 01:09 ramirez7

@angerman I caused this error to go away by moving/copying (both work) SDL2.dll to its derivation's lib directory from its bin directory. Makes sense - now remote-iserv can find it.

However, it is still broken. Now the entire build hangs with no activity. Both the Haskell compiler and remote-iserv are using no CPU and no IO as far as I can tell from htop and iotop.

It seems to happen at the first call to TH. I used -j1 and commented out TH methodically and each time, it was the first module compiled w/TH that caused the build to hang.

I don't see many other logs or info. Here's the relevant output that's interleaved in the build logs:

---> Starting remote-iserv on port 9127
---| remote-iserv should have started on 9127
Could not load wine-gecko. HTML rendering will be disabled.
wine: configuration in '/build' has been updated.
Listening on port 9127

I did not see any use of that port (9127) with lsof and the like. But I do see remote-iserv.exe running all the while. It feels like something in there failed silently, and only partially?


I would like to fix this issue and I'm willing to dive pretty deep on it. So I have some questions to get me going:

  • How do I patch & use a different remote-iserv? I probably at least want to add some more logging.
    • Where is the code for it? Looks like it's in GHC's codebase?
    • How do I overlay a new remote-iserv with haskell.nix?
    • How do I change the remote-iserv for just one package? I'd imagine changing it will trigger a lot of rebuilding, so I'd like to just change it for my leaf package that is having issues, while leaving it untouched for upstream libraries.

ramirez7 avatar Sep 28 '21 00:09 ramirez7

I added verbose iserv logging and found that it got stuck on LoadDLL for SDL2.dll:

[         iserv-proxy] Msg:      ghc -- proxy -> slave: FindSystemLibrary "mingw32.dll"
[         iserv-proxy] proxy/fwdCall: writing remote pipe
[         iserv-proxy] proxy/fwdCall: reading remote pipe
[         iserv-proxy] Resp.:    ghc <- proxy -- slave: Nothing
[         iserv-proxy] Msg:      ghc -- proxy -> slave: FindSystemLibrary "SDL2main.dll"
[         iserv-proxy] proxy/fwdCall: writing remote pipe
[         iserv-proxy] proxy/fwdCall: reading remote pipe
[         iserv-proxy] Resp.:    ghc <- proxy -- slave: Nothing
[         iserv-proxy] Msg:      ghc -- proxy -> slave: AddLibrarySearchPath "/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/bin"
[         iserv-proxy] proxy/fwdCall: writing remote pipe
[         iserv-proxy] proxy/fwdCall: reading remote pipe
[         iserv-proxy] Resp.:    ghc <- proxy -- slave: RemotePtr 102976
[         iserv-proxy] Msg:      ghc -- proxy -> slave: AddLibrarySearchPath "/nix/store/pn4r0z4k3810v1s36wkzn3z1y9vahlxa-sdl2-lib-sdl2-x86_64-w64-mingw32-2.5.3.1/lib/x86_64-windows-ghc-8.10.7/sdl2-2.5.3.1-3vIMN4iOTFR13Rfhby6YJd"
[         iserv-proxy] proxy/fwdCall: writing remote pipe
[         iserv-proxy] proxy/fwdCall: reading remote pipe
[         iserv-proxy] Resp.:    ghc <- proxy -- slave: RemotePtr 103184
[         iserv-proxy] Msg:      ghc -- proxy -> slave: AddLibrarySearchPath "/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/lib"
[         iserv-proxy] proxy/fwdCall: writing remote pipe
[         iserv-proxy] proxy/fwdCall: reading remote pipe
[         iserv-proxy] Resp.:    ghc <- proxy -- slave: RemotePtr 104080
[         iserv-proxy] Msg:      ghc -- proxy -> slave: LoadDLL "/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/bin/SDL2.dll"
[         iserv-proxy] fwdLoadCall: writing remote pipe
[         iserv-proxy] fwdLoadCall: reading remote pipe

ramirez7 avatar Mar 21 '22 03:03 ramirez7

Does it really get stuck? Or does it just take a very, very long time?

angerman avatar Mar 21 '22 09:03 angerman

I'm fairly sure it's stuck. strace shows no activity + there's no CPU going on.

I added more verbose logging. It looks like remote-iserv.exe writes Nothing, but iserv-proxy never logs that it receives it (like it usually does). The logs suggest they're both reading, waiting on one another:

[         iserv-proxy] Msg:      ghc -- proxy -> slave: LoadDLL "/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/bin/SDL2.dll"
[         iserv-proxy] fwdLoadCall: writing remote pipe
[         iserv-proxy] fwdLoadCall: reading remote pipe
[    remote-iserv.exe] discardCtrlC
[    remote-iserv.exe] msg: LoadDLL "/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/bin/SDL2.dll"
[    remote-iserv.exe] writing pipe: Nothing
[    remote-iserv.exe] reading pipe...

I also don't see the Need DLL: log from remote-iserv.exe that seems like it should print for non-system DLLs:

  -- when loading DLLs (.so, .dylib, .dll, ...) and these are provided
  -- as relative paths, the intention is to load a pre-existing system library,
  -- therefore we hook the LoadDLL call only for absolute paths to ship the
  -- dll from the host to the target.  On windows we assume that we don't
  -- want to copy libraries that are referenced in C:\ these are usually
  -- system libraries.
  Msg (LoadDLL path@('C':':':_)) -> do
    return m
  Msg (LoadDLL path@('Z':':':_)) -> do
    return m
  Msg (LoadDLL path) | isAbsolute path -> do
    when verbose $ trace ("Need DLL: " ++ (base_path <//> path))
    handleLoad pipe path (base_path <//> path)
    return $ Msg (LoadDLL (base_path <//> path))
  _other -> return m

ramirez7 avatar Mar 21 '22 16:03 ramirez7

I've been reading the iserv source. I noticed a comment (that comes from haskell.nix patch iserv-cleanup-8.8.1.patch) that seems relevant:

+      -- Note [proxy-communication]
+      --
+      -- The fwdTHCall/fwdLoadCall/fwdCall's have to match up
+      -- with their endpoints in libiserv:Remote.Slave otherwise
+      -- you will end up with hung connections.
+      --
+      -- We are intercepting some calls between ghc and iserv
+      -- and augment the protocol here.  Thus these two sides
+      -- need to line up and know what request/reply to expect.

A "hung connection" sounds like what I'm dealing with! I wonder if there's some issue with this codepath. I did notice from my verbose logs building the world with my forked haskell.nix that this is the only time LoadDLL is invoked for iserv for a non-system path.

Specifically, it seems to try to load SDL2.dll when the TH depends on the sdl2 Haskell library. My codebase is one case. But I also see this when x-compiling the sdl2-ttf Haskell library. I'm guessing it's because TH needs to load the C library in order to run the sdl2 Haskell.

I did some digging into the source code and found a few things potentially related. I'm not at all familiar with the iserv code, so I may be way off though.
Since this code is a combination of ghc source + haskell.nix patches, I was browsing the `src` in the nix store for my project specifically. Here's what I found:
The two ends of the pipe are...

The reader in iserv-proxy corresponding to this line:

[         iserv-proxy] fwdLoadCall: reading remote pipe

...which comes from iserv-proxy/src/Main.hs:

      when verbose $ trace "fwdLoadCall: reading remote pipe"
      SlaveMsg msg' <- readPipe remote getSlaveMessage

The reader is expecting a SlaveMsg and is parsing it with getSlaveMessage from libiserv/src/Remote/Message.hs.

And the writer in remote-iserv.exe which corresponds to these lines:

[    remote-iserv.exe] msg: LoadDLL "/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/bin/SDL2.dll"
[    remote-iserv.exe] writing pipe: Nothing

...which I believe comes from this line in libiserv/src/Lib.hs's serv function:

    when verbose $ trace ("msg: " ++ (show msg))
    case msg of
      Shutdown -> return ()
      RunTH st q ty loc -> wrapRunTH $ runTH pipe st q ty loc
      RunModFinalizers st qrefs -> wrapRunTH $ runModFinalizerRefs pipe st qrefs
      _other -> run msg >>= reply

  reply :: forall a. (Binary a, Show a) => a -> IO ()
  reply r = do
    when verbose $ trace ("writing pipe: " ++ show r)
    writePipe pipe (put r)
    loop

The logs say it's writing a Nothing by way of GHCi.Run.run. I think it comes from loadDLL in GHCi.ObjLink and is a Maybe String (Nothing means success per those comments):

loadDLL :: String -> IO (Maybe String)
-- Nothing      => success
-- Just err_msg => failure

This would mean the two ends of the pipe aren't on the same page. I'm not sure from the code why it hangs instead of throwing, but the comment suggests a mismatch could cause hanging.

EDIT: One more thing. I see that the system LoadDLL calls do work and they are logging that they too are receiving Nothing responses. And they are successfully parsing & logging the response. Here's their logs:

[         iserv-proxy] proxy/fwdCall: reading remote pipe
[    remote-iserv.exe] reading pipe...
[    remote-iserv.exe] discardCtrlC
[    remote-iserv.exe] msg: LoadDLL "C:\\windows\\system32\\imm32.dll"
[    remote-iserv.exe] writing pipe: Nothing
[         iserv-proxy] Resp.:    ghc <- proxy -- slave: Nothing

The receiver is this function in iserv-proxy:

    fwdCall :: (Binary a, Show a) => Message a -> IO a
    fwdCall msg = do
      when verbose $ trace "proxy/fwdCall: writing remote pipe"
      writePipe remote (putMessage msg)
      when verbose $ trace "proxy/fwdCall: reading remote pipe"
      readPipe remote get

So instead of using getSlaveMessage as its Get in readPipe, it is just passing the generic get.

Looking at getSlaveMessage Get vs Maybe's Put:

getSlaveMessage :: Get SlaveMsg
getSlaveMessage = do
  b <- getWord8
  case b of
    0 -> SlaveMsg <$> (Have   <$> get <*> get)
    1 -> SlaveMsg <$> Missing <$> get
    2 -> return (SlaveMsg Done)

vs

instance (Binary a) => Binary (Maybe a) where
  put Nothing  = putWord8 0
  put (Just x) = putWord8 1 <> put x

Maybe this is the source of the hanging? The Nothing writes a 0 first, and then getSlaveMessage reads that 0 and then hangs waiting to parse the rest of the Have that's never going to come.
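To make the suspected mismatch concrete, here is a minimal sketch using the binary package. Reply and getReply are hypothetical stand-ins for SlaveMsg and getSlaveMessage, and the field types of Have are an assumption, not copied from libiserv. It encodes a Nothing the way the slave's reply does, then tries to decode those bytes with a tag-based parser:

```haskell
import Data.Binary (encode, get)
import Data.Binary.Get (Get, getWord8, runGetOrFail)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical stand-in for the proxy's SlaveMsg; the field types are assumptions.
data Reply = Have FilePath FilePath | Missing FilePath | Done
  deriving (Show, Eq)

-- Same shape as getSlaveMessage: one tag byte, then the constructor's fields.
getReply :: Get Reply
getReply = do
  tag <- getWord8
  case tag of
    0 -> Have <$> get <*> get
    1 -> Missing <$> get
    2 -> pure Done
    _ -> fail "unknown tag"

-- The bytes written for a successful loadDLL result (Nothing :: Maybe String).
wireBytes :: BL.ByteString
wireBytes = encode (Nothing :: Maybe String)

-- Try to read those bytes as a Reply, keeping only the error message on failure.
decodeAsReply :: Either String Reply
decodeAsReply = case runGetOrFail getReply wireBytes of
  Left (_, _, err) -> Left err
  Right (_, _, r)  -> Right r

main :: IO ()
main = do
  print (BL.unpack wireBytes)  -- the single tag byte 0
  print decodeAsReply          -- a Left: tag 0 selects Have, whose fields never arrive
```

On a finite ByteString the parser fails because it runs out of input. On a live pipe there is no end of input, so I'd expect the same parser to block waiting for the rest of the Have, which would look exactly like the hang.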


EDIT2: Okay, I think I've found the source of the mismatch. There's a codepath on the slave side that isn't getting triggered:

  -- when loading DLLs (.so, .dylib, .dll, ...) and these are provided
  -- as relative paths, the intention is to load a pre-existing system library,
  -- therefore we hook the LoadDLL call only for absolute paths to ship the
  -- dll from the host to the target.  On windows we assume that we don't
  -- want to copy libraries that are referenced in C:\ these are usually
  -- system libraries.
  Msg (LoadDLL path@('C':':':_)) -> do
    return m
  Msg (LoadDLL path@('Z':':':_)) -> do
    return m
  Msg (LoadDLL path) | isAbsolute path -> do
    when verbose $ trace ("Need DLL: " ++ (base_path <//> path))
    handleLoad pipe path (base_path <//> path)
    return $ Msg (LoadDLL (base_path <//> path))

That final branch would cause this to work I believe - handleLoad sends back the right type. The only explanation is that the nix store path (/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/bin/SDL2.dll) isn't being recognized as isAbsolute. I think it is due to the non-Windows-style path separators!

> import qualified System.FilePath.Windows as W
> import qualified System.FilePath.Posix as P
> W.isAbsolute "/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/bin/SDL2.dll"
False
> P.isAbsolute "/nix/store/wqiafrccmdhclg6pgz4vk67ax6cybsmd-SDL2-x86_64-w64-mingw32-2.0.14/bin/SDL2.dll"
True

Since the offending program is built for Windows (and run under wine), its System.FilePath re-exports System.FilePath.Windows.
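Here is a small sketch of how the branch order of the hook quoted above plays out under each isAbsolute. Route and route are hypothetical names, the example store path is made up, and the real hook also rewrites the message and calls handleLoad, which this leaves out:

```haskell
import qualified System.FilePath.Posix as P
import qualified System.FilePath.Windows as W

-- Hypothetical summary of what the hook does with a LoadDLL path.
data Route = PassThrough | ShipToTarget
  deriving (Show, Eq)

-- Mirrors the branch order of the quoted hook. The first argument stands for
-- whichever isAbsolute the binary was compiled against.
route :: (FilePath -> Bool) -> FilePath -> Route
route _ ('C':':':_) = PassThrough
route _ ('Z':':':_) = PassThrough
route isAbs path
  | isAbs path = ShipToTarget
  | otherwise  = PassThrough  -- the `_other -> return m` fallthrough

main :: IO ()
main = do
  let dll = "/nix/store/example-SDL2/bin/SDL2.dll"  -- made-up store path
  print (route W.isAbsolute dll)  -- PassThrough: Windows rules treat it as relative
  print (route P.isAbsolute dll)  -- ShipToTarget: Posix rules treat it as absolute
```

With Windows rules the store path never reaches the "Need DLL" branch, which matches the missing log line.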

ramirez7 avatar Mar 21 '22 18:03 ramirez7

@angerman How can I make changes to these iserv programs used by haskell.nix? Between their being ghc libraries + the many haskell.nix patches, I can't figure it out.

I feel like if I could make & test changes easily, I could get a fix working for sure.

EDIT: Look at "EDIT2" in the previous comment (below the fold) - I am pretty sure I've figured out the issue. tl;dr the nix store path is not being recognized as an absolute path by remote-iserv.exe due to it using Windows path logic. I think replacing isAbsolute with System.FilePath.Posix.isAbsolute will fix my issue.

ramirez7 avatar Mar 21 '22 20:03 ramirez7

I've bumped my GHC to 9.2.2 and still hit this issue. I'm working around it with manual splicing.

I will try poking around and figure out how to make the fix in iserv itself. @angerman Is the source code for the iserv used by haskell.nix hosted anywhere? Or is it just a bunch of patches on top of upstream?

ramirez7 avatar May 24 '22 20:05 ramirez7

Thinking about this more, I think there's a better solution.

  1. iserv-proxy should be run with wine as well. Right now, it is run natively but remote-iserv is run with wine. The issue here is they both have a platform-dependent isAbsolute check. iserv-proxy can't be doing this check correctly unless it's being run on the target platform.
  2. With that fix, I can then fix my pkg-config files to use wine absolute paths, e.g. Z:\nix\store\etc. I'm guessing remote-iserv running under wine will be able to handle those paths fine when loading DLLs. And then both programs will properly recognize them as absolute paths, and do the right thing.

Luckily, I can attempt this idea without modifying iserv sources. I can just continue to tweak the wrapper script. I will have to change which iserv-proxy is passed in as well. And I think this comment isn't necessarily true:

# iserv-proxy needs to come from the buildPackages, as it needs to run on the
# build host.

Maybe I'm missing the reason for this comment, but the fact that iserv-proxy is interpreting Windows paths using Linux logic seems off to me.

ramirez7 avatar Jul 25 '22 22:07 ramirez7

^ I tried this and it did not work (pkg-config doesn't understand wine paths - understandably so since it isn't using wine :sweat_smile: )

I thought about it even more, and I think the fix might get tricky:

  • The proxy is doing the right thing at the moment. It recognizes Posix nix store paths as absolute and proxies them over.
  • The wine remote-iserv doesn't understand these paths as absolute. Adding logic to make it do so would result in a new error: Those Posix paths will be interpreted as Windows relative paths by wine, and we will fail to find the dlls.
  • So somewhere between the proxy and wine remote-iserv's DLL loading, we may need to do Posix -> Windows conversion of these paths. winepath is the official way to do this, but I think string munging and making assumptions about the Nix store being in the Z:\ drive could be fine.
  • I think the easiest place to put this will be in the wine remote-iserv. It will have to be changed to "understand" that the incoming paths are Posix paths (which they are) and then do the conversion itself.
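The string-munging option could look roughly like the sketch below. posixToWinePath is a hypothetical helper, not something that exists in iserv, and it assumes wine's default mapping of the Z: drive to the host's root directory (winepath would be the robust alternative):

```haskell
import qualified System.FilePath.Posix as P

-- Hypothetical helper. Assumes wine's default drive mapping, where Z:\ is the
-- host's root directory.
posixToWinePath :: FilePath -> FilePath
posixToWinePath path
  | P.isAbsolute path = "Z:" ++ map slash path
  | otherwise         = path  -- leave relative (system library) names untouched
  where
    slash '/' = '\\'
    slash c   = c

main :: IO ()
main = putStrLn (posixToWinePath "/nix/store/example-SDL2/bin/SDL2.dll")
-- prints Z:\nix\store\example-SDL2\bin\SDL2.dll
```

One thing to watch: a converted path starts with Z:, so it would hit the existing 'Z':':' branch of the hook and be passed through unchanged. The conversion therefore has to happen at a point where that is the intended behaviour.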

I don't think I can fix this without an iserv patch, so I'm going to bite the bullet (and build a lot of GHCs) and try this soon.

ramirez7 avatar Sep 01 '22 23:09 ramirez7