OpenBLAS icon indicating copy to clipboard operation
OpenBLAS copied to clipboard

Stop treating Cygwin as some kind of Windows variant

Open embray opened this issue 6 years ago • 11 comments

An unfortunate legacy brought over from GotoBlas is use of Windows APIs on Cygwin, and general treatment of Cygwin as just some variant of Windows. At one time that might have made some sense, especially in areas of multi-threading and other low-level system calls that were at one time not as well supported on Cygwin.

But I believe now, especially with the 64-bit Cygwin (the 32-bit Cygwin I'm less sure about), all system calls, including the the pthread API, mmap, etc., work well and reasonably efficiently on Cygwin.

While it is possible, and it works, to use Windows native APIs directly from code compiled for Cygwin, this is still best avoided as much as possible for at least a few reasons:

  1. Conceptually, Cygwin is never intended to be treated as some kind of Windows-variant. It's meant to be a fully POSIX-compliant platform, and mostly succeeds at this modulo a few small but necessary deviations. The underlying Windows OS should be treated as mostly opaque, except in some extreme cases, such as if it's necessary to work around a bug on Cygwin, or some relatively rare interface is not supported. AFAICT all POSIX features used by OpenBLAS are supported on Cygwin.

  2. Cygwin has its own implementation of fork(), which while rather complex and tricky does generally work. But it does not work well with code that depends on native Windows APIs which of course have no clue about fork(). I addressed this mostly in #1450, but there may be other cases like this.

  3. There are some cases already where we've avoided treating Cygwin as equivalent to Windows, but that leads to confusing and easy to mistake code like #if defined(OS_WINDOWS) && !defined(OS_CYGWIN_NT) or similarly #if !defined(OS_WINDOWS) || defined(OS_CYGWIN_NT). In other words, there are already several cases where Cygwin is treated as distinct from Windows.

There is just one area I know of where Cygwin should be treated similarly to Windows, and that's in the area of calling conventions. Code compiled with the Cygwin gcc toolchain uses Microsoft calling conventions for the target hardware by default: This is so that it can be fully compatible with Windows APIs.

What I would propose is to not define the OS_WINDOWS macro on Cygwin, generally allowing a clearer distinction between Windows and Cygwin. But do define the WINDOWS_ABI macro and use it where appropriate. I have a prototype branch where I've already done just that, but while it compiles it still has a number of problems, I think likely in the area of calling conventions or some other low-level detail, but I haven't investigated carefully yet.

embray avatar Feb 07 '19 14:02 embray

There are some compiler shortcomings around debug symbol generators shared by cygwin and more native mingw compilers. OpenBLAS uses native windows threads, and is still consumable by any cygwin program. The mere presence of posix APIs in SFU or cygwin is no reason to use exactly those in favour of already consumed native threads.

brada4 avatar Feb 07 '19 23:02 brada4

OpenBLAS uses native windows threads, and is still consumable by any cygwin program.

Yes, I know this well. That reminds me of another reason this is bad though, which I meant to list: When not using Cygwin functions you don't get proper signal handling either, which is particularly bad for threads started through the native Windows API, since Cygwin doesn't really have any knowledge about them.

This is especially annoying for debugging because it means gdb can't break on things like SIGSEGV since such signals are not properly delivered to threads. It also means that a Cygwin process with stuck threads often can't be killed without killing the Windows process directly. Some vague complaint about "shortcomings around debug symbol generators" aside, debugging on Cygwin is generally far worse (the only thing I'm aware of is that the rebase utility unfortunately does not handle DWARF sections properly, which is a general annoyance).

The mere presence of posix APIs

You say that like that's incidental, and not like, the entire point of Cygwin. Generally the Cygwin community is annoyed when projects don't treat Cygwin as a POSIX system which it is, and instead treat it like some kind of Windows, without very good motivation, since it really does lead to much harder to debug issues and other incompatibilities. This is just wrong.

embray avatar Feb 08 '19 11:02 embray

Err, I am not sure if we need a ticket for discussing the virtues of a particular operating system here ?

As far as I am concerned, any platform-specific #ifdefs in the code are there to make it possible to build and use OpenBLAS in such environments, and not to highlight their capabilities or shortcomings. Given the diverse user base, some of them are probably the result of "anecdotal" evidence of what worked for someone at a particular time and may not represent an ideal solution.

If there are better ways to handle Cygwin than by treating it as a special form of MS Windows, we can certainly discuss and merge any changes once they are known to work (though keeping compatibility with older versions of Cygwin would be desirable).

martin-frbg avatar Feb 08 '19 11:02 martin-frbg

though keeping compatibility with older versions of Cygwin would be desirable

I certainly agree, though only up to a point. One thing about Cygwin (which I actually don't like) is that it tends to have a forward-only march of progress. Which is to say, once a new version comes out it's actually not that easy to get previous, older versions. Which is not to say that people don't have older versions lying around.

What I might consider, once I gt my branch working, is setting a minimum Cygwin version for the new behavior based on where it's known to work, and fall back on the old behavior for older Cygwins. It should also be possible to twiddle some flags for this. Though my preference would be to do away with the old behavior entirely, since some of my original motivation was to simply the Cygwin support (do away with things like #if defined(OS_WINDOWS) && !defined(OS_CYGWIN_NT) and the like :/ It's a big tricky to juggle both cases but I could try to find a reasonably elegant way at least for the sake of backwards-compat.

embray avatar Feb 08 '19 13:02 embray

We will probably not need to stay compatible with Cygwin from 10+ years ago (i.e. GotoBLAS2 era), but some people may be stuck with a particular semi-recent version that they cannot upgrade (reliance on some other software on the system, not their computer, no admin rights, no experience with system setup or whatever).

martin-frbg avatar Feb 08 '19 13:02 martin-frbg

Windows 7 was released in 2009, that makes exactly 10+1 soon. It is much easier to keep it working as it is used to than blame on careless & ignorant.

brada4 avatar Feb 08 '19 16:02 brada4

There is no "blame" here, just pointing out the fact that this is the wrong way to use and support Cygwin. It might have made perfect sense to do it that way at the time but if it were done now there would be no good reason.

Take it from people who actually use this platform and use openblas on it: I would suggest either supporting the platform correctly or not at all. I'm working on the patch so there should be no complaint.

embray avatar Feb 12 '19 10:02 embray

One still might want to cross-build native DLL with cygwin for all other uses outside it (CC=mingw... FC=...)

brada4 avatar Feb 12 '19 17:02 brada4

Then...you shouldn't set OS_CYGWIN_NT at all.

embray avatar Feb 12 '19 18:02 embray

What I mean if you change cygwin default to more native compilation, please maintain its current ability to cross-build cygwin-less mingw dll, it is quite good drop-in replacement for archaic blas in archaic software.

brada4 avatar Feb 12 '19 20:02 brada4

Another subtle reason not to just treat Cygwin as Windows, which just cost me several hours of headscratching: When a process is forked in Cygwin it doesn't necessarily copy all its environment variables--particularly those that are not meaningful to Windows itself (like PATH) into the new process's Windows-level environment block.

The result is that when OpenBLAS, which with OS_WINDOWS defined uses the Windows-native GetEnvironmentVariable to read environment variables, is reinitialized in a forked process variables like OPENBLAS_NUM_THREADS are not processed correctly.

This specific case is obviously easy to address directly; I'll send a patch shortly. I feel like this entire issue while probably be fixed in piecemeal fashion, which is probably better than trying to do it all at once anyways.

embray avatar Mar 15 '19 14:03 embray