Running under cygwin

Open masinter opened this issue 5 years ago • 33 comments

Currently in maiko/cygwin2 branch:

  • Add to os_version *-*-cygwin*) echo cygwin ;;
  • copy makefile for Linux, add -DCYGWIN
  • ifdef out check for d_reclen in dsk.c because it isn't defined in cygwin (I suppose len == strlen(dp->d_name) would have been better)
  • ifdef out fcntl in timer.c
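
A minimal sketch of the strlen(dp->d_name) idea mentioned in the list above, assuming only what POSIX guarantees for struct dirent (d_name is required, d_reclen is not). The helper name longest_entry_name is hypothetical and is not taken from dsk.c:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from dsk.c): returns the length of the longest
 * entry name in `path`, or -1 if the directory cannot be opened.
 * The length comes from strlen(dp->d_name) rather than the non-portable
 * d_reclen field. */
long longest_entry_name(const char *path)
{
  assert(path != NULL);
  DIR *dir = opendir(path);
  if (dir == NULL) return -1;

  long longest = 0;
  struct dirent *dp;
  while ((dp = readdir(dir)) != NULL) {
    size_t len = strlen(dp->d_name); /* portable replacement for d_reclen */
    if ((long)len > longest) longest = (long)len;
  }
  closedir(dir);
  return longest;
}
```

Because the length is derived from the NUL-terminated name, no #ifdef for Cygwin would be needed around this kind of check.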

now it compiles, but dies when run:

$ ./run-medley -full
sysout is /cygdrive/c/Users/Larry/home/ilisp/medley/loadups/xfull35.sysout
running /cygdrive/c/Users/Larry/home/ilisp/maiko/cygwin.x86_64/lde -g 1440x900 -sc 1440x900
start /cygdrive/c/Users/Larry/home/ilisp/medley/greetfiles/SIMPLE-INIT
Failed to find UNIXCOMM file handles; no processes
./run-medley: line 159:  2091 Segmentation fault      (core dumped) $prog $geometry $screensize -t "Medley Interlisp" $passthrough_args "$LDESRCESYSOUT"

masinter avatar Dec 22 '20 17:12 masinter

Submitted PR interlisp/maiko#128 to address the d_reclen issue.

waywardmonkeys avatar Dec 22 '20 18:12 waywardmonkeys

i spent some time with the gdb documentation and stepping through main in gdb but couldn't get past the execvp. The changes needed to compile are in the cygwin3 branch.

masinter avatar Dec 24 '20 01:12 masinter

You appear to have merged your cygwin3 branch into master without telling anyone?

nbriggs avatar Dec 26 '20 19:12 nbriggs

See https://jeffkreeftmeijer.com/git-rebase/ for an explanation of what you probably wanted to do (git rebase, to re-apply your local branch changes over a copy of the updated master branch, without merging your experiment back into master)

nbriggs avatar Dec 26 '20 19:12 nbriggs

time for me to go back through the git tutorials

masinter avatar Dec 26 '20 19:12 masinter

The current state of this is that it crashes on master, right?

Do you have a stack trace?

waywardmonkeys avatar Dec 28 '20 06:12 waywardmonkeys

cygwin-gdb.txt

masinter avatar Dec 28 '20 07:12 masinter

So, that's stack corruption it seems ...

Looking at the code, we see something that seems dangerous:

/* sys_size is sysout size in megabytes */
int sysout_loader(char * sysout_file_name, int sys_size)
{
  int sysout; /* SysoutFile descriptor */

  IFPAGE ifpage; /* IFPAGE */

  char *fptovp_scratch; /* scratch area for FPTOVP */
#ifdef BIGVM
  /* 1 "cell" per page for 256MB in 512 byte pages */
  unsigned int fptovp[0x80000]; /* FPTOVP */
#else
  DLword fptovp[0x10000]; /* FPTOVP */
#endif                /* BIGVM */
  long fptovp_offset; /* FPTOVP start offset */

That's quite a decently sized stack allocation there for FPTOVP.

You could try telling Cygwin to give it a bigger stack size via adding this to the gcc command line:

-Wl,--stack,33554432

Alternatively, you could malloc that and make sure it gets freed in the exit path (or just turn it into a malloc and leak it to see if that makes the crash go away...).
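
A rough sketch of the malloc alternative, not the actual sysout_loader change. The function name load_with_heap_table and its entries parameter are hypothetical; 0x80000 entries corresponds to the BIGVM declaration quoted above (2 MB with 4-byte unsigned int):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the loader body: allocates the FPTOVP table on
 * the heap instead of the stack, touches every entry, and frees it on the
 * way out. Returns 0 on success, -1 on failure. */
int load_with_heap_table(size_t entries)
{
  assert(entries > 0);
  unsigned int *fptovp = malloc(entries * sizeof *fptovp);
  if (fptovp == NULL) {
    perror("malloc fptovp");
    return -1;
  }

  /* Placeholder for reading the real table from the sysout file. */
  for (size_t i = 0; i < entries; i++) fptovp[i] = (unsigned int)i;
  int ok = (fptovp[entries - 1] == (unsigned int)(entries - 1));

  free(fptovp); /* every exit path after a successful malloc must free */
  return ok ? 0 : -1;
}
```

Calling load_with_heap_table(0x80000) exercises the same table size without depending on the platform's default stack limit.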

waywardmonkeys avatar Dec 28 '20 07:12 waywardmonkeys

Now it compiles ok and doesn't crash and runs the opcode tests in test.vm, but doesn't recognize keypress or mouse clicks (same problem as with running under WSL1)

masinter avatar Dec 28 '20 22:12 masinter

Which fix did you apply to prevent the crash?

waywardmonkeys avatar Dec 29 '20 02:12 waywardmonkeys

I take back what I said:

I think I just compiled it with gcc instead of clang. I wasn't sure how to apply either.

I also added -Wl,--stack,33554432 to the gcc line

masinter avatar Dec 29 '20 02:12 masinter

yes, fixes segv on startup. Now it's just hanging because it doesn't get mouse clicks or keystrokes

masinter avatar Feb 12 '21 21:02 masinter

Good news. Presumably that's after recompiling without the 32MB stack option. I'm trying to come up with a principled way of getting the X events processed.

nbriggs avatar Feb 12 '21 21:02 nbriggs

Steve reported having problems with running on WSL2 in his configuration -- screen comes up but no characters recognized. Is it possible to fork a little process to act as a keystroke forwarder that waits for the keyboard and sends events whenever it gets a keystroke or mouse move?

masinter avatar Apr 28 '21 03:04 masinter

We could use some documentation on what WSL2 (or cygwin) supports for polled or interrupt driven I/O, and then we can try to fix it. I don't have access to a WSL2 system (or one with cygwin), so there's not much I can do, or suggestions I can make to those who do, without some input...

nbriggs avatar Apr 30 '21 02:04 nbriggs

Hello,

cygwin apparently does not support asynchronous i/o or a virtual timer, as the callback routines int_io_service resp. int_timer_service in timer.c are never called, so for example when pressing the right mouse button, the menu appears but then the system is stuck. (networking does not work either, as incoming packets are not noticed)

As a work-around in my dodo-nethub-support branch of Maiko (https://github.com/devhawala/maiko/tree/dodo-nethub-support), i added a simple count-down mechanism to the instruction dispatcher (dispatch in xc.c): the compile variable CYGWIN_TIMER_ASYNC_EMULATION_INSNS_COUNTDOWN defines the interval, as a number of instructions, for simulating an external i/o interrupt, so the interrupt checking machinery is fired at more or less regular intervals.

This variable is set in makefile-cygwin.x86_64-x (and only there) to 20000. To prevent compile errors, enabling async i/o in ether.c for the Nethub socket is made dependent on the presence of O_ASYNC (which is missing in Cygwin)

So i now have a working version of Maiko running natively on Windows 10 as a Cygwin executable which can do networking with Dodo XNS services and uses a local X11 server (VcXsrv 1.20.14.0). The Maiko process behaves normally in my understanding, the overall CPU load of the system is around 1-3% when the Medley environment is idle and goes up to ~15% (100% on about 1 out of 8 logical CPUs) when the right mouse button is pressed in Medley. So even if i/o is checked every 20,000 instructions, only minimal CPU resources are consumed when the Medley system is idle and only one CPU goes to 100% while a context menu is shown.

However networking is a bit "lethargic" compared to async i/o on Linux, but it is usable. Maybe using a lower value for CYGWIN_TIMER_ASYNC_EMULATION_INSNS_COUNTDOWN could make it more snappy, but this value must be adjusted to the local hardware anyway...
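
The count-down idea described above could look roughly like the following. This is only an illustration under assumptions: every name here (EMULATED_INTERRUPT_INSNS, check_for_io_and_timer, run_dispatch_loop) is hypothetical, and the real change lives in the dispatch code in xc.c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro name; 20000 is the interval quoted in this comment. */
#define EMULATED_INTERRUPT_INSNS 20000
static_assert(EMULATED_INTERRUPT_INSNS > 0, "interval must be positive");

static unsigned long pending_checks = 0; /* how often the "interrupt" fired */

/* Stand-in for the interrupt-checking machinery (timer and i/o polling). */
static void check_for_io_and_timer(void) { pending_checks++; }

/* Executes `n_insns` dummy instructions and triggers the emulated interrupt
 * once every EMULATED_INTERRUPT_INSNS instructions. Returns how many times
 * the check ran. */
unsigned long run_dispatch_loop(unsigned long n_insns)
{
  long countdown = EMULATED_INTERRUPT_INSNS;
  pending_checks = 0;
  for (unsigned long executed = 0; executed < n_insns; executed++) {
    /* ... decode and execute one opcode here ... */
    if (--countdown <= 0) {
      check_for_io_and_timer();
      countdown = EMULATED_INTERRUPT_INSNS; /* re-arm the countdown */
    }
  }
  return pending_checks;
}
```

With this shape, the polling frequency scales with how fast the host executes instructions, which is why the interval has to be tuned per machine.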

Greetings, Hans

devhawala avatar Oct 01 '22 21:10 devhawala

The configuration information that is specific to all instances of an OS like cygwin should be incorporated in the inc/maiko/platform.h -- we have carefully removed most of the system specific defines from the makefile fragments and I'd rather not move backwards there, if you don't mind.

Since there are likely to be other systems where simulating the async timer would be useful, I would suggest that the #define not be CYGWIN_... but rather MAIKO_EMULATE_TIMER_INTERRUPTS (or similar) and look at how to integrate it should other systems need it.

Oddly, cygwin defines FASYNC rather than O_ASYNC -- not sure what's going on there.

nbriggs avatar Oct 01 '22 21:10 nbriggs

Can you check if current cygwin supports ITIMER_REAL rather than ITIMER_VIRTUAL — I don’t have any Windows boxes to check this on, but chatter from about 17 years ago suggests that starting with cygwin version 1.5.16 it supported the realtime timer.

nbriggs avatar Oct 02 '22 00:10 nbriggs

Here some intermediate results:

  • setting the timer type to ITIMER_REAL for setitimer breaks the program startup: the connection to the Nethub is opened, the main X-window opens and the UI is shown, but then the program dies very quickly without further information (no console output at all); i tried it in normal mode as well as elevated mode (i.e. with or without admin rights, the Windows equivalent to running as root or with setuid), so no idea what's wrong
    so ITIMER_REAL cannot replace ITIMER_VIRTUAL (may have worked up to Windows-XP resp. before Windows-Vista)
  • FASYNC is present in the cygwin libs, but does not help, incoming TCP/IP packets keep being unnoticed

Meaning the work-around simulating timer and i/o interrupts will be needed for the cygwin version.

So next things will be incorporating the Addr68k/NativeAlignment{24} changes and moving cygwin-specific dependencies to platform.h

devhawala avatar Oct 02 '22 15:10 devhawala

I suspect that running with ITIMER_REAL may be causing an interrupt before the rest of the system is ready for it. I'll check out the situation with ITIMER_REAL on other systems and see if it's debuggable there.

I also notice that it looks as though you're making the ether fd async before the async signal handling is set up -- it can't be made to generate interrupts before int_io_init() is called -- which in the USE_DLPI case of raw ethernet is where the I_SETSIG is applied to the ether fd.

nbriggs avatar Oct 02 '22 15:10 nbriggs

OK -- the issue is probably that if you change to ITIMER_REAL you have to have a signal handler for SIGALRM instead of a handler for SIGVTALRM... did you change that too?

nbriggs avatar Oct 02 '22 15:10 nbriggs

Yes... with ITIMER_REAL and SIGALRM it works the same on macOS as it does with ITIMER_VIRTUAL/SIGVTALRM.
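
For reference, a generic POSIX sketch of the pairing being discussed: ITIMER_REAL delivers SIGALRM, ITIMER_VIRTUAL delivers SIGVTALRM, so the handler has to be installed for the matching signal. The function and variable names are hypothetical and this is not the code in timer.c:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t tick_count = 0;

static void on_tick(int signo) { (void)signo; tick_count++; }

/* Installs on_tick for the signal that matches the chosen timer and starts
 * a periodic timer. use_real != 0 selects ITIMER_REAL/SIGALRM, otherwise
 * ITIMER_VIRTUAL/SIGVTALRM. Returns 0 on success, -1 on error. */
int start_interval_timer(int use_real, long interval_usec)
{
  assert(interval_usec > 0 && interval_usec < 1000000);
  const int which_timer = use_real ? ITIMER_REAL : ITIMER_VIRTUAL;
  const int which_signal = use_real ? SIGALRM : SIGVTALRM;

  struct sigaction action;
  memset(&action, 0, sizeof action);
  action.sa_handler = on_tick;
  sigemptyset(&action.sa_mask);
  action.sa_flags = SA_RESTART;
  if (sigaction(which_signal, &action, NULL) == -1) return -1;

  struct itimerval timer;
  timer.it_interval.tv_sec = timer.it_value.tv_sec = 0;
  timer.it_interval.tv_usec = timer.it_value.tv_usec = interval_usec;
  return setitimer(which_timer, &timer, NULL);
}

/* Sleeps until at least n ticks have arrived. Only meaningful with the real
 * timer: a sleeping process consumes no CPU time, so a virtual timer would
 * never advance here. */
int wait_for_ticks(int n)
{
  sigset_t block, old;
  sigemptyset(&block);
  sigaddset(&block, SIGALRM);
  sigprocmask(SIG_BLOCK, &block, &old); /* avoid a lost-wakeup race */
  while (tick_count < n) sigsuspend(&old);
  sigprocmask(SIG_SETMASK, &old, NULL);
  return tick_count;
}
```

If the handler were left on SIGVTALRM while the timer was switched to ITIMER_REAL, the first SIGALRM would hit the default action, which terminates the process.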

nbriggs avatar Oct 02 '22 16:10 nbriggs

well ... not on cygwin: the active section of int_timer_init() is as follows in my local test version, but defining USE_ITIMER_REAL in platform.h for cygwin lets ldex abort the startup (not defining it works ok):

  struct itimerval timert;
  struct sigaction timer_action;

  timer_action.sa_handler = int_timer_service;
  sigemptyset(&timer_action.sa_mask);
  timer_action.sa_flags = SA_RESTART;
  
#ifdef USE_ITIMER_REAL
  int intAlarm = SIGALRM;
  int intTimer = ITIMER_REAL;
  printf("** using ITIMER_REAL for timer\n");
#else
  int intAlarm = SIGVTALRM;
  int intTimer = ITIMER_VIRTUAL;
#endif

  if (sigaction(intAlarm, &timer_action, NULL) == -1) {
    perror("sigaction: SIGVTALRM");
  }

  /* then attach a timer to it and turn it loose */
  timert.it_interval.tv_sec = timert.it_value.tv_sec = 0;
  timert.it_interval.tv_usec = timert.it_value.tv_usec = TIMER_INTERVAL;
  setitimer(intTimer, &timert, NULL);

devhawala avatar Oct 02 '22 16:10 devhawala

for "making the ether fd async": i removed that from connectToHub() as it also happens in the SUBRs ether_resume() and ether_ctrlr(), which both should be called after the Lisp system is up (and it still works of course)

devhawala avatar Oct 02 '22 16:10 devhawala

You changed all the instances?

timer.c:  if (sigaction(SIGALRM, &timer_action, NULL) == -1) {
timer.c:    perror("sigaction: SIGALRM");
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:/*	Error handling routine for SIGALRM.  Called when any		*/
timer.c:/*	Set up the signal handler for SIGALRM, to catch TIMEOUTs:	*/
timer.c:  if (sigaction(SIGALRM, &timer_action, NULL) == -1) {
timer.c:    perror("sigaction: SIGALRM");
unixfork.c:  sigaddset(&signals, SIGALRM);
unixfork.c:  sigaddset(&signals, SIGALRM);

nbriggs avatar Oct 02 '22 16:10 nbriggs

Inspecting the code, I believe the timer emulation code is going to fail if you compile with OPDISP.

nbriggs avatar Oct 02 '22 17:10 nbriggs

no, not all instances, but i think there was no need for:

  • timer.c / int_timer_init(): this is the code snippet in my previous post (ok, the perror() still says "SIGVTALRM", but it's only a log message)
  • timer.c / int_block(): sigaddset is already done for both SIGVTALRM and SIGALRM, so nothing changed here
  • timer.c / int_unblock(): sigaddset is already done for both SIGVTALRM and SIGALRM, so nothing changed here
  • timer.c / int_file_init(): sigaddset is only done for SIGALRM, so nothing changed here
  • unixfork.c / fork_Unix(): sigaddset is already done for both SIGVTALRM and SIGALRM, so nothing changed here

These were the only instances of the string "ALRM" that i found (except for comments or strings) in those files.

for the OPDISP: is this still used?
Scanning with grep, it seems that it was used for the sunos-variants (Makefiles in bin/legacy) and possibly with DOS, but it is not defined in platform.h or in any Makefiles in bin.

devhawala avatar Oct 02 '22 17:10 devhawala

OPDISP with the threaded dispatch is new. It improves the performance.

nbriggs avatar Oct 02 '22 17:10 nbriggs

OPDISP is not turned on by default yet; when we do turn it on, that could be done for systems other than Cygwin.

BTW -- where do you want bug reports for Dodo services submitted? The ECHO XIP packets that are returned by, for example, the file server implementation, don't match the echo packets it receives (missing the data, 2 bytes shorter than the one originally sent)

nbriggs avatar Oct 02 '22 17:10 nbriggs

See the commit log for

commit b7e9529322269553d6c7bdd4741677270959cf09
Author: Bruce Mitchener <[email protected]>
Date:   Sat Jan 23 03:28:16 2021 +0700

We replaced the assembler fast dispatch with gcc's (and clang's) computed goto extension.
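
For anyone unfamiliar with the extension, here is a toy sketch of threaded dispatch using computed goto ("labels as values" in gcc and clang). The three-opcode bytecode and the name run_threaded are made up for illustration and have nothing to do with the real Maiko opcode set:

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes, hypothetical. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Interprets `code` starting from an accumulator of 0 and returns the
 * accumulator when OP_HALT is reached. Each handler jumps straight to the
 * next handler instead of returning to a central switch. */
long run_threaded(const unsigned char *code)
{
  assert(code != NULL);
  /* Table of label addresses, indexed by opcode (GNU extension). */
  static void *const dispatch_table[] = { &&op_inc, &&op_double, &&op_halt };
  long acc = 0;
  const unsigned char *pc = code;

#define NEXT() goto *dispatch_table[*pc++]
  NEXT();

op_inc:
  acc += 1;
  NEXT();
op_double:
  acc *= 2;
  NEXT();
op_halt:
  return acc;
#undef NEXT
}
```

Note that a loop of this shape has no single point that every instruction passes through, so any per-instruction countdown would have to be placed in the NEXT() step itself.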

nbriggs avatar Oct 02 '22 17:10 nbriggs