
panic: No init or sh found (again)

Open Vutshi opened this issue 2 years ago • 60 comments

Description Hi, I've tried ELKS 0.5 and 0.6 on my 8088 with 256KB of RAM. It is a Soviet clone of the actual 8088 CPU named К1810ВМ88. I observed a problem similar to the one mentioned in https://github.com/jbruchon/elks/issues/288.

Computer description

  • The computer is called МК-88.01 where .01 denotes a variant with 256K RAM and FDD
  • CPU: КР1810ВМ88 is a clone of Intel 8088 at frequency 4.77MHz
  • Video RAM 128K
  • Programmable interval timer: КР580ВИ53 equivalent to Intel 8253. Max frequency 2MHz
  • Programmable interrupt controller: КР1810ВН59А equivalent to Intel 8259
  • Floppy disk controller: UMC UM8272A analog of Intel 8272
  • Controller for cassette recorder: КР580ВВ55А analog of Intel 8255
  • There is one floppy drive installed (720K), which is used as 360K. There is also an LPT port and a joystick port.
  • No HDD, no COM port, no Soundcard.

Configuration

  • I tried precompiled ELKS 0.5 with xms fix
  • Also I compiled the latest master version (75a7cb7) myself (precompiled 0.6 didn't even start booting)

Raw data In both cases I get the same result shown in screenshots:

  • 0.6 master: signal-2022-06-17-044632_002 signal-2022-06-17-044632_004
  • 0.5 with xms fix signal-2022-06-16-184517_001

P.S. 0.6 master boots nicely in qemu. Best

Vutshi avatar Jun 17 '22 10:06 Vutshi

Hello @Vutshi,

Thanks for the problem report. I think the problem is your machine doesn't have enough memory for the default distribution, with only 256K RAM. As can be seen from the boot log, the system only has 123K free RAM for application programs after the kernel is loaded.

It is a Soviet clone of the actual 8088 CPU named К1810ВМ88.

Interesting. Was this designed in Russia from just the Intel specs?

Looking at the screenshots, and the differences between the v0.5.0 and v0.6.0 boots, I'm not sure why the BIOS track read retry is occurring, we might want to set CONFIG_TRACK_CACHE off, although that will make the system slower. I also can't understand why the /dev/console open is failing, which occurs prior to trying to exec /bin/init. This could happen if the FAT disk has a physical /dev directory, as /dev/ is emulated on FAT.

To allow the system to run with 256K RAM, we need to lower the max kernel heap, which defaults to 64K. That will allow more RAM for a shell to run. We will also want to disable /bin/init, and force a small shell (/bin/sash) for the time being.

To do this, add the following line in elks/include/linuxmt/config.h (at the top will do):

#define SETUP_HEAPSIZE		4096	/* force kernel heap size if specified*/

Also, change the number of external buffers from 64 to 32 in .config using "CONFIG_FS_NR_BUFFERS=32".

Then, after recompiling the kernel using 'make kclean' and producing another 360K boot floppy, remove /bin/init and /bin/sh, and copy elkscmd/sash/sash to /bin/sh on the floppy. (The standalone shell requires less space, however sash must be manually copied to /bin/sh as the 360K floppy does not contain it by the default build).

[EDIT: I have simulated 256K RAM on ELKS on QEMU by modifying setup.S in arch_get_mem and have ELKS booting, using the above modifications. It looks like we might need a better way to build a floppy that does not use /bin/init and uses sash for /bin/sh. Trying to run with /bin/sh uses too much memory on a 256K system. The system remains very tight on RAM and can't run other applications though. We have a ROM version that works, but more modifications will be required in order to get the system RAM usage to work with 256K.]

Thank you!

ghaerr avatar Jun 17 '22 16:06 ghaerr

Thank you @ghaerr for the detailed instructions. Now there is more free RAM but something is still not working. signal-2022-06-17-202022_002

Here is my .config

Is there anything else to be safely removed from the image?

Vutshi avatar Jun 17 '22 18:06 Vutshi

Is there anything else to be safely removed from the image?

It appears that you may not have /bin/sash copied as /bin/sh on the floppy. I also can't duplicate the Can't open /dev/console error. Can you post a DIR (or ls -l) listing of your root directory and /bin of the floppy?

ghaerr avatar Jun 17 '22 18:06 ghaerr

Your config also has the EXT buffers set to 64, against the recommendation above:

CONFIG_FS_NR_EXT_BUFFERS=64

Please re-read and check the above instructions carefully.

ghaerr avatar Jun 17 '22 18:06 ghaerr

I fixed the external buffer part, now it reads

CONFIG_FS_NR_EXT_BUFFERS=32

Initially you were talking about "CONFIG_FS_NR_BUFFERS=32" is it a typo or these are some other buffers?

It appears that you may not have /bin/sash copied as /bin/sh on the floppy. I also can't duplicate the Can't open /dev/console error. Can you post a DIR (or ls -l) listing of your root directory and /bin of the floppy?

I definitely use sash. It is clearly working in qemu.

Just in case here are my latest config and the corresponding image with sash config_6.txt fd360-6_small.img.zip

The outcome on the hardware is still the same. dev/console doesn't open signal-2022-06-17-220053_001 signal-2022-06-17-220053_002

Vutshi avatar Jun 17 '22 20:06 Vutshi

I definitely use sash. It is clearly working in qemu.

Thanks for the image. I tried it on QEMU and it works, but the likely reason is that QEMU has 1M memory, and ELKS is reporting 520K RAM free. Here is your image running on QEMU: Screen Shot 2022-06-17 at 2 26 11 PM

I am working on a PR that will allow us to artificially limit the amount of RAM available to ELKS. I have that running, but still can't duplicate the Can't open console issue. I am hoping this has nothing to do with the clone CPU.

ghaerr avatar Jun 17 '22 20:06 ghaerr

I have to say, this is quite strange. Is there a way you could build a MINIX image, instead of FAT (CONFIG_IMG_MINIX=y CONFIG_IMG_FAT not defined), so that we can see whether the failure of opening /dev/console has anything to do with the emulated FAT filesystem?

We are dealing with multiple problems and I'm trying to get my arms around a good debug scenario. We can't do much printk kernel debugging, since the output scrolls off after 24 lines. I am still guessing this has to do with limited RAM, but I now have all this running on QEMU with 256K limit and I still can't duplicate the real hardware open problem (which still should not affect the exec of /bin/sh, frankly).

Since your system may not be IBM compatible, we might want to eliminate any other kernel dependencies by setting the following in the CONFIG_ARCH_IBMPC section of include/linuxmt/config.h:

#define SYS_CAPS                0       /* no XT/AT capabilities */

Are there other ways the hardware you are running on may be different from IBM PC?

ghaerr avatar Jun 17 '22 20:06 ghaerr

MY 2cents: Seems to me that after boot, elks is requesting 2 blks per read but getting only one. Bios problem? M


Mellvik avatar Jun 17 '22 21:06 Mellvik

Interesting. Was this designed in Russia from just the Intel specs?

I am not sure how exactly it was designed. I suspect it was reverse engineered from the Intel CPU. As far as I know, the USSR pursued both strategies: cloning western devices (which came to an end in the 90s) and building a homegrown VLIW architecture (which exists now as Elbrus 8S, 8SV, 16S).

Regarding the 8088 clone I am using I should add that it is quite shabby. One memory module had to be replaced, floppy connector replaced, there are some glitches in video ram (will be replaced too). Nevertheless, it boots the native OS (analog of MS-DOS called Альфа ДОС) and runs games like pacman (with small graphics glitches).

So one should not exclude a hardware problem behind the ELKS booting issue.

Vutshi avatar Jun 17 '22 21:06 Vutshi

Looking further @Vutshi, I am finding a pretty stupid problem in ELKS .config files. It seems that merely commenting out options with '#', but leaving the value =y, causes the include/autoconf.h to be produced incorrectly. I noticed this because in your boot screen, it is saying "track read retry", yet you have CONFIG_TRACK_CACHE commented out (which isn't working):

# CONFIG_TRACK_CACHE=y
# CONFIG_BLK_DEV_BHD=y
# CONFIG_IDE_PROBE=y

The above doesn't work, look at what include/autoconf.h has to say:

#define CONFIG_TRACK_CACHE 1
#define CONFIG_BLK_DEV_BHD 1
#define CONFIG_IDE_PROBE 1

In order to correct you MUST set the above .config values to "# CONFIG_TRACK_CACHE is not used". I have just confirmed this by looking at the ELKS menuconfig/config and it actually string compares "is not used". UGH!! So this may be a contributing issue, we are not actually creating what is configured!!!

A potential fix is to run "make menuconfig" and, if it doesn't rewrite these lines automatically, to edit them back manually. Either way, we need to use a .config file with no commented-out "=y" values!
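To make the pitfall above easier to spot, here is a minimal sketch of a check for .config lines that were "disabled" by prefixing '#' while leaving "=y" in place. The function name is_commented_out_enable is hypothetical and this is not part of the ELKS config scripts; it only illustrates the pattern to look for.

```c
#include <string.h>

/*
 * Returns 1 if a .config line looks like an option that was commented out
 * with '#' but still carries "=y", e.g. "# CONFIG_TRACK_CACHE=y".
 * Hypothetical helper for illustration only; not the ELKS config parser.
 */
int is_commented_out_enable(const char *line)
{
    /* skip leading whitespace */
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line != '#')
        return 0;                       /* not a comment line at all */
    line++;
    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, "CONFIG_", 7) != 0)
        return 0;                       /* an ordinary comment */
    /* a commented option that still says "=y" is the risky form */
    return strstr(line, "=y") != NULL;
}
```

Running every line of a .config through such a check would flag the three lines quoted above, while a line in the "# CONFIG_TRACK_CACHE is not used" form would pass.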

ghaerr avatar Jun 17 '22 21:06 ghaerr

@ghaerr

Are there other ways the hardware you are running on may be different from IBM PC?

I don't really know. It is supposed to be identical. It definitely runs MS-DOS and games made for MS-DOS.

Vutshi avatar Jun 17 '22 21:06 Vutshi

Here's a fixed .config file, who knows whether this might change something on your real hardware! config.txt.zip

ghaerr avatar Jun 17 '22 21:06 ghaerr

Is there a way you could build a MINIX image, instead of FAT

This is what I wanted to try as well. The only reason I didn't do it so far is some Windows related problem with writing this image. I guess I need to install Ubuntu for writing images.

Vutshi avatar Jun 17 '22 21:06 Vutshi

The only reason I didn't do it so far is some Windows related problem with writing this image.

That could also be an issue. We're finding out all sorts of strange things on your issue...

I guess I need to install Ubuntu for writing images.

That'd be great, since we need to eliminate variables. Given @Mellvik's comment about BIOS, and my finding that track read was turned on even though configured off, a BIOS issue with multi-sector reads could also be an issue.

I'm still working on better debug for very limited RAM systems, as that happens to be an interest of mine. I'd like to see us getting this working :) I hope to push a PR to help emulate 256K better on QEMU.

ghaerr avatar Jun 17 '22 21:06 ghaerr

Here's a fixed .config file, who knows whether this might change something on your real hardware!

Thanks. I'll try it.

Vutshi avatar Jun 17 '22 21:06 Vutshi

Is there a difference between

Yes (although there definitely should not be!!) - that's what my last post was saying. Unfortunately, the make config/menuconfig scripts are pretty dumb. I showed you what include/autoconf.h looked like, which incorrectly set the settings for the C code.

ghaerr avatar Jun 17 '22 21:06 ghaerr

Is there a difference between

Yes (although there definitely should not be!!) - that's what my last post was saying. Unfortunately, the make config/menuconfig scripts are pretty dumb. I showed you what include/autoconf.h looked like, which incorrectly set the settings for the C code.

Yes. Sorry, I missed your explanation above. Messages appear faster than I write and read :)

Vutshi avatar Jun 17 '22 21:06 Vutshi

@ghaerr Is there a way to put sash into minix image if my system doesn't understand minix filesystem?

I will be back tomorrow with new tests on hardware.

Vutshi avatar Jun 17 '22 22:06 Vutshi

Is there a way to put sash into minix image if my system doesn't understand minix filesystem?

Currently, not an easy way. I'm working on a new option to copy sash to /bin/sh on build, as well as turn off automatic execution of /bin/init. There will also be options that allow us to emulate 256K in QEMU, so that we can debug all this lots easier. I'll post a PR shortly.

ghaerr avatar Jun 17 '22 22:06 ghaerr

Actually, there is a way to do this - however clumsy.

What I've done in such cases is to boot the image in QEMU and have whatever files I need to add available on a second floppy image, mounted after boot. Then manipulate them in QEMU (and remember to sync before exit).

--M


Mellvik avatar Jun 18 '22 07:06 Mellvik

Hi @ghaerr and everybody,

I've got new results with the latest PR #1330 and the corresponding config file.

First of all I checked compatibility of my machine with x86 by booting MS-DOS 3.10. Works. DOS games work as well. signal-2022-06-18-085047_001

I wrote ELKS image in Ubuntu as follows dd if=fd360-minix.img of=/dev/fd0 bs=2048

minix version: signal-2022-06-18-195119_001

fat version: signal-2022-06-18-200015_001

Vutshi avatar Jun 18 '22 18:06 Vutshi

Hello @Vutshi,

Thanks for the screenshots and continued testing with the latest changes.

Looking at both screens, and seeing the results on MINIX (with a different errno), I think I see the problem: the floppy disk probe seems to determine that your floppy has a format of 40 cylinders, 2 heads and 8 sectors. The 8 sectors is incorrect for a 360K floppy! So what's happening is that ELKS is reading the disk while skipping a sector every 8 sectors, which is why nothing is running. The kernel is loaded using different code, which appears to be working.
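As a rough illustration of why a wrong sectors-per-track value breaks every read after the first few, here is the usual linear-block to CHS translation. The struct and function names are made up for this sketch and are not taken from the ELKS driver; the sector size is assumed to be the usual 512 bytes.

```c
/* Cylinder/head/sector triple; sector numbers start at 1 as in BIOS calls. */
struct chs {
    unsigned cyl;
    unsigned head;
    unsigned sector;
};

/* Standard translation of a linear sector number into CHS for a given geometry. */
struct chs lba_to_chs(unsigned lba, unsigned heads, unsigned sectors_per_track)
{
    struct chs r;

    r.sector = lba % sectors_per_track + 1;
    r.head   = (lba / sectors_per_track) % heads;
    r.cyl    = lba / (sectors_per_track * heads);
    return r;
}
```

With the correct 9 sectors per track, linear sector 8 is cylinder 0, head 0, sector 9. With the mis-probed 8 sectors per track, the same linear sector is looked up at cylinder 0, head 1, sector 1, so the ninth sector of each track is never read and everything after it is shifted.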

Let me look at the probe code to determine why this might be happening.

Here is the code after probing in elks/arch/i86/drivers/block/bioshd.c:

      got_geom:

        if (drivep->cylinders == 0 || drivep->sectors == 0) {
            *drivep = fd_types[drivep->fdtype];
            printk("fd: Floppy drive autoprobe failed!\n");
        } else {
drivep->sectors = 9; // <--- INSERT THIS LINE
            printk("fd: /dev/fd%d %s has %d cylinders, %d heads, and %d sectors\n",
                   target,
                   (found_PB == 2)? "DOS format," :
                   (found_PB == 1)? "ELKS bootable,": "probed, probably",
                   drivep->cylinders, drivep->heads, drivep->sectors);

        }

If you'd like to see if my theory is correct, insert the above line in the driver, and recompile. That should force the floppy sector count to be 9, and everything should work.

Thank you!

ghaerr avatar Jun 18 '22 18:06 ghaerr

I looked at the probe code, and can't see much wrong with it (yet). Is your floppy a 320K floppy (CHS 40,2,8)? That isn't supported by ELKS, although we could add support. It seems that perhaps your floppy might be 320K but configured for 360K ELKS?

ghaerr avatar Jun 18 '22 18:06 ghaerr

@ghaerr it works! After hardcoding number of sectors I got it booted: signal-2022-06-18-215425_001

Super progress, thank you!

Now the problem is that system doesn't respond to pressing keys on my keyboard :)

Vutshi avatar Jun 18 '22 20:06 Vutshi

I looked at the probe code, and can't see much wrong with it (yet). Is your floppy a 320K floppy (CHS 40,2,8)? That isn't supported by ELKS, although we could add support. It seems that perhaps your floppy might be 320K but configured for 360K ELKS?

The floppy drive is physically 720K with 80 cylinders. However, the computer reads only every second cylinder thus effectively using it as 360K. Plus it is not in a very good shape, I have to press the drive head with my finger to help it read thoroughly:)) signal-2022-06-18-220836_002

Vutshi avatar Jun 18 '22 20:06 Vutshi

Now the problem is that system doesn't respond to pressing keys on my keyboard :)

Try setting "BIOS" under "Select Console Driver", that will use polled BIOS rather than IRQ 1 for keyboard.

I have to press the drive head with my finger to help it read thoroughly:))

Well, that's a new take on "having the system at your fingertips..." :)

I'm not sure whether the kernel probe routine may not be working (I did modify it a bit several months ago), or whether this has something to do with your drive. I will continue looking at this. We probably ought to add some printk code in the actual probe routine, to see what it's doing, perhaps something like the following:

/* Next, probe for sector number. We probe on track 0, which is
 * safe for all formats, and if we get a seek error, we assume that
 * the previous successfully probed format is the correct one.
 */

        drivep->sectors = 0;
        count = 0;
        do {
            /* skip reading first entry */
printk("probe %d\n", count); // <--- insert this line
            if (count && read_sector(target, 0, sector_probe[count]))
                { printk("read sector failed on %d count %d\n", sector_probe[count], count); break; } // <--- change this line
            drivep->sectors = sector_probe[count];
        } while (++count < sizeof(sector_probe)/sizeof(sector_probe[0]));
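To show how the loop above can settle on 8 sectors, here is a small stand-alone simulation of it. The candidate list and the fake_read_sector stand-in are assumptions made for this sketch (the real driver calls the BIOS and may use a different table); only the loop shape follows the code quoted above.

```c
/* Candidate sectors-per-track values, smallest first (illustrative list). */
static const unsigned char sector_probe[] = { 8, 9, 15, 18 };

/*
 * Stand-in for the BIOS read: returns 0 on success, nonzero on failure.
 * In this simulation a read succeeds only for sectors up to max_readable.
 */
static int fake_read_sector(unsigned sector, unsigned max_readable)
{
    return sector > max_readable;
}

/*
 * Same shape as the probe loop: the first entry is accepted without a read,
 * and each later entry is accepted only if reading that sector succeeds.
 */
unsigned probe_sectors(unsigned max_readable)
{
    unsigned sectors = 0;
    unsigned count = 0;

    do {
        if (count && fake_read_sector(sector_probe[count], max_readable))
            break;
        sectors = sector_probe[count];
    } while (++count < sizeof(sector_probe) / sizeof(sector_probe[0]));
    return sectors;
}
```

If the read of sector 9 on track 0 fails for any reason, the loop stops with the previous candidate, so probe_sectors(8) yields 8 while probe_sectors(9) yields 9. That matches the symptom reported here: a 9-sector disk probed as 8 sectors.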

However, the computer reads only every second cylinder thus effectively using it as 360K.

Hmmm, the probe routine tries on track 0. Are you saying that perhaps your drive only starts working on track 1? If probing on track 0 sector 8 failed, that would cause this problem.

How does the BIOS handle the every other track, does it automatically map track 0 -> 1, 1 -> 3, 2 -> 5 etc?

ghaerr avatar Jun 18 '22 23:06 ghaerr

@ghaerr

How does the BIOS handle the every other track, does it automatically map track 0 -> 1, 1 -> 3, 2 -> 5 etc?

I don't really know how it works. Documentation I have doesn't say anything about this subtlety. Although, I do have ROMs data available if it can help.

The additional printk code gives the following output now: signal-2022-06-19-100257_001

Somehow it doesn't like sector 9.

Regarding keyboard:

Try setting "BIOS" under "Select Console Driver", that will use polled BIOS rather than IRQ 1 for keyboard.

This setting broke compilation for me:

kbd-scancode.c:136:5: error: ‘xtkb_scan’ undeclared here (not in a function)
     xtkb_scan,  /*mode = 0*/
     ^~~~~~~~~
kbd-scancode.c:137:5: error: ‘xtkb_scan_shifted’ undeclared here (not in a function)
     xtkb_scan_shifted, /*mode = 1*/
     ^~~~~~~~~~~~~~~~~
kbd-scancode.c:138:5: error: ‘xtkb_scan_caps’ undeclared here (not in a function)
     xtkb_scan_caps, /*mode = 2*/
     ^~~~~~~~~~~~~~
kbd-scancode.c:139:5: error: ‘xtkb_scan_ctrl_alt’ undeclared here (not in a function)
     xtkb_scan_ctrl_alt, /*mode = 3*/
     ^~~~~~~~~~~~~~~~~~
make[3]: *** [../../../../Makefile-rules:243: kbd-scancode.o] Error 1
make[3]: Leaving directory '/home/denis/8088/elks/elks/arch/i86/drivers/char'
make[2]: *** [Makefile:197: drivers/char/chr_drv.a] Error 2
make[2]: Leaving directory '/home/denis/8088/elks/elks/arch/i86'
make[1]: *** [Makefile:75: Image] Error 2
make[1]: Leaving directory '/home/denis/8088/elks/elks'
make: *** [Makefile:13: all] Error 2

Vutshi avatar Jun 19 '22 11:06 Vutshi

Hello @Vutshi,

This setting broke compilation for me: kbd-scancode.c:136:5: error: ‘xtkb_scan’ undeclared here (not in a function) xtkb_scan, /*mode = 0*/

This is a result of having "Scancode keyboard driver" (CONFIG_KEYBOARD_SCANCODE=y) set for BIOS keyboard driver, which won't work. Turn that off using make menuconfig and you should get a compiled system.

I'll submit a fix to remove the scancode keyboard driver when BIOS console is selected.

The additional printk code gives the following output now: Somehow it doesn't like sector 9.

Thanks for testing that. The printk display seems to show that the probe routine is in fact correct, while somehow the BIOS isn't reading track 0 sector 9 (in the probe routine only?). Very strange. I'm not quite certain that it isn't our probe routine, but I suspect this has something to do with your non-standard every-other-track floppy drive somehow. We can just leave that alone for now, until we get ELKS into a fully operational state on your system.

Thank you!

ghaerr avatar Jun 19 '22 16:06 ghaerr

Hi @ghaerr,

This is a result of having "Scancode keyboard driver" (CONFIG_KEYBOARD_SCANCODE=y) set for BIOS keyboard driver, which won't work. Turn that off using make menuconfig and you should get a compiled system.

I'll try it in a couple of hours.

Meanwhile I have a suspicion that disabling the new options

CONFIG_SYS_DEFSHELL_SASH
CONFIG_SYS_NO_BININIT

does not restore the original sh for me. I think it is still sash, it has the same size and behaves accordingly.

Vutshi avatar Jun 19 '22 18:06 Vutshi

Meanwhile I have a suspicion that disabling the new options CONFIG_SYS_DEFSHELL_SASH does not restore the original sh

You're right - fixed in #1337.

ghaerr avatar Jun 19 '22 19:06 ghaerr

@ghaerr bios keyboard doesn't bring luck for me: signal-2022-06-19-212210_001 It doesn't respond to a single key in English or Russian register

Vutshi avatar Jun 19 '22 19:06 Vutshi

bios keyboard doesn't bring luck for me: It doesn't respond to a single key in English or Russian register

It seems we need to discuss in more detail exactly how your system BIOS and hardware differ from a standard PC. The 360K/720K floppy drive appears to function differently from a PC's in some respects, requiring the forced sector = 9 workaround; I suspect something is also amiss with the keyboard.

The BIOS keyboard driver uses the standard IBM PC INT 16h function AH=0 and AH=1 to read keystrokes, which aren't working. The Direct console uses IRQ 1, which didn't work either. I don't know what to do, without looking further at BIOS source or documentation. You can look at elks/arch/i86/drivers/char/conio-bios.S, it will likely have to be changed to support whatever method your BIOS requires, it seems?

// int conio_poll
// INT 16h AH=00h (read kbd)
// INT 16h AH=01h (get kbd status)
// returns scan code in AH, ASCII char in AL

conio_poll:
        mov    $1,%ah           // get kbd status
        int    $0x16
        jnz    1f               // key pressed
        xor    %ax,%ax
1:      or     %ax,%ax
        jz     9f
        xor    %ah,%ah          // read kbd scan/char
        int    $0x16
9:      ret

Obviously, console output is working, this uses INT 10h function 0x0E.

ghaerr avatar Jun 19 '22 19:06 ghaerr

I see. I'll try to dig available documentation.

P.S. PC-DOS 3.30 is somehow familiar with the keyboard: signal-2022-06-19-230408

Vutshi avatar Jun 19 '22 20:06 Vutshi

Hi @ghaerr, I have some new data regarding the keyboard problem. A friend of mine wrote me a simple test in asm (see below) to check INT 16h. The program interrogates the keyboard in a loop and prints what was typed in. It stops after pressing q.

        .text
        .code16
0:      mov     $1,%ah
        int     $0x16
        jnz     1f
        xor     %ax,%ax
1:      or      %ax,%ax
        jz      0b
        xor     %ah,%ah
        int     $0x16
        cmp     $'q',%al
        je      2f
        mov     $0x0e,%ah
        int     $0x10
        jmp     0b
2:
        cli
        hlt

The test is written in the floppy disk boot sector. echo_v2.img.zip

At the end of the day, it works on my computer (this is v1 version of the program without stop key q): signal-2022-06-20-193857_001

I wonder what can go different in ELKS BIOS keyboard driver?

Vutshi avatar Jun 20 '22 19:06 Vutshi

The program interrogates the keyboard in a loop and prints what was typed in.

As you have probably seen, ELKS uses an identical method to poll for and read characters. So, it seems, the BIOS call itself is not the problem.

It just occurred to me that perhaps the problem is your system real time clock (RTC) isn't firing interrupts, or that interrupts in general aren't working. The BIOS keyboard driver uses the RTC to poll the keyboard using the code above, but only polls every 8ms rather than continually. If the RTC isn't working, the polling won't happen.

Can you tell us a bit more about your RTC hardware? It should normally be set up by the BIOS, but ELKS takes over and programs the device itself. The supported RTC is an 8254 chip, addressed at the following ports, defined in elks/include/arch/ports.h:

/* timer, timer-8254.c*/
#define TIMER_CMDS_PORT 0x43            /* command port */
#define TIMER_DATA_PORT 0x40            /* data port    */
#define TIMER_IRQ       0               /* can't change*/

The timer code is in elks/arch/i86/kernel/timer-8254.c:

#define TIMER_MODE0 0x30   /* timer 0, binary count, mode 0, lsb/msb */
#define TIMER_MODE2 0x34   /* timer 0, binary count, mode 2, lsb/msb */

#define TIMER_LO_BYTE (__u8)(((5+(11931818L/(HZ)))/10)%256)
#define TIMER_HI_BYTE (__u8)(((5+(11931818L/(HZ)))/10)/256)

void enable_timer_tick(void)
{
    /* set the clock frequency */
    outb (TIMER_MODE2, TIMER_CMDS_PORT);

    outb (TIMER_LO_BYTE, TIMER_DATA_PORT);  /* LSB */
    outb (TIMER_HI_BYTE, TIMER_DATA_PORT);  /* MSB */
}
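For reference, here is the arithmetic behind the two byte macros above, written as plain functions. The tick rate of 100 used in the note below is an assumption for this sketch (the value of HZ is not shown in this thread), and the function names are made up.

```c
/*
 * Same rounding as TIMER_LO_BYTE / TIMER_HI_BYTE above: 11931818 is ten
 * times the roughly 1.193182 MHz timer input clock, so adding 5 and then
 * dividing by 10 rounds the divisor to the nearest integer.
 */
unsigned timer_divisor(unsigned long hz)
{
    return (unsigned)((5UL + 11931818UL / hz) / 10UL);
}

/* Low byte of the 16-bit reload value, written to the data port first. */
unsigned char timer_lo_byte(unsigned long hz)
{
    return (unsigned char)(timer_divisor(hz) % 256);
}

/* High byte of the 16-bit reload value, written to the data port second. */
unsigned char timer_hi_byte(unsigned long hz)
{
    return (unsigned char)(timer_divisor(hz) / 256);
}
```

Assuming a tick rate of 100, the divisor works out to 11932, so the bytes written to port 0x40 would be 0x9C followed by 0x2E. If the timer input clock on a given machine differs from the PC value, the same divisor produces a different tick rate.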

The keyboard polling code is in elks/arch/i86/drivers/char/kbd-poll.c:

static void kbd_timer(int data)
{
    int dav, extra = 0;

    printk("kbd poll\n"); // <- add this line
    if ((dav = conio_poll())) {
        printk("kbd_poll got %x\n", dav); // <-- add this line
        if (dav & 0xFF)
            Console_conin(dav & 0x7F);
        else {
        ...

Add the above two lines and see whether "kbd_poll" loops on your keyboard. It should then say "kbd_poll got xxx" when a character is typed.

ghaerr avatar Jun 20 '22 22:06 ghaerr

According to documentation МК-88 (my computer) has КР580ВИ53 chip which is equivalent to 8253 chip. Does it make a difference?

EDIT: Apparently, READ BACK command is missing in 8253, whatever it means.

Vutshi avatar Jun 21 '22 07:06 Vutshi

We were using our own clones called Pravetz 16 or IZOT. They were first reverse engineered and then built with some improvements.

toncho11 avatar Jun 21 '22 12:06 toncho11

Apparently, READ BACK command is missing in 8253, whatever it means.

I'll have to check. Could you try running the test with the two lines inserted as described above? That'll tell us more about whether the RTC is the culprit here.

ghaerr avatar Jun 21 '22 13:06 ghaerr

Apparently, READ BACK command is missing in 8253, whatever it means.

I'll have to check. Could you try running the test with the two lines inserted as described above? That'll tell us more about whether the RTC is the culprit here.

I'll do it in a few hours.

Vutshi avatar Jun 21 '22 13:06 Vutshi

@ghaerr,

Could you try running the test with the two lines inserted as described above?

It gives me something new — computer freezes: signal-2022-06-21-185758_001

Maybe it didn't respond to the keyboard in the first place because it was stuck. Anyway, now it happens before booting is complete.

Vutshi avatar Jun 21 '22 17:06 Vutshi

PC-98 also uses 8253.

tyama501 avatar Jun 21 '22 17:06 tyama501

Maybe it didn't respond to the keyboard in the first place because it was stuck. Anyway, now it happens before booting is complete.

Definitely strange. It seems we are getting a timer tick though. Perhaps comment out the first "kbd poll" printk, and leave 2nd one in, to see whether the kernel completes booting. I can't see why the boot would not complete with this in there. I'm wondering if we are having other issues with the amount of usable RAM?

PC-98 also uses 8253.

The PC-98 uses a different clock frequency, which @tyama501 is using to set the countdown register. However, we're not entirely sure that this is the reason for the problem yet. What is the clock frequency of your PC, do you know?

ghaerr avatar Jun 21 '22 18:06 ghaerr

Something else which seems strange... is it just me or does the computer always seem to stop working when the ELKS cursor gets to the bottom line?

Perhaps change the first printk to printk(".");

ghaerr avatar Jun 21 '22 18:06 ghaerr

Also notice in first printk, and also the second-to-last line in screenshot above: the "kbd poll" is missing the first letter: it says "bd poll". The TTY output drops the first letter, twice. This indicates something is quite amiss, it seems. I am beginning to wonder if the BIOS is trashing something or is incompatible with ELKS, for some reason.

Can you tell us more about your system? What is the programmable interrupt controller (PIC)? Is it an 8259? Are there other devices attached to it?

What is the BIOS, do you have a listing?

ghaerr avatar Jun 21 '22 19:06 ghaerr

Perhaps change the first printk to printk(".");

Did this and it helped a little bit. Now I can boot but only sometimes. The system is very unstable, it sends me bioshd(0) messages seemingly randomly and floppy drive seems to be very busy.

Here is an example of "successful" boot meaning we can see # but pressing keys doesn't provide the expected feedback. In the end it hangs: signal-2022-06-21-204925_001 Btw, here I switched to minix and completely turned off FAT support which gave me an amazing 166K of free RAM:)

This is another run which didn't reach #: signal-2022-06-21-215258

What are these bioshd(0) messages? I saw them even on my Intel Core 2 Duo running this test build of ELKS: signal-2022-06-21-210910_001

Vutshi avatar Jun 21 '22 19:06 Vutshi

What is the clock frequency of your PC, do you know?

CPU is 4.77MHz. The analog of i8253 used in this computer has a maximum frequency of 2MHz.

What is the programmable interrupt controller (PIC)? Is it an 8259? Are there other devices attached to it?

I can check it tomorrow.

What is the BIOS, do you have a listing?

What is "listing"? I know that it is 8KB and I have the ROM data.

EDIT:

Something else which seems strange... is it just me or does the computer always seem to stop working when the ELKS cursor gets to the bottom line?

I would say the computer stops working at random times

Vutshi avatar Jun 21 '22 19:06 Vutshi

What are these bioshd(0) messages? I saw them even on my Intel Core 2 Duo running this test build of ELKS:

These are BIOS floppy read retry messages: CHS 12/0/3 count 2 means read of cylinder 12, head 0, sectors 3&4 failed, and ELKS issued a retry. You're probably not holding your finger on the drive hard enough ;)
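As a small worked example of reading such a message, the CHS triple can be turned back into a linear sector number. The geometry (2 heads, 9 sectors per track) is assumed here for a 360K floppy, and chs_to_lba is a name invented for this sketch.

```c
/* Convert a CHS address (sector numbers start at 1) to a linear sector number. */
unsigned long chs_to_lba(unsigned cyl, unsigned head, unsigned sector,
                         unsigned heads, unsigned sectors_per_track)
{
    return ((unsigned long)cyl * heads + head) * sectors_per_track
           + (sector - 1);
}
```

Under that assumed geometry, CHS 12/0/3 is linear sector 218, so a retry with count 2 covers linear sectors 218 and 219.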

What is "listing"?

ASM source code.

When did the screen start displaying "kbd_poll got 2267"? This means that a keyboard character was received! Were you typing at the time?

The lower 8 bits of the "got xxxx" message indicate the hex value of the keyboard input received. From the screenshot, I see 79, 67, 68, 6b, 66, 67... this looks like a garbage ASCII sequence. Do the hex values seen mean anything to you, could they be unicode or scan codes?
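The value printed by the debug line can be split the same way the BIOS returns it, with the scan code in the high byte (AH) and the ASCII character in the low byte (AL). The struct and function names below are made up for this sketch.

```c
/* One keystroke as returned by INT 16h: scan code in AH, ASCII in AL. */
struct key {
    unsigned char scan;
    unsigned char ascii;
};

/* Split the 16-bit value printed by the "kbd_poll got %x" debug line. */
struct key decode_key(unsigned ax)
{
    struct key k;

    k.scan  = (unsigned char)((ax >> 8) & 0xFF);   /* high byte, AH */
    k.ascii = (unsigned char)(ax & 0xFF);          /* low byte, AL  */
    return k;
}
```

For the value 2267 from the screenshot, this gives scan code 0x22 and ASCII 0x67, which is the letter 'g'.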

ghaerr avatar Jun 21 '22 20:06 ghaerr

When did the screen start displaying "kbd_poll got 2267"? This means that a keyboard character was received! Were you typing at the time?

Yes, I was typing. However, this was my standard desktop on Intel (!!!)

Vutshi avatar Jun 21 '22 20:06 Vutshi

Here is what the screen should look like. Make sure you "make clean; make". Attached is also the config file I'm using. Screen Shot 2022-06-21 at 2 20 06 PM config.small.zip

ghaerr avatar Jun 21 '22 20:06 ghaerr

Yes, I was typing. However, this was my standard desktop on Intel (!!!)

Well, if you were typing garbage like "xyowxy", then that is correct. You can look up the values in an ASCII Chart.

ghaerr avatar Jun 21 '22 20:06 ghaerr

Here is what the screen should look like. Make sure you "make clean; make". Attached is also the config file I'm using.

Yes. This is what it looks like for me as well. The screenshot with "kbd poll got 2267" was taken on the Intel desktop with the first version of the test. I included it just because of the bioshd(0) messages. Sorry if I caused any confusion.

Vutshi avatar Jun 21 '22 20:06 Vutshi

It seems we're getting multiple issues mixed up here. I suggest testing first on your desktop, and seeing if ELKS runs well or not. We know ELKS works well, so let's see whether your compilation of it works on your desktop (remove the printk's discussed above). Your desktop has some sector retry issues with ELKS or the floppy you're using.

Then, you can add the printk's back in, and see what the proper desktop output looks like. As I mentioned, it is correct for it to display the hex values I described, depending on what you're typing.

Once both of the above work, we can get back to debugging the other system.

ghaerr avatar Jun 21 '22 20:06 ghaerr

It seems we're getting multiple issues mixed up here. I suggest testing first on your desktop, and seeing if ELKS runs well or not. We know ELKS works well, so let's see whether your compilation of it works on your desktop (remove the printk's discussed above). Your desktop has some sector retry issues with ELKS or the floppy you're using.

I have checked this one already. Without the printk in the keyboard polling, the desktop works well with this configuration of ELKS. It is just as good as qemu.

Tomorrow I will check that it returns the correct hex values.

Vutshi avatar Jun 21 '22 20:06 Vutshi

You're probably not holding your finger on the drive hard enough ;)

Btw, I now have a new floppy drive. No more fingers required to boot ELKS or DOS :)

Vutshi avatar Jun 21 '22 20:06 Vutshi

ASM source code.

Nope. No ASM source code for МК-88 BIOS

Vutshi avatar Jun 21 '22 20:06 Vutshi

Then, you can add the printk's back in, and see what the proper desktop output looks like. As I mentioned, it is correct for it to display the hex values I described, depending on what you're typing.

Here is the reference output from the Intel desktop when typing "qwerty" with my test build of ELKS: intel_reference

which is identical to the result in qemu: qemu_reference

Vutshi avatar Jun 22 '22 08:06 Vutshi

I wonder why the size of the printk message (printk("."); vs printk("kbd poll\n");) affects the boot behaviour on my 8088 computer. Is some buffer overflowing?

Vutshi avatar Jun 22 '22 08:06 Vutshi

Can you tell us more about your system? What is the programmable interrupt controller (PIC)? Is it an 8259? Are there other devices attached to it?

Yes, it is an analog of 8259. There are not so many devices in the computer. LPT and Joystick come to mind, no HDD, no COM port, no nothing. I have updated the opening post with all information about the computer I have collected so far.

EDIT: One peculiar thing is that my МК-88 has a controller for a cassette recorder, which makes it different from the IBM PC XT. There was one in the IBM PCjr. "BIOS interrupt call 15[h] routines were documented in the technical reference manual that would turn the cassette motor on and off, and read or write data."

Vutshi avatar Jun 22 '22 09:06 Vutshi

Hi @ghaerr and everyone,

I was poking around with some more low-level tests and found one strange thing about my computer. In particular, after seeing random behaviour of ELKS with additional printk() messages we decided to try printing various messages from a test program loaded from the boot sector thus bypassing any OS.

It appears that the computer (BIOS?) doesn't like the characters \r and \n. I can print them only ~20 times and then the computer freezes, seemingly when it has to scroll the screen. Without the 'bad' characters, printing goes on forever. I found a potentially similar bug described in IBM PC BIOS versions from 1981. Btw, as I mentioned above, MS-DOS doesn't care about this \r \n problem and works alright.

I made the following change to elks/blob/master/elks/kernel/printk.c#L62-L69

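void kputchar(int ch)
{
  /* Test hack: drop CR/LF, which seem to hang this machine's BIOS */
  if (ch == '\r' || ch == '\n')
    return;
  if (kputc)
    (*kputc)(dev_console, ch);  /* console driver output routine is set */
  else
    early_putchar(ch);          /* otherwise use the early boot output */
}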

It seems that removing the \r \n characters helps to restore the stability of ELKS booting. Now it works well, types dots like a champ until it needs to scroll... and it still refuses to read the keyboard :)

https://user-images.githubusercontent.com/4971779/176764671-f2964776-a606-40b8-a9d6-cac9c720a59a.mp4

I also noticed that my computer can tolerate the unlucky characters \r \n while working in a graphics mode. Is there a way to boot ELKS in such a mode?

Vutshi avatar Jun 30 '22 20:06 Vutshi

Hello @Vutshi,

It appears then that we have two big problems we need to work around for your system: first, the PC nearly crashes or becomes unresponsive when it has to scroll, and second, the ongoing keyboard read problem.

For the first problem, it seems your BIOS has a buggy scroll routine? The ELKS boot block and early kernel setup use INT 10h AH=0Eh (BIOS teletype output) to write to the console. In addition, the BIOS console uses this interrupt for console output.

You will probably have to rewrite this routine to do console output directly yourself (this is the same method that @tyama501 uses for PC-98). This means switching back to using the Direct Console (not BIOS console) for the time being, which will remain problematic because it uses a different method for keyboard I/O. Then, a rewritten console output routine would allow you to bypass using the buggy BIOS routine.

By switching back to Direct Console (CONFIG_CONSOLE_DIRECT=y), this problem may go away. Otherwise, it is possible to rewrite the console output routine and still use the BIOS console, as it may be more suited for fixing the keyboard, when we finally figure that out.

I would advise looking at the PC-98 ASM code for the rewritten console output routine in elks/arch/i86/drivers/char/conio-pc98-asm.S. It is a little complicated explaining exactly how we might get all this working, so I'll defer on that explanation until seeing which way you'd like to go.
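As a rough illustration of what a console output routine has to do once it no longer relies on the BIOS, here is a minimal sketch in C. It is not the ELKS Direct Console or the PC-98 code; the struct and function names are hypothetical, and a plain array stands in for text-mode video memory so the logic can be tried on any machine. The key point is that scrolling is done in software, by moving rows up, instead of by the BIOS.

```c
#include <stdint.h>
#include <string.h>

#define COLS 80
#define ROWS 25
#define ATTR 0x07               /* light grey on black */

/* Hypothetical console state; `cell` stands in for text-mode video
 * memory (one 16-bit word per character: attribute << 8 | char). */
struct textcon {
    uint16_t cell[ROWS * COLS];
    int row, col;
};

/* Scroll up one line in software and blank the last row. */
static void con_scroll(struct textcon *c)
{
    memmove(c->cell, c->cell + COLS,
            (ROWS - 1) * COLS * sizeof(uint16_t));
    for (int i = 0; i < COLS; i++)
        c->cell[(ROWS - 1) * COLS + i] = (ATTR << 8) | ' ';
    c->row = ROWS - 1;
}

/* Write one character, handling \r, \n, line wrap and scrolling. */
void con_putc(struct textcon *c, char ch)
{
    if (ch == '\r') {
        c->col = 0;
        return;
    }
    if (ch == '\n') {
        c->row++;
    } else {
        c->cell[c->row * COLS + c->col] =
            (uint16_t)((ATTR << 8) | (uint8_t)ch);
        if (++c->col == COLS) {
            c->col = 0;
            c->row++;
        }
    }
    if (c->row == ROWS)
        con_scroll(c);
}
```

On real hardware the array would be replaced by the video memory of the adapter in use, and the cursor position would also have to be sent to the display controller; both are left out of this sketch.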

This problem did not exist using Direct Console, correct?

Btw, as I mentioned above, MS-DOS doesn't care about this \r \n problem and works alright.

This is likely because MS-DOS doesn't use the INT 10h function to display output. The ELKS Direct Console doesn't either.

Thank you!

ghaerr avatar Jun 30 '22 20:06 ghaerr