elks Buffer/fs bug

In short: Removing (rm) a 'large' (as in several k bytes) cause the calling process to hang (cannot be interrupted). Shortly after, the system will hang, some times 'hard' (as in 'ctl-alt-del' not working).

This does NOT happen when XMS support is compiled in, whether enabled via /bootopts or not. Presumably that's the reason we haven't seen this before.

elks17# cd /
elks17# ls
EXEC: '/bin/ls' env 134
OPEN '.' flags 0x0 = 4
CLOSE 4
bin          bootopts     dev          etc          fd           linux        
linux.buf    linux.nobuf  mnt          mnt2         mnt3         mnt4         
nlinux       root         tmp          xx           zz           
elks17# touch x
BLK 11 read fc002180:3
EXEC: '/bin/touch' env 141
BLK 808 read b0002180:3
OPEN 'x' flags 0x241 = 4
CLOSE 4
elks17# rm x
EXEC: '/bin/rm' env 138
BLK 1325 read 6c002180:3
BLK 1326 read 2180:3
BLK 1327 read 40002180:3
BLK 1328 read 48002180:3
BLK 1329 read f0002180:3
BLK 1330 read 4c002180:3
UNLINK 'x'
elks17# ls -l zz
EXEC: '/bin/ls' env 144
OPEN '/etc/passwd' flags 0x0 = 4
BLK 1429 read e4002180:3
CLOSE 4
OPEN '/etc/group' flags 0x0 = 4
BLK 1422 read 50002180:3
CLOSE 4
-rw-r--r--  1 root     root        81472  Jul 12 10:46 zz
elks17# rm zz
EXEC: '/bin/rm' env 139
UNLINK 'zz'                   <---------- hang

Jul 12 '22 08:07 Mellvik

Hello @Mellvik,

I saw this one years ago, and its on my list. The reason for the hang is that the MINIX filesystem in some cases requires more local L1 locked buffers than there are available (8 unless FAT linked in, then 12). The rm process hangs in the kernel waiting for another buffer to become free, and rm already has all the buffers locked, so deadlock.

It is somewhat hard to believe that the kernel needs more than 12K buffers to remove a file, but this happens when the inode is using a double-indirect block list, I think. We're very short on kernel data segment space, so merely increasing the L1 size isn't a good option.

This does NOT happen when XMS support is compiled in,

Interesting. Perhaps its an L2/L1 issue then, not just L1.

I'll continue to look into this.

Thank you!

Jul 12 '22 17:07 ghaerr

Thank you @ghaerr.

It's strange I haven't run into this one before. After all, the XMS support is a fairly recent addition. Without XMS ELKS hangs every time I (say) 'rm linux'. Then again I may have a setting (this is new one, new machine etc) that just happen to make the bits in the puzzle fit exactly.

—M

jul. 2022 kl. 19:05 skrev Gregory Haerr @.***>:

Hello @Mellvik https://github.com/Mellvik,

I saw this one years ago, and its on my list. The reason for the hang is that the MINIX filesystem in some cases requires more local L1 locked buffers than there are available (8 unless FAT linked in, then 12). The rm process hangs in the kernel waiting for another buffer to become free, and rm already has all the buffers locked, so deadlock.

It is somewhat hard to believe that the kernel needs more than 12K buffers to remove a file, but this happens when the inode is using a double-indirect block list, I think. We're very short on kernel data segment space, so merely increasing the L1 size isn't a good option.

This does NOT happen when XMS support is compiled in,

Interesting. Perhaps its an L2/L1 issue then, not just L1.

I'll continue to look into this.

Thank you!

— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1182020992, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOENET6UELJKTD7W6FLVTWQWLANCNFSM53KDGGZQ. You are receiving this because you were mentioned.

Jul 12 '22 17:07 Mellvik

Without XMS ELKS hangs every time I (say) 'rm linux'.

Oh, good to know we have a larger file available to delete. I'll use this for a test case and will up the priority.

After all, the XMS support is a fairly recent addition

Yes, I got a bit spoiled myself with a default .config with XMS and 2880k disk support. I'm back to running 1440k and no XMS, so I'll see whether this repeats over here.

Jul 12 '22 17:07 ghaerr

Yes, I got a bit spoiled myself with a default .config with XMS and 2880k disk support. I'm back to running 1440k and no XMS, so I'll see whether this repeats over here.

Welcome to the club :-) . Well, I guess you're on QEMU so it's fast anyway… I got really spoiled recently when I started to run of of HD instead of floppy. He.., it's fast.

Crazy as it sounds, 'rm linux' is probably the command I use the most these days, building and transferring 50-100 kernels per day. :-)

BTW - I still have sys problems on the compaqs (MBR boot doesn't find partition boot), while the same works great on this other machine (40MHz 386SX). To be investigated.

-M

Jul 12 '22 17:07 Mellvik

Without XMS ELKS hangs every time I (say) 'rm linux'. Oh, good to know we have a larger file available to delete. I'll use this for a test case and will up the priority.

I just tested with our "standard" configuration ibmpc-1440.config (used for the auto Github builds), and rm linux works great, no hang. Can you try using that configuration, or diffing it with your .config, to see what might be happening? It would be worthwhile to see whether QEMU hangs for you as well, perhaps this problem is not the same as I described.

I still have sys problems on the compaqs (MBR boot doesn't find partition boot), while the same works great on this other machine (40MHz 386SX). To be investigated.

Definitely interested to hear more about that.

Jul 12 '22 17:07 ghaerr

I still have sys problems on the compaqs (MBR boot doesn't find partition boot), while the same works great on this other machine (40MHz 386SX). To be investigated.

Definitely interested to hear more about that.

OK, this mystery has been resolved if not fixed.

It turns out (news to me, not to you) that sys assumes the source fs is an image created by the build_hd_image.sh script so that the partition boot block (aka floppy boot block) is present at a predictable place. Which is what happened on my newer system. But the old system wasn't created that way, it evolved over time, all 3 Minix FSs being created manually, sys used only recently. IOW there never was a valid boot block to copy.

What I suggest, in order to make this more failsafe, is that either minix mkfs or sys copiy the bootblock explicitly (from file).

-M

Jul 13 '22 13:07 Mellvik

sys assumes the source fs is an image created by the build_hd_image.sh script

Wow - I've never seen that script before. It is extremely outdated and should probably be removed, as ELKS now creates all HD images (hd32mbr-minix.img and hd32-minix.img) using Makefiles under the images/ directory, and doesn't require loopback filesystems to build. Of course, these "standard" HD images are really only testing, as few people have a 32MB HD!

so that the partition boot block (aka floppy boot block) is present at a predictable place. IOW there never was a valid boot block to copy.

Which boot block exactly are you talking about here - the MBR, or the boot block which is the first sector of every partition? I don't quite yet understand why the Compaq is failing on sys.

What I suggest, in order to make this more failsafe, is that either minix mkfs or sys copiy the bootblock explicitly (from file).

I've thought about adding another option that allows copying a boot sector as a seperate file, but sys (or makeboot) doesn't have that file available to it on a standard boot FD or HD system. It could be possible to compile-in a boot sector into makeboot. Once I fully understand what is happening on your Compaq, we can work to come up with a better solution.

Jul 13 '22 14:07 ghaerr

Back to the rm hang problem:

This is repeatable only on one particular machine, so I'm going after it in that setting. I was hoping it had to do with the memcmp problem just reported, it wasn't.

-M

Jul 14 '22 11:07 Mellvik

Back to the rm hang problem This is repeatable only on one particular machine, so I'm going after it in that setting

If this only repeats with the same compiled kernel on a different machine, then that's very strange - suspect possible memory problem? Does that machine have different amount of main memory (assume no XMS)?

Otherwise, I suspect may be able to reproduce with different CONFIG_FS_NR_EXT_BUFFERS= setting, as I have seen this problem before. Also possibly try recompiling with standard ibmpc-1440.config file for failing machine.

Jul 14 '22 15:07 ghaerr

Thank you @ghaerr -

Back to the rm hang problem This is repeatable only on one particular machine, so I'm going after it in that setting

If this only repeats with the same compiled kernel on a different machine, then that's very strange - suspect possible memory problem? Does that machine have different amount of main memory (assume no XMS)?

Yes, it still happens only if XMS is not compiled in. And yes, the main memory is odd sized, 607K as reported by ELKS. The machine has 4M total, and the reason for the odd main memory size is the DiskOnChip, which is an interesting story by itself, for another time. I can actually remove the chip and increase main memory to 637k - I'll do that just for the heck of it.

Otherwise, I suspect may be able to reproduce with different CONFIG_FS_NR_EXT_BUFFERS= setting, as I have seen this problem before. Also possibly try recompiling with standard ibmpc-1440.config file for failing machine.

Thanks, will check out that too! It would be good to figure this one out now that I have a repeatable setting.

—M

Jul 14 '22 15:07 Mellvik

Update: This turns out to be a touch cookie indeed.

Given that the problem appears on this particular hardware only, not on QEMU, not on the compaqs, it would be reasonable to guess 'race condition'. However, since the problem goes (reliably, predictably) away when adding the right debug printks in the code, that seems unlikely. Also, if those printks don't have arguments, just text, they have no effect. So we have a stack pollution issue, a very elusive one: The printks can be in any of a number of routines and will have slightly different effects but always hang the process or the system. And frequently destroy the root file system, making it a very slow process.

More later.

-M

Jul 22 '22 09:07 Mellvik

Also, if those printks don't have arguments, just text, they have no effect. So we have a stack pollution issue, a very elusive one: The printks can be in any of a number of routines and will have slightly different effects but always hang the process or the system.

Agreed. Now that you describe this new behavior, I have seen this before, and the kernel stack was being corrupted due to some strange behavior with the LED lights and keyboard controller code. I can't remember the exact issue, I'll look it back up, but it was finally identified and fixed by @tkchia. Does this particular system have anything unusual about the keyboard or its controller? Does the issue occur when no Caps Lock or other LED keys have ever been pressed?

Jul 22 '22 13:07 ghaerr

Also, if those printks don't have arguments, just text, they have no effect. So we have a stack pollution issue, a very elusive one: The printks can be in any of a number of routines and will have slightly different effects but always hang the process or the system.

Agreed. Now that you describe this new behavior, I have seen this before, and the kernel stack was being corrupted due to some strange behavior with the LED lights and keyboard controller code. I can't remember the exact issue, I'll look it back up, but it was finally identified and fixed by @tkchia https://github.com/tkchia. Does this particular system have anything unusual about the keyboard or its controller? Does the issue occur when no Caps Lock or other LED keys have ever been pressed?

Yes, the keyboard is hardly ever touched (not quite true since it comes to rescue every time the FS has been corrupted, but still it's rare). I ran into an interesting case this morning in which one printk is just ignored, the next one - 2 lines down, prints. This is in minix/truncate.c which is one of the suspects.

Are any of the 'stack protection' options to gcc included in our toolchain btw? And what about the register declarations, are the ignored or actually used? I noticed gcc complains if you try to get the address of a register variable, which makes sense but may not be a reliable signal...

-M

Jul 22 '22 14:07 Mellvik

Reviewing this issue a bit, I am now guessing that the kernel corruption, now more firmly evidenced by printk's not working (which occurred on the last similar issue), is ultimately happening to scribble on EXT memory buffers, which is causing the root file system corruption. It doesn't occur on XMS likely because the memory buffers are in non-addressable XMS memory.

Although not a for sure, we likely need to continue to look at the potential hardware incompatibility on your one system as the base cause, since none of this is (yet) repeatable on any other system. Thus the keyboard controller mention, which was the cause of the last similar problem. Any other dissimilarities between the failing system and your other PCs would be good to know, including whether the problem persists without any NIC or serial cards enabled/inserted (as well as possibly memory configuration changes).

Are any of the 'stack protection' options to gcc included in our toolchain btw?

Not really, I think there is a stack check option that emits a call to a stack checker that could be turned on, but ELKS itself checks the kernel stack pointer on every clock interrupt and system call entry, as well as checking the SS register for validity.

And what about the register declarations, are the ignored or actually used? I noticed gcc complains if you try to get the address of a register variable

I'm pretty sure register declarations are ignored with regards to forcing any register usage, but they're not entirely ignored as I've also seen the address of register variable error. I doubt we're dealing with a compiler issue. I will research the last issue a bit more and report back.

Jul 22 '22 15:07 ghaerr

Thank you @ghaerr -

Although not a for sure, we likely need to continue to look at the potential hardware incompatibility on your one system as the base cause, since none of this is (yet) repeatable on any other system. Thus the keyboard controller mention, which was the cause of the last similar problem. Any other dissimilarities between the failing system and your other PCs would be good to know, including whether the problem persists without any NIC or serial cards enabled/inserted (as well as possibly memory configuration changes).

Agreed, that may well be the case. The ALI 6117C is a 386sx based SOC with most components on chip - interrupt controllers, timer/counter, DMA controllers, memory mapper, keyboard/mouse i/f (!) and much more. As mentioned in https://github.com/jbruchon/elks/issues/1303#issuecomment-1192324726 https://github.com/jbruchon/elks/issues/1303#issuecomment-1192324726 I've had IRQ5 problems with the system. Also mentioned before (this thread) is the main memory mapping, creating some odd main memory sizes (such as 607K available). This has been tested for the case in question and found to not influence the situation.

I doubt we're dealing with a compiler issue. I will research the last issue a bit more and report back.

Agreed - and thank you!

—M

Jul 22 '22 16:07 Mellvik

Hello @Mellvik,

I have reviewed the previous issues relating to the ELKS kernel crashing, or acting strangely, and haven't yet been able to correlate to this problem. In order to eliminate the potential IRQ 1/keyboard controller as a culprit, try configuring for BIOS console instead of Direct console for this machine. The BIOS console doesn't interact directly with IRQ 1 nor the LEDs.

Will the sequence on your initial post always cause this issue? If so, that's good: the commands could be put in a shell script and executed immediately after boot, to more rapidly eliminate variables. I would suggest that this be done without an serial or NIC cards in the system as well.

Thank you!

Jul 25 '22 17:07 ghaerr

Thank you @ghaerr.

I have reviewed the previous issues relating to the ELKS kernel crashing, or acting strangely, and haven't yet been able to correlate to this problem. In order to eliminate the potential IRQ 1/keyboard controller as a culprit, try configuring for BIOS console instead of Direct console for this machine. The BIOS console doesn't interact directly with IRQ 1 nor the LEDs.

Yes, I will check that one in the morning. Agreed, eliminating possible cases is the only way forward.

Will the sequence on your initial post always cause this issue? If so, that's good: the commands could be put in a shell script and executed immediately after boot, to more rapidly eliminate variables. I would suggest that this be done without an serial or NIC cards in the system as well.

I've been making some progress this past week. The problem is entirely repeatable, slight changes in the code change behaviour. I've got it to fail on the compaq, not with rm but mount, and mount also fails (now) on the SX. So it's not just unlink only anymore. Also, open with truncate will fail like rm, so truncate is still in the crosshairs. Moving a delay loop around in the code has turned out to be useful - and a race condition is still a possibility, although I have a hard time seeing how any of the events I'm looking at can be interrupted. If they were, my printk's would show it.

To be continued...

-M

Jul 25 '22 18:07 Mellvik

The actual issue itself seems to be evolving... we know there's a kernel problem where the system can appear to hang due to not enough buffers. This manifests itself by sleeping the process in a deadlock, but the system itself is fine. That is, logging in from another terminal is possible.

With regards to this only happening on one system, you're now saying it's happening on two others? I would assume that this is likely because the problem wasn't iterated enough times on the newer systems, but shows up once in a while? And that the original system (with a different memory configuration) always hangs immediately?

Then there's the additional reported issue of kernel printk's not working... not sure how that fits in with the rest of the reported problems. Do printk's always not work, or only on the original problem system?

It appears the unlink system call calls the kernel rescheduler (possibly to allow other processes to release buffers). This means that the answer to your comment above about how events can be interrupted is that, if any other process is running (particularly ktcp, as it has select timeouts), the process can definitely get interrupted and switched out. Thus, all testing should be done on a bare bones system with as few other processes running as possible (suggest single-user mode, or init=/bin/sh in /bootopts).

I'm looking a bit at unlink versus ftruncate, and they're a bit complicated. So it would be nice to further narrow down all the reported issues before chasing too deep. You can answer these questions in your next report.

Thank you!

Jul 25 '22 19:07 ghaerr

The actual issue itself seems to be evolving...

It is - with the addition and movement of printk's.

we know there's a kernel problem where the system can appear to hang due to not enough buffers. This manifests itself by sleeping the process in a deadlock, but the system itself is fine. That is, logging in from another terminal is possible.

I don't think this is what we're looking at since the problem is immediate after boot, so plenty of buffers. The compaq actually hangs during boot because it's set up to mount 4 filesystems and stops in the 3rd. As to the type of hang, it varies - from silent hard hang via kernel messages about kernel stack corruption to process hangs and everything else working (and other variants) depending on the current printk's. Classic stack corruption indicators.

With regards to this only happening on one system, you're now saying it's happening on two others?

Just one other - the 3rd is in hiatus to leave room for what I now call the ALI system, which is a single board computer the size of a NIC .

I would assume that this is likely because the problem wasn't iterated enough times on the newer systems, but shows up once in a while?

I'm testing a kernel on the compaq only occasionally,

And that the original system (with a different memory configuration) always hangs immediately?

With the printk settings I have now, both fail immediately (if differently) given the right commands.

Then there's the additional reported issue of kernel printk's not working... not sure how that fits in with the rest of the reported problems. Do printk's always not work, or only on the original problem system?

That's 106 builds ago, but can be reproduced any time.

It appears the unlink system call calls the kernel rescheduler (possibly to allow other processes to release buffers). This means that the answer to your comment above about how events can be interrupted is that, if any other process is running (particularly ktcp, as it has select timeouts), the process can definitely get interrupted and switched out. Thus, all testing should be done on a bare bones system with as few other processes running as possible (suggest single-user mode, or init=/bin/sh in /bootopts).

True, but the scheduler gets called only if there is a lack of resources or contention. There is not AFAIK. Then again I may be missing something.

More later.

M

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

Jul 25 '22 20:07 Mellvik

The compaq actually hangs during boot because it's set up to mount 4 filesystems and stops in the 3rd

Three mounts is a lot of activity - introduces variables in perhaps FAT and MINIX filesystems. Recommend leaving all auto-mounts off for presentation of bug reproduction. If this only reproduces when FAT mounts are made, for instance, we could be getting kernel stack overflow... we saw this before on @tyama501's machine where BIOS disk I/O used tons of stack, and increasing KSTACK_BYTES by 100 in <limits.h> fixed the problem. You can also turn on CONFIG_STRACE on the failing system and it will display every system call and its associated kernel stack used.

but the scheduler gets called only if there is a lack of resources or contention. There is not AFAIK. Then again I may be missing something.

Yes - my comment was from looking at the kernel source for unlink: it explicitly calls schedule, which will reschedule the current process immediately:

int minix_unlink(register struct inode *dir, char *name, size_t len)
{   
    int retval;
    register struct inode *inode;
    struct buffer_head *bh;
    struct minix_dir_entry *de;
    
    goto init_loop;
    do {
        iput(inode);
        unmap_brelse(bh);
        schedule();  <--- not sure why, but here it is
  init_loop: 
        retval = -ENOENT;
        inode = NULL;
        bh = minix_find_entry(dir, name, len, &de);
        if (!bh) goto end_unlink2;
        if (!(inode = iget(dir->i_sb, (ino_t) de->inode))) goto end_unlink1;
        retval = -EPERM;
        if (S_ISDIR(inode->i_mode)) goto end_unlink;
    } while (de->inode != inode->i_ino);

Just trying to help get your problem debugged. We need a reproduction with the barest boot conditions possible, no extra processes running, no auto-mounts, no extra cards inserted, if possible. If we can reproduce under those conditions, it will be a lot faster when I have to go searching the kernel code for problems...

I suppose the good news is that you're finding this on other systems, which means we should hopefully get a reproduction on QEMU at some point, which will help me debug it, since I have little access to real hardware.

Thank you!

Jul 25 '22 20:07 ghaerr

Thank you @ghaerr, Being forced to describe the issues and observations is helpful in itself, and discussing them even more so. This rabbit hole turned out to be deep (again) - and no rabbit ...

The compaq actually hangs during boot because it's set up to mount 4 filesystems and stops in the 3rd

Three mounts is a lot of activity - introduces variables in perhaps FAT and MINIX filesystems.

Agreed. AAMF I was elated to see the problem on a second machine, so I kept the setup while verifying that mount triggers the problem on the first system too - in certain circumstances.

Recommend leaving all auto-mounts off for presentation of bug reproduction. If this only reproduces when FAT mounts are made, for instance, we could be getting kernel stack overflow... we saw this before on @tyama501's machine where BIOS disk I/O used tons of stack, and increasing KSTACK_BYTES by 100 in <limits.h> fixed the problem.

That's one I haven't tried yet, will check that.

You can also turn on CONFIG_STRACE on the failing system and it will display every system call and its associated kernel stack used.

Good idea, I've never used this one - being sceptical about its effect on timing and performance. No is the time to change that.

but the scheduler gets called only if there is a lack of resources or contention. There is not AFAIK. Then again I may be missing something.

Yes - my comment was from looking at the kernel source for unlink: it explicitly calls schedule, which will reschedule the current process immediately:

int minix_unlink(register struct inode *dir, char *name, size_t len) {
int retval; register struct inode *inode; struct buffer_head *bh; struct minix_dir_entry *de;
goto init_loop;
do {
    iput(inode);
    unmap_brelse(bh);
    schedule();  <--- not sure why, but here it is
init_loop: retval = -ENOENT; inode = NULL; bh = minix_find_entry(dir, name, len, &de); if (!bh) goto end_unlink2; if (!(inode = iget(dir->i_sb, (ino_t) de->inode))) goto end_unlink1; retval = -EPERM; if (S_ISDIR(inode->i_mode)) goto end_unlink; } while (de->inode != inode->i_ino); The thing is - the goto before the loop (this code is full of them, seems like a Torvalds speciality) ensures that schedule() is skipped in 'normal' cases. I believe the schedule call is there because another process may be waiting for the buffer just released (2nd pass or later). Just trying to help get your problem debugged. We need a reproduction with the barest boot conditions possible, no extra processes running, no auto-mounts, no extra cards inserted, if possible. If we can reproduce under those conditions, it will be a lot faster when I have to go searching the kernel code for problems...

Agreed and all feedback/dialog is appreciated. I've been using QEMU for comparing behaviour and in tests for reproducibility. No luck in that dept so far, not even when taking the QEMU performance setting down to 10 or 1%.

I suppose the good news is that you're finding this on other systems, which means we should hopefully get a reproduction on QEMU at some point, which will help me debug it, since I have little access to real hardware.

It's indeed good. Further, the fact that I'm seeing it in situations other than truncate() is good or bad depending on perspective. In any case its part of the digging. The rabbit is still in the wind.

-M

Jul 26 '22 06:07 Mellvik

I haven't tried personally running ELKS in pcem or 86box, but they both provide more accurate cpu timing or even hdd timing (in 86box git) emulation ..

Jul 26 '22 22:07 Randrianasulu

Thank you Andrew,

I didn't realize 86box can run on MacOs. It does and it even has a setting for emulating the ALI processor I'm using.

Very encouraging.

—M

jul. 2022 kl. 00:47 skrev Andrew Randrianasulu @.***>:

I haven't tried personally running ELKS in pcem or 86box, but they both provide more accurate cpu timing or even hdd timing (in 86box git) emulation ..

— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1196055227, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOAXGTFICMVL7JB3MMDVWBTJPANCNFSM53KDGGZQ. You are receiving this because you were mentioned.

Jul 27 '22 08:07 Mellvik

@ghaerr, the buffer bug is still elusive and progress so far is more eliminating what isn't the problem than getting really closer to it. A couple of possible bugs but no cigar.

While chasing down the differences between cases that work (XMS buffers) and cases that don't (ext buffers), here is some output I was hoping could provide a clue. The first sample (ignore the extra stuff, it's from printks indicating where we're at) is the ext64 case which fails in an endless loop in truncate_direct (td0). The numbers after the colon on the UNMAP: line are the parameters to xms_fmemcpyw (in map_buffer, fs/buffer.c). They are very different between the two cases - which is expected, but is there anything that stands out to you? The file being unlinked is named 'pp', a copy of /bootopts .... < 500 bytes.

EXEC: '/bin/rm' env 139
REMAP: 14
REMAP: 19
REMAP: 13
BLK 1593 read 400023c0:3
BLK 1594 read 23c0:3
BLK 1597 read 600023c0:3
BLK 1598 read 580023c0:3
BLK 1599 read 3c0023c0:3
BLK 1600 read 5c0023c0:3
REMAP: 14
BLK 20 read 440023c0:3
map:   17 try 8
UNMAP: 25 <- 8:6400:23c0:8826:13a0
REMAP: 14
UNLINK 'pp'
REMAP: 14
trEtd0map:   5 try 9
UNMAP: 18 <- 9:4800:23c0:8c26:13a0
td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0 etc. forever.

The second case is with XMS buffers active and works fine:

EXEC: '/bin/rm' env 139
REMAP: 14
REMAP: 19
REMAP: 13
BLK 1593 read 100000:4000
BLK 1594 read 100000:0
BLK 1597 read 100000:6000
BLK 1598 read 100000:5800
BLK 1599 read 100000:3c00
BLK 1600 read 100000:5c00
REMAP: 14
BLK 20 read 100000:4400
map:   17 try 8
UNMAP: 25 <- 8:6400:0:10:8876
REMAP: 14
UNLINK 'pp'
REMAP: 14
trEtd0map:   5 try 9
UNMAP: 18 <- 9:4800:0:10:8c76
td0td0td0td0td0td0trDtrItrXmfi:mb;map:   2 try 10
UNMAP: 19 <- 10:4c00:0:10:9076

Any input would be welcome.

Thank you -M

Aug 01 '22 17:08 Mellvik

Hello @Mellvik,

Actually these two logs look remarkably similar - so much so that the block reads and buffer map/unmaps are identical, right until the moment where the system hangs (or rather, loops indefinitely in truncate, it seems).

I haven't had time to properly analyze MINIX truncate, to see 1) why it loops with a possible reschedule in the first place, and 2) how it might be hanging. However, in general ,the L1 buffer map/unmap is called (using map_buffer and unmap_buffer or unmap_brelse), to grab a disk buffer (in this case, looks like one of the disk mapping blocks for the inode being truncated), and then copies that external or XMS buffer into a local (DS-addressable) L1 buffer. The map_buffer routine itself, since we're limited in the kernel to only 12 or so of them) will sleep the process if there aren't enough free buffers, until unmap_buffer frees a buffer. So, in general, if any kernel process requires more than 12 buffers, and only a single process is using them, the system will hang.

I don't understand truncate well enough to see why it might need > 12 buffers: certainly there's buffers needed for each open directory in the namei directory filename processing routines, then another one for the inode itself, then probably one for each of the inode indirect block buffers, etc. But with a < 512 byte file, there's no indirect blocks, since MINIX keeps the first seven in the inode itself.

Also, there is no difference in the number of L1 buffers when running EXT vs XMS external buffers. It is possible, for some reason, that we have a kernel bug where an external buffer is not being released properly. You might test this be setting the EXT buffers to 128, instead of 64, and see whether that changes anything. I think ELKS will run with even more EXT buffers, for testing (all will be in main memory).

Finally, there may be a bug in truncate itself, where for some reason it loops thinking that it needs more L1 (mapped) buffers than it actually needs, and this is causing the busy loop.

The numbers after the colon on the UNMAP: line are the parameters to xms_fmemcpyw (in map_buffer, fs/buffer.c).

Instead of displaying the buffer addresses, suggest putting more printks in truncate, especially around the buffers it seems to want. The buffer block number can be printed using bh->b_blocknr, to better understand exactly which inode block truncate is looking at.

I assume there are NO other running processes on the system (no network, etc), right? This will matter with the rescheduling, which I discussed previously. If you have various filesystems mounted, it is possible that could be a problem also, as they could possibly be using an L1 buffer for each of them. It is entirely possible we are just running out of L1 buffers due to too many kernel resources being used (like active mounts, for instance). (If that's the case, and the problem goes away without the 4 mounted filesystems, then we can possibly correct that by having the filesystem re-grab the superblock for each, rather than keeping them mapped into L1). I haven't actually checked that they are locked in L1 space, just a guess.

I'll look more into this tomorrow.

Thank you!

Aug 02 '22 04:08 ghaerr

Thank you @ghaerr - it seems I need a new angle on this one - including clearing out xms_fmemcpyw as a potential suspect.

Actually these two logs look remarkably similar - so much so that the block reads and buffer map/unmaps are identical, right until the moment where the system hangs (or rather, loops indefinitely in truncate, it seems).

Yes, the elusiveness of this problem is frustrating. It still seems that the problem is stack pollution (see below) so the symptoms appear in other places than the problem.

I haven't had time to properly analyze MINIX truncate, to see 1) why it loops with a possible reschedule in the first place, and 2) how it might be hanging. However, in general ,the L1 buffer map/unmap is called (using map_buffer and unmap_buffer or unmap_brelse), to grab a disk buffer (in this case, looks like one of the disk mapping blocks for the inode being truncated), and then copies that external or XMS buffer into a local (DS-addressable) L1 buffer. The map_buffer routine itself, since we're limited in the kernel to only 12 or so of them) will sleep the process if there aren't enough free buffers, until unmap_buffer frees a buffer. So, in general, if any kernel process requires more than 12 buffers, and only a single process is using them, the system will hang.

I don't understand truncate well enough to see why it might need > 12 buffers: certainly there's buffers needed for each open directory in the namei directory filename processing routines, then another one for the inode itself, then probably one for each of the inode indirect block buffers, etc. But with a < 512 byte file, there's no indirect blocks, since MINIX keeps the first seven in the inode itself.

I've spent what feels like aeons in truncate by now and while it seems to be doing some strange things (and missing a cast to int in V1_trunc_direct, for the DIRECT_BLOCK macro, which I at one point figured was the problem), truncate is in the clear (btw truncate is called from only one place in the code and '__u32 i_size' is always 0, so the DIRECT_BLOCK macro is superfluous for now). The thing with truncate is that it turns out to be the place where the symptoms are easiest to trigger and reproduce, but its not the only place (mount is another, not as predictable, see below). V1_trunc_direct can fail at any point in the code depending on where the printks are placed. It can also be made to work reliably if the printks are place right - and some of the printks will be mysteriously skipped. BTW - while indirect and double indirect blocks are not involved (small file), they are always called anyway.

Also, there is no difference in the number of L1 buffers when running EXT vs XMS external buffers. It is possible, for some reason, that we have a kernel bug where an external buffer is not being released properly. You might test this be setting the EXT buffers to 128, instead of 64, and see whether that changes anything. I think ELKS will run with even more EXT buffers, for testing (all will be in main memory).

I will check that. I've been using 32,64 and 80 EXT buffers so far, no difference.

Finally, there may be a bug in truncate itself, where for some reason it loops thinking that it needs more L1 (mapped) buffers than it actually needs, and this is causing the busy loop.

The numbers after the colon on the UNMAP: line are the parameters to xms_fmemcpyw (in map_buffer, fs/buffer.c).

Instead of displaying the buffer addresses, suggest putting more printks in truncate, especially around the buffers it seems to want. The buffer block number can be printed using bh->b_blocknr, to better understand exactly which inode block truncate is looking at.

See above. Actually, buffer.c emits pretty much the same symptoms. Enabling the debug() statements in the code, and the problem goes away, then disabling/enabling individual statements change behaviour, even if the changes are made in parts of the code that isn't being executed.

I assume there are NO other running processes on the system (no network, etc), right? This will matter with the rescheduling, which I discussed previously. If you have various filesystems mounted, it is possible that could be a problem also, as they could possibly be using an L1 buffer for each of them. It is entirely possible we are just running out of L1 buffers due to too many kernel resources being used (like active mounts, for instance). (If that's the case, and the problem goes away without the 4 mounted filesystems, then we can possibly correct that by having the filesystem re-grab the superblock for each, rather than keeping them mapped into L1). I haven't actually checked that they are locked in L1 space, just a guess.

The system is idle and other processes don't affect the problem. I have networking running most of the time, stopping the network and all gettys/logins don't change anything. Neither do the mounts, but the mount operation will sometimes hang the system. I haven't been able to make this one entirely predictable.- At first it seemed like the 3rd mount would always hang, then it was a specific mount (i.e. a file system), but the path is an unreliable (and slow) source because the crashes frequently screw up either the file system or the bootblock or both. BTW I have an updated version of make boot that helps - checks for the magic number at the end of the source bootblock and copies only the bootblock if requested. I don't think we're running out of L1 buffers since the system is idle most of the time and sometimes running only 5 processes. Also, a potential shortage of L1 buffers should not be affected by the choice of XMS or EXT L2 buffers, right?

I keep looking for local (stack allocated) pointers begin accidentally used as variables (frequent casting make them hard to catch for the compiler) and long/int mixups.

Thanks for looking into this.

—M

Aug 02 '22 08:08 Mellvik

Hi @Mellvik,

I'm getting confused again on this issue, we might possibly be talking about different things - is the problem that rm hangs, or something else? My long response above is only to do with why a hang may be occurring, nothing else.

the mount operation will sometimes hang the system.

You're saying running mount hangs the system? Is this a separate problem, or to do with the issue of rm hanging?

Are all these duplicating regularly only on the one "bad" machine you have?

I understand you're saying that there is a separate problem relating to adding printk's that result in displaying garbage. When that happens, can you run that version on QEMU with the same result, or does this also only happen on the "bad" system?

a potential shortage of L1 buffers should not be affected by the choice of XMS or EXT L2 buffers, right?

Correct.

I don't think we're running out of L1 buffers since the system is idle most of the time and sometimes running only 5 processes.

From your log above - we are most definitely running out of L1 buffers, that is the reason truncate is looping. I explained that map_buffer will sleep the process, and some other process is issuing wakeup. This is the reason why we need to eliminate ALL other processes to track this down. (That is, if the problem is that of rm hanging, versus the other 3 problems mentioned in this issue).

Aug 02 '22 15:08 ghaerr

Thank you @ghaerr -

yes, there is plenty of room for confusion and I've been at this one for so long that a lot of the parameters are easily taken for granted. So a few things (my perspective):

There is only one problem - stack corruption
rm hang is a symptom, a consequence of the above. There seem to be no problems in truncate or unlink.
the printks demonstrate this: Their presence, content (# of parameters) and placement change the layout of the stack in (say) truncate and thus the behaviour.
this can be repeated in buffer.c too. Leaving truncate out of the equation (i.e. no debug printk's in truncate routines), instead adding DEBUG to buffer.c and rm works just fine. Until we remove a couple of the debug statements - and we have a new variant of symptoms.
IOW, the nature of rm's failure varies with the placement of the debug statements: Some times small files can be deleted, some times all files can be deleted, some times the process loops, some times the kernel hangs silently, some times it panics (say, in brelse) - or like this:

elks17# rm /linux; mv linux /

panic: Process -31913 exceeded kernel stack limit! magic c483

SYSTEM HALTED - Press CTRL-ALT-DEL to reboot:

this behaviour is on the SX (ALI) system only, and goes away when XMS buffers are used.
The mount problem has (AFAICT) the same origin: A result of the stack corruption occurring if the 'right' debug statements are in the 'right' places.
The mount problem is not affected by XMS buffers, it's always the same and reproducible on the Compaq too, but not on QEMU.
Like with rm, the nature of the mount problem changes with the amount of debug stuff active - and placement/params etc. Initially, mount would hang predictably when mounting (say) /dev/hda3 but not /dev/hda2 (both minix file systems). Some changes later it will not hang at all when mounting any file system, but hang when running the command without arguments to show available space - iff a FAT file systems is mounted.

At one point I was thinking the reason might be related to A20, memory mapping and the 1M wraparound 'mechanism' - since it seems to be somewhat hw/system dependent. But since ELKS is not using the HMA, that cannot be it. I did notice though, that while both the compaq and the SX, when running with XMS buffers, report 'A20 was off, now on' while QEMU reports 'A20 was on, now on'. A curiosity - and I haven't pulled out the 286 to get a 3rd platform to test on - yet.

So, having chased the problem through every statement the truncate routines and seen some really head scratching behaviour, I'm back to buffer.c and the buffer management routines - not really expecting to find the problem, but to find a pointer to it.

Thanks for clarifying the L1 mechanism. You're right - wait/schedule may indeed have to do with it.

BTW - truncate is looping only when its stack is corrupted. When working normally, V1_trunc_direct loops through all (7) blocks of the zone even when the file size is smaller than the first block, then exits. Room for improvement, but then again, unlink or truncate performance was never an issue.

—M

aug. 2022 kl. 17:02 skrev Gregory Haerr @.***>:

Hi @Mellvik https://github.com/Mellvik,

I'm getting confused again on this issue, we might possibly be talking about different things - is the problem that rm hangs, or something else? My long response above is only to do with why a hang may be occurring, nothing else.

the mount operation will sometimes hang the system.

You're saying running mount hangs the system? Is this a separate problem, or to do with the issue of rm hanging?

Are all these duplicating regularly only on the one "bad" machine you have?

I understand you're saying that there is a separate problem relating to adding printk's that result in displaying garbage. When that happens, can you run that version on QEMU with the same result, or does this also only happen on the "bad" system?

a potential shortage of L1 buffers should not be affected by the choice of XMS or EXT L2 buffers, right?

Correct.

I don't think we're running out of L1 buffers since the system is idle most of the time and sometimes running only 5 processes.

From your log above - we are most definitely running out of L1 buffers, that is the reason truncate is looping. I explained that map_buffer will sleep the process, and some other process is issuing wakeup. This is the reason why we need to eliminate ALL other processes to track this down. (That is, if the problem is that of rm hanging, versus the other 3 problems mentioned in this issue).

— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1202764132, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOGXSQN26BH3RXCSOHDVXEZ6RANCNFSM53KDGGZQ. You are receiving this because you were mentioned.

Aug 02 '22 16:08 Mellvik

So basically, all of these issues (except mount) occur only on the SX/ALI system, right?

If so, we need to consider BIOS or hardware issues (including possible I/O delays only seen on that computer). It is possible that the SX BIOS is using too much stack, or that the stack is being trashed, but that this is seen only under the following circumstances:

Because truncate will call reschedule, and that L1 buffers can cause a hang (remember, I can duplicate the rm hang issue on QEMU), then what possibly is happening is reschedule is trashing the stack somehow. The scenario we are discussing is one of the few cases coded in the kernel where a reschedule is coded, that may return directly to the calling application, rather than sleeping which is the result in almost all other reschedules. The reason for this on the SX system could be the speed of its I/O (rather than a BIOS stack issue), that just happens to cause a reschedule() bug to show.

To test more along these lines, perhaps include more (or less) schedule() calls in various places. Technically, there should be no issue calling schedule() from the kernel at any point, but perhaps that is the problem here: schedule will switch stacks, then switch back, corrupted. All other applications end up calling wait() which will cause a call to schedule AFTER that process tries to return from the system call. See what I mean?

Another possibility would be to comment out the schedule() call and see if that changes anything. Worst case, we might end up with a busy-loop or hang problem, but there would not be any stack corruption, and printk's should work.

Aug 02 '22 17:08 ghaerr

Thank you @ghaerr - That's a couple of very interesting angles! I didn't realize that schedule() would switch stacks in a case like this, thanks - that's very useful.

And I was under the impression that the rm hang situation on QEMU wasn't reproducible- is that right?

So basically, all of these issues (except mount) occur only on the SX/ALI system, right?

If so, we need to consider BIOS or hardware issues (including possible I/O delays only seen on that computer). It is possible that the SX BIOS is using too much stack, or that the stack is being trashed, but that this is seen only under the following circumstances:

Because truncate will call reschedule, and that L1 buffers can cause a hang (remember, I can duplicate the rm hang issue on QEMU), then what possibly is happening is reschedule is trashing the stack somehow. The scenario we are discussing is one of the few cases coded in the kernel where a reschedule is coded, that may return directly to the calling application, rather than sleeping which is the result in almost all other reschedules. The reason for this on the SX system could be the speed of its I/O (rather than a BIOS stack issue), that just happens to cause a reschedule() bug to show.

To test more along these lines, perhaps include more (or less) schedule() calls in various places. Technically, there should be no issue calling schedule() from the kernel at any point, but perhaps that is the problem here: schedule will switch stacks, then switch back, corrupted. All other applications end up calling wait() which will cause a call to schedule AFTER that process tries to return from the system call. See what I mean?

Another possibility would be to comment out the schedule() call and see if that changes anything. Worst case, we might end up with a busy-loop or hang problem, but there would not be any stack corruption, and printk's should work.

Yes, this is a fresh start, I'll start off with the latter and take it from there. BTW: the speed is indeed very different between the SX and the compaq. The SX feels an order of magnitude or more faster, almost like QEMU except when booting from floppy.

Thanks.

--M

Aug 02 '22 19:08 Mellvik

panic: Process -31913 exceeded kernel stack limit! magic c483

This message could mean old-fashioned kernel stack overflow.

Lets increase the kernel stack size, just in case the SX system BIOS happens to be using a lot. Change KSTACK_BYTES in include/linuxmt/limits.h from 640 to 740, and lets see what happens.

I've been tracing through the truncate code, and its quite complicated, with all sorts of routines called, many of which grab more L1 buffers.

Aug 02 '22 19:08 ghaerr

(btw truncate is called from only one place in the code and '__u32 i_size' is always 0, so the DIRECT_BLOCK macro is superfluous for now).

Yes, inode->i_size is set to 0 by iput which indicates that the kernel wants the size truncated to 0 (this is the path taken when minix_unlink is called, after decrementing the nlink count).

However, minix_truncate can also be (indirectly) called through the iop->truncate function pointer. This occurs in a couple places, in the truncate system call, where the i_size value is set to the end result truncated size requested, and also on the open system call when O_TRUNC is specified (which happens during creat). Thus, the DIRECT_BLOCK macro is correct in its use.

Aug 02 '22 19:08 ghaerr

I didn't realize that schedule() would switch stacks in a case like this, thanks - that's very useful.

It may not switch stacks, depends on what else is running. tswtch is the function that actually switches stacks, called from within schedule().

I was under the impression that the rm hang situation on QEMU wasn't reproducible- is that right?

None of us can get a "basic" (removing a single file) rm hang on QEMU, but I have an old TODO list on a more complex rm -rf hang which, because it uses up all the L1 buffers, and there are no other processes running, will produce a hang every time in QEMU.

Aug 02 '22 20:08 ghaerr

panic: Process -31913 exceeded kernel stack limit! magic c483

This message could mean old-fashioned kernel stack overflow.

Lets increase the kernel stack size, just in case the SX system BIOS happens to be using a lot. Change KSTACK_BYTES in include/linuxmt/limits.h from 640 to 740, and lets see what happens.

Good idea. I don't think we really have a stack problem since this happens only when there is an erroneous loop (presumably) eating the stack, and it's good to be able to rule it in or out.

I've been tracing through the truncate code, and its quite complicated, with all sorts of routines called, many of which grab more L1 buffers.

Indeed it is. And this may be a good time to take a close look for possible improvements - not the least reducing the pressure on L1 buffers. After all, when deleting, the only need for buffers is for the inodes - and for small files in particluar ... etc etc. Let's have a close look after we get the stack issue resolved.

Thank you.

-M

Aug 03 '22 07:08 Mellvik

(btw truncate is called from only one place in the code and '__u32 i_size' is always 0, so the DIRECT_BLOCK macro is superfluous for now).

Yes, inode->i_size is set to 0 by iput which indicates that the kernel wants the size truncated to 0 (this is the path taken when minix_unlink is called, after decrementing the nlink count).

However, minix_truncate can also be (indirectly) called through the iop->truncate function pointer. This occurs in a couple places, in the truncate system call, where the i_size value is set to the end result truncated size requested, and also on the open system call when O_TRUNC is specified (which happens during creat). Thus, the DIRECT_BLOCK macro is correct in its use.

Aaaah, of course. Thanks.

Now, here's the thing with the DIRECT_BLOCK macro: it returns an __u32 - which in the subsequent 2 macros is being cast to int. However, in V1_trunc_direct() it's being used twice as if it were an int. This seems to me like a bug that will go unnoticed as long as we're deleting files (value 0), but kick in when truncating a file.

—M

Aug 03 '22 09:08 Mellvik

I was under the impression that the rm hang situation on QEMU wasn't reproducible- is that right?

None of us can get a "basic" (removing a single file) rm hang on QEMU, but I have an old TODO list on a more complex rm -rf hang which, because it uses up all the L1 buffers, and there are no other processes running, will produce a hang every time in QEMU.

OK; that's something to look into while were at it. If this hangs in QEMU it should do so on the real thing too (somewhat depending on what else is going on obviously).

Talking about QEMU - I'm seeing interesting errors that are different from the ones on other platforms, but likely related, such as on the attached screenshot. This is from a freshly mounted (and fsck'd) HD file system, actually an identical copy of the one I'm using on the SX. It may be interesting to enable DEBUG_BLK and see what's going on. This is repeatable.

—M

Aug 03 '22 10:08 Mellvik

Hello @Mellvik, can you (re)post the QEMU screenshot showing the associated error?

Now, here's the thing with the DIRECT_BLOCK macro: However, in V1_trunc_direct() it's being used twice as if it were an int

The macro, which converts a disk file size into a MINIX block number by shifting right 10 bits, will be automatically converted to an 'int' if stored, or, when compared to an int, the other operand will be automatically converted to a long. In the latter case, the gcc compiler is very good about generating code, and likely doesn't actually compare the upper 16 bits. I see your point that this macro could likely use an (int) cast, however. We can look at the code generated and see whether an (int) casting helps, after solving this issue. The reason it isn't buggy is that the max block number for any MINIX file will always fit in 16 bits (65K), which, when shift left back to a long, is the max filesystem size: 65M.

Aug 03 '22 14:08 ghaerr

aug. 2022 kl. 16:14 skrev Gregory Haerr @.***>:

Hello @Mellvik https://github.com/Mellvik, can you (re)post the QEMU screenshot showing the associated error?

I edited the post … let me know if it doesn't work.

—m

Aug 03 '22 15:08 Mellvik

Hello @Mellvik,

I have duplicated the QEMU-staged "mnb: still zero bit!8192" bug on my system. I'm not sure if this is a relic of just having run out of disk space, or whether this is related to a problem keeping up with the MINIX allocated-block bitmap. I have noted this, and will treat this bug separately from this issue. I plan on looking further into it shortly. This is very likely not related to this open issue at all, but I'll test a bit more and report back.

In the meantime, definitely lets plan on continued testing as you've described above, separate fro this new issue, thanks!

Aug 03 '22 17:08 ghaerr

Thank you @ghaerr.

Agreed - and as the mount commands show, there is ample space on the fs.

Also I agree it's likely a separate issue, it would be interesting to see if it goes away if XMS buffers are compiled in. I haven't checked that. When we find the bug this thread is focusing on, we'll know whether they are related.

Apropos - I've looked at calls to schedule() as a possible part in the scenario and realised I've been down that path before, after we talked about it a month or so ago. The thing is - and as far as I can tell from the tracings, schedule() is never called in the regular path of an unlink, neither is sleep_on. wait_on_buffer() is called a lot but the buffer is never locked in any of my cases. I still have some work to do to understand the L1 operation in this.

—M

aug. 2022 kl. 19:06 skrev Gregory Haerr @.***>:

Hello @Mellvik https://github.com/Mellvik,

I have duplicated the QEMU-staged "mnb: still zero bit!8192" bug on my system. I'm not sure if this is a relic of just having run out of disk space, or whether this is related to a problem keeping up with the MINIX allocated-block bitmap. I have noted this, and will treat this bug separately from this issue. I plan on looking further into it shortly. This is very likely not related to this open issue at all, but I'll test a bit more and report back.

In the meantime, definitely lets plan on continued testing as you've described above, separate fro this new issue, thanks!

— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1204238608, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOHAUHJUMYASVHNAWE3VXKRH7ANCNFSM53KDGGZQ. You are receiving this because you were mentioned.

Aug 03 '22 17:08 Mellvik

With regards to the "new" bug, lets call it "out of disk space" bug:

Also I agree it's likely a separate issue, it would be interesting to see if it goes away if XMS buffers are compiled in. I haven't checked that.

I'm testing with XMS, and the bug is present. I know what it is: MINIX keeps an 8192-byte bitmap, which equals 8192*8 = 64K bits, where each bit is a 1K block in the max 64K minix V1 filesystem size. There are actually two bugs here: the first bug is that the system isn't checking when the bitmap is >= 8192 properly, and gives the incorrect error "mnb: bit still set!8192" in that case. I am working on fixing that now.

as the mount commands show, there is ample space on the fs.

That's the second bug, I have just realized. I am testing on a 32Mb filesystem, and it is running out of space and giving the first error message. I have not been able to (yet) duplicate the second bug, which is saying there's no more disk space when there should be.

I will fix the "first" portion of the out of disk space bug in a PR, then you can try to repeat the "second" out of disk space bug.

as far as I can tell from the tracings, schedule() is never called in the regular path of an unlink,

Please comment out schedule() in the truncate code, and let me know what happens.

neither is sleep_on

Please add a printk before the "sleep_on" in map_buffer to ensure the process is never put to sleep. This will let us know for sure whether you are correct in the process never sleeping (which means never running out of L1 buffers).

I still have some work to do to understand the L1 operation in this.

L1 operation is actually quite simple - map_buffer and unmap_buffer take the buffer, and copy its data contents to an L1 region in the kernel data segment. The bh_data is then directly addressable from the other kernel routines, without using a far pointer. If there are no L1 buffers left, the process is put to sleep. Its about that simple. We just need to prove that we're not ever sleeping, to eliminate a bunch of variables for your issue. (When the buffer is later unmapped, the buffer contents are copied back to the "real" L2 buffer).

If you comment out schedule and printk sleep_on, we'll know much more and be able to more rapidly squash this bug!

Aug 03 '22 17:08 ghaerr

Finally some (real) progress on this issue:

The problem is a ‘memory leak’. 4 bytes at a fixed address in the kernel text segment are being overwritten at some point during the boot & startup process. Thus the erratic symptoms. Changing one line of code somewhere (before that address) changes the behaviour completely. (BTW - memory error has been ruled out - again, via extensive testing).
system-nm.map and the hd command were instrumental in finally locating the problem.
The location and content are always the same, c7 ef 00 f0 @ 0x16238 in the kernel text segment. The following example shows uncorrupted (first) then corrupted data.

elks17# hd d0:16238#4  <— floppy booted
006f38: c2 b8 fe ff eb cc 56 57  55 89 e5 83 ec 06 8d 46  ......VWU......F
elks17# hd d0:16238#4.  <—— HD booted
006f38: c7 ef 00 f0 eb cc 56 57  55 89 e5 83 ec 06 8d 46  ......VWU......F

This happens only on the SX machine, never on Compaq (386dx) or qemu. The Compaq has a hw config that is very similar to the SX.
Changing from EXT to XMS buffers only changes the layout of the code. The overwrite is at exactly the same place, now corrupting some code I haven’t triggered yet.

Now the good news, that may explain why the problem occurs only on this particular piece of hardware:

When booting from floppy, the problem is gone. Mounting hd file systems thereafter is no problem. Booting from floppy with root on HD (via bootopts) is no problem either.
The size of the hd minix fs doesn’t matter (I’ve tried only 2)

So the problem appears ONLY when booting the SX from HD, which should limit the scope of the continued hunt. So the boot device matters, the root device does not. Even at this level, the difference between the Compaq and the SX is limited. Both use an IDE disk channel, both use the same IDEtoCF adapter and the same CF card (512MB). Possibly something DPT/DPB related or data returned from BIOS calls?

I’m looking at that right now. In any case - as it turns out, the name of this issue is wrong - this is not a buffer problem, and most unlikely a fs problem. We're getting there.

To be continued.

-M

Aug 10 '22 12:08 Mellvik

Hello @Mellvik,

Wow, you really have made some progress on this!! I'm interested in how you were able to pinpoint the memory locations that are being overwritten?

I'd like to understand a bit more about exactly what this location is in the kernel, and whether it has any special meaning. In doing so, we need to "normalize" our segment:offset pairs for use with hd, and in testing, I've already ran into a possible problem with hd that I'm going to look into.

With regards to the normalizing, let's look at the special address "d0:16238". The offset value is 64k (=10000) greater than will fit in an 8086 offset, so this is the same as "d0:6238". What we want to do is look at the kernel startup banner and use the kernel code segment value from the following line:

Kernel text at 2d0:0000, ftext 1210:0000, data 1512:0000, top 9fc0:0, 490K free

So, using 2d0, we should be able to (re)write the d0:6238 value as 2d0:<something>. (Technically, to get the 20-bit memory address, the segment value is shifted left 4 bits and the offset added: d0:6238 = D00 + 6238 = 0:6F38). Thus, d0:6238 is the same as 2d0:4238.

If I run the following to get a sorted kernel map file:

cat system-nm.map | sed -e '/&/d; /!/d' | sort > file

Looking in the file for somewhere near '00014238' shows the affected location as in the middle of the sock_read code:

0001418d t sock_write
000141eb t sock_read
00014249 t sock_release
0001427b t sock_close

Can you check with the normalized (likely 2D0) segment and see what the normalized offset is, and its position in the kernel map file?

If you find the actual memory leak is at 2D0:6238, which corresponds to the kernel map file address 00016238 (which may be hard to do given the possible hd bug below), then, looking at the kernel map file, 00016238 is (guess what?) right in the middle of minix_unlink, which fits with why the kernel became unstable after an unlink syscall:

00015fba T minix_rmdir
0001613a T minix_unlink
0001629d T minix_symlink

On another note, when trying hd, I try to rewrite the d0:6238 address with segment 2d0: I find that hd seems to have a bug:

# hd d0:6238#4
006f38: 12 92 83 c4 08 eb de ba  f7 ff eb d9 ba ea ff eb  ................
# hd 2d0:6638#4
006f38: 04 ff 76 fc 8b 5e 08 ff  77 20 e8 a2 ed 83 c4 04  ..v..^..w ......

Note that both addresses passed to hd give the same 20-bit 006f38 memory address (which is incorrect), but it is showing the contents differently, as expected. I'll report on this bug shortly.

Thank you!

Aug 10 '22 16:08 ghaerr

Hi @ghaerr, I just noticed the omission in my report, please ignore the '2d0' vs 'd0' segment difference.

I forgot to correct that - they’re from two different runs, different builds. Some times the kernel txt segment is 2d0, sometimes it’s d0.

So I did this the hard way. The address is in the middle of kernel code, more specifically (depends on the build) in V1_trunc_direct unless we build with XMS buffers, then it’s a bit further out if I remember correctly. The hard way is as follows: V1_trunc_direct() is acting up so get the address from system-nm.map (I’ve been studying disassembly listings of truncate.o for weeks, changing one statement at a time to try to understand what’s been going on, so I’m more familiar with the low level code than I ever wanted to. BTW - the complier does an incredible job optimizing, that’s different story) Dump the memory region using hd, compare with the obj dump output (mind you, on paper - yes, I know …) Find the same area in the kernel file to make sure we’re not looking at relocation and linking differences. Identify differences. After rigorous verification we have the verdict, which corresponds to the behavior. Including the weird bitmap errors we got the other day.

What I was on my way to look at when your mail arrived, was bioshd.c and how it probes drives, in order to see if the BIOS returns odd values for the DiskOnChip which the system supports. The DOC has not been installed during testing, but it may affect how the BIOS responds to probes.

—M

aug. 2022 kl. 18:01 skrev Gregory Haerr @.***>:

Hello @Mellvik https://github.com/Mellvik,

Wow, you really have made some progress on this!! I'm interested in how you were able to pinpoint the memory locations that are being overwritten?

I'd like to understand a bit more about exactly what this location is in the kernel, and whether it has any special meaning. In doing so, we need to "normalize" our segment:offset pairs for use with hd, and in testing, I've already ran into a possible problem with hd that I'm going to look into.

With regards to the normalizing, let's look at the special address "d0:16238". The offset value is 64k (=10000) greater than will fit in an 8086 offset, so this is the same as "d0:6238". What we want to do is look at the kernel startup banner and use the kernel code segment value from the following line:

Kernel text at 2d0:0000, ftext 1210:0000, data 1512:0000, top 9fc0:0, 490K free So, using 2d0, we should be able to (re)write the d0:6238 value as 2d0:. (Technically, to get the 20-bit memory address, the segment value is shifted left 4 bits and the offset added: d0:6238 = D00 + 6238 = 0:6F38). Thus, d0:6238 is the same as 2d0:4238.

If I run the following to get a sorted kernel map file:

cat system-nm.map | sed -e '/&/d; /!/d' | sort > file Looking in the file for somewhere near '00014238' shows the affected location as in the middle of the sock_read code:

0001418d t sock_write 000141eb t sock_read 00014249 t sock_release 0001427b t sock_close Can you check with the normalized (likely 2D0) segment and see what the normalized offset is, and its position in the kernel map file?

If you find the actual memory leak is at 2D0:6238, which corresponds to the kernel map file address 00016238 (which may be hard to do given the possible hd bug below), then, looking at the kernel map file, 00016238 is (guess what?) right in the middle of minix_unlink, which fits with why the kernel became unstable after an unlink syscall:

00015fba T minix_rmdir 0001613a T minix_unlink 0001629d T minix_symlink On another note, when trying hd, I try to rewrite the d0:6238 address with segment 2d0: I find that hd seems to have a bug:

hd d0:6238#4

006f38: 12 92 83 c4 08 eb de ba f7 ff eb d9 ba ea ff eb ................

hd 2d0:6638#4

006f38: 04 ff 76 fc 8b 5e 08 ff 77 20 e8 a2 ed 83 c4 04 ..v..^..w ...... Note that both addresses passed to hd give the same 20-bit 006f38 memory address (which is incorrect), but it is showing the contents differently, as expected. I'll report on this bug shortly.

Thank you!

— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1210913480, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOFU626DNTW266NVKPTVYPG4RANCNFSM53KDGGZQ. You are receiving this because you were mentioned.

Aug 10 '22 16:08 Mellvik

With regards to the origin of the memory corruption:

elks17# hd d0:16238#4.  <—— HD booted
006f38: c7 ef 00 f0 eb cc 56 57  55 89 e5 83 ec 06 8d 46  ......VWU......F

The four-byte value looks suspiciously like a possible BIOS ROM address (retrieved from memory would be): F0 00 EF C7. If its an address, this would be BIOS ROM at segment F000, address EFC7. I'm wondering whether it is a strange BIOS bug possibly to do with our bioshd driver setting a custom DDPT, or something like that.

I am just about to release my next-level debug tools, which feature a stack tracer and disassembler, both translating addresses to symbol names from the application's symbol file. I don't yet have it working inside the kernel. It currently allows for an application to disassemble portions of itself or perform a stack backtrace.

I think I'll add a seperate disasm utility that works like hd, where any segment:offset address can be given, and have it able to take a .map file, which could then allow for arbitrary disassembly of (kernel) memory, complete with kernel symbols. This should be able to help with this problem. I'll also fix the hd bug above.

Back the the corruption problem: I'm wondering if perhaps the low-core DDPT vector isn't being set properly for HD, but is for FD, for instance. This is the 1Eh vector, so its low core address would be 1E<<2 = 0:0078. Perhaps you should check to see what might be happening at that address. Also, in bioshd.c, you might turn on DEBUG_BIOS in debug.h, which will turn on some printks for DDPT copying (see copy_ddpt). Looking at this code, however, shows ELKS only sets the 1E low core vector to point at a new DDPT.

Aug 10 '22 16:08 ghaerr

was bioshd.c and how it probes drives, in order to see if the BIOS returns odd values for the DiskOnChip which the system supports.

You may be on to something here: lets take a closer look at the probing routines used by bioshd.c for the devices, in particular bioshd_gethdinfo (and possibly bioshd_getfdinfo). It's a little ugly in there, since we're temporarily sharing code for IBM and PC98, but there are notes like the following, which are a bit suspicious:

    BD_AX = BIOSHD_DRIVE_PARMS;
    BD_DX = 0x80;               /* query hard drives only*/
    BD_ES = BD_DI = BD_SI = 0;  /* guard against BIOS bugs*/ // <---!!!
    if (!call_bios(&bdt))
        ndrives = BD_DX & 0xff;
    else
        debug_bios("bioshd: get_drive_parms fail on hd\n");

I'm thinking perhaps just commenting out the call_bios entirely (possibly just fake filling in the results, but not making the BIOS call). It is definitely possible that the SX BIOS is trashing a location as a result of one of these calls.

Probably turn off CONFIG_IDE_PROBE as well.

Another possibility is to look very closely at the boot screen display differences for bioshd: output when booting from FD vs HD.

Aug 10 '22 16:08 ghaerr

Thank you @ghaerr -

With regards to the origin of the memory corruption:

elks17# hd d0:16238#4. <—— HD booted 006f38: c7 ef 00 f0 eb cc 56 57 55 89 e5 83 ec 06 8d 46 ......VWU......F The four-byte value looks suspiciously like a possible BIOS ROM address (retrieved from memory would be): F0 00 EF C7. If its an address, this would be BIOS ROM at segment F000, address EFC7. I'm wondering whether it is a strange BIOS bug possibly to do with our bioshd driver setting a custom DDPT, or something like that.

I was thinking along those lines too, but I didn’t follow the discussions closely when this stuff was updated, so I’m not familiar with it. I believe it’s important to keep in mind that the corruption doesn’t happen if we boot from floppy, then use HD as root. So whatever happens, it’s very early.

I am just about to release my next-level debug tools, which feature a stack tracer and disassembler, both translating addresses to symbol names from the application's symbol file. I don't yet have it working inside the kernel. It currently allows for an application to disassemble portions of itself or perform a stack backtrace.

I think I'll add a seperate disasm utility that works like hd, where any segment:offset address can be given, and have it able to take a .map file, which could then allow for arbitrary disassembly of (kernel) memory, complete with kernel symbols. This should be able to help with this problem. I'll also fix the hd bug above.

Great idea - that would be very welcome. Hd-bug, thanks - it’s on my list too, I’m deleting it now that you’re on it.

Back the the corruption problem: I'm wondering if perhaps the low-core DDPT vector isn't being set properly for HD, but is for FD, for instance. This is the 1Eh vector, so its low core address would be 1E<<2 = 0:0078. Perhaps you should check to see what might be happening at that address.

I don’t know what to look for yet, but I’ll compare the values on floppy booted and HD booted systems. Here’s the HD booted: elks17# hd 2d0:16238#4 006f38: c7 ef 00 f0 fa 52 ff 74 20 e8 a9 ed 83 c4 04 eb .....R.t ....... elks17# elks17# hd 0:78#10 000078: b4 b9 9c 13 e0 5a 00 c0 a0 87 00 f0 a0 87 00 f0 .....Z.......... elks17#

Also, in bioshd.c, you might turn on DEBUG_BIOS in debug.h, which will turn on some printks for DDPT copying (see copy_ddpt). Looking at this code, however, shows ELKS only sets the 1E low core vector to point at a new DDPT.

OK, I’ll be back with more …

—M

Aug 10 '22 16:08 Mellvik

Hello @ghaerr, hello @Mellvik,

You may be on to something here: lets take a closer look at the probing routines used by bioshd.c for the devices, in particular bioshd_gethdinfo (and possibly bioshd_getfdinfo). It's a little ugly in there, since we're temporarily sharing code for IBM and PC98, but there are notes like the following, which are a bit suspicious:
    BD_AX = BIOSHD_DRIVE_PARMS;
    BD_DX = 0x80;               /* query hard drives only*/
    BD_ES = BD_DI = BD_SI = 0;  /* guard against BIOS bugs*/ // <---!!!
    if (!call_bios(&bdt))
        ndrives = BD_DX & 0xff;
    else
        debug_bios("bioshd: get_drive_parms fail on hd\n");

The "fix" follows a suggestion from Ralf Brown's Interrupt List:

--------B-1308-------------------------------
INT 13 - DISK - GET DRIVE PARAMETERS (PC,XT286,CONV,PS,ESDI,SCSI)
        AH = 08h
        DL = drive (bit 7 set for hard disk)
        ES:DI = 0000h:0000h to guard against BIOS bugs
Return: CF set on error
            AH = status (07h) (see #00234)
        CF clear if successful
            AH = 00h
            AL = 00h on at least some BIOSes
            BL = drive type (AT/PS2 floppies only) (see #00242)
            CH = low eight bits of maximum cylinder number
            CL = maximum sector number (bits 5-0)
                 high two bits of maximum cylinder number (bits 7-6)
            DH = maximum head number
            DL = number of drives
            ES:DI -> drive parameter table (floppies only)

...

        some Toshiba BIOSes (at least before 1995, maybe some laptops???
          with 1.44 MB floppies) have a bug where they do not set the ES:DI
          vector even for floppy drives. Hence these registers should be
          preset with zero before the call and checked to be non-zero on
          return before using them.  Also it seems these BIOSes can return
          wrong info in BL and CX, as S/DOS 1.0 can be configured to preset
          these registers as for an 1.44 MB floppy.

Also, the DDPT — as returned via es:di, and which should be pointed to by the int 0x1e vector — is only relevant for floppy drives, not for hard drives. My understanding is that parameters that govern hard drive I/O are specified by a different set of vectors, namely int 0x41, int 0x46, etc.

Thank you!

Aug 10 '22 17:08 tkchia

Hello @Mellvik,

I don’t know what to look for yet, but I’ll compare the values on floppy booted and HD booted systems. Here’s the HD booted: elks17# hd 2d0:16238#4 006f38: c7 ef 00 f0 fa 52 ff 74 20 e8 a9 ed 83 c4 04 eb .....R.t ....... elks17# elks17# hd 0:78#10 000078: b4 b9 9c 13 e0 5a 00 c0 a0 87 00 f0 a0 87 00 f0 .....Z.......... elks17#

As a sanity check, perhaps try dumping the bytes that these pointers are pointing to? c7 ef 00 f0 would be 0xf000:0xefc7 (as @ghaerr pointed out). And b4 b9 9c 13 would be 0x139c:0xb9b4. These should point to 11-byte (or 14-byte) DDPTs and should be mostly identical. So... maybe try

# hd f000:efc7#e
# hd 139c:b9b4#e  # or whatever address you read off from 0:78

(And some other possible sanity checks:

Is e.g. 0x139c:0xb9b4 the runtime address of the DDPT[] variable in the ELKS kernel?
Does the vec1E variable always read 78 00 00 00?)

Thank you!

Aug 10 '22 17:08 tkchia

Hello @Mellvik, Hello @tkchia,

I agree with all of @tkchia's suggestions, they are very good ideas. I think we can get to the bottom of this, but I want to point out that with all the segment:offset addresses being presented, it is important that two things be done after each new kernel build, so that we can more easily understand the results of each test (and yes, I am working to automate this):

Run cat system-nm.map | sed -e '/&/d; /!/d' | sort > linux.map to generate a readable kernel symbol table.
Record the text, fartext, and data segment values from the kernel boot line:

Kernel text at 2d0:0000, ftext 1210:0000, data 1512:0000

The above would show 2d0: mapping to symbols in the linux.map 0x0001XXXX section (.text) of the kernel.map file, 1210: to 0x0002XXXX section (.fartext), and 1512: mapping to 0x0003XXXX (.data/.bss) section.

Then, only use one of the three segment values in hd for display, and rewrite any longword values into segment:offset. For instance, with @tkchia's suggestion above to look at 0x139c:0xb9b4, this makes sense when the kernel .data segment is 0x139c, and would then allow to search the kernel map file for 0x0003b9b4. (In my mapfile, this is not the DDPT[] variable, but much higher, in the .bss area).

I plan on adding a couple more capabilities and files to the distribution: the ability to generate a human readable sorted .map file for the kernel or any application, as well as a .sym file (binary format) of the same, for use in the hd and disasm tool enhancements. The kernel files could be called linux.map and linux.sym and could be placed in the ELKS /lib directory for use by the tools or humans for real-time debugging. (There is still the problem of determining the runtime kernel .text, .fartext and .data segment values to map to the appropriate sections, any thoughts on how to do that?)

Thank you!

Aug 10 '22 18:08 ghaerr

Hello @Mellvik,

Hd-bug, thanks - it’s on my list too, I’m deleting it now that you’re on it.

I've found it, but am in the middle of the disasm enhancement, and I can't post it immediately. The 20-bit address to the left is sometimes displayed incorrectly, but the contents are correct. Apply this patch to fix it:

diff --git a/elkscmd/misc_utils/hd.c b/elkscmd/misc_utils/hd.c
index 53cfdb3d..f6653cc6 100644
--- a/elkscmd/misc_utils/hd.c
+++ b/elkscmd/misc_utils/hd.c
@@ -112,7 +112,7 @@ void do_mem(char *spec)
    //printf("SEG %x OFF %x CNT %ld\n", seg, off, count);
 
    addr = (unsigned char __far *)(((unsigned long)seg << 16) | off);
-   offset = (((unsigned long)seg << 4) | off);
+   offset = (((unsigned long)seg << 4) + off);
    for ( ; count > 0; count -= 16, offset += 16) {
       int j, ch;
       memset(buf, '\0', 16);

elks17# hd 2d0:16238#4

Note that "16238" is not a valid offset, as its > 64K. For now, hd just wraps and ignores it, but a future version may not. You should just be using hd 2d0:6238#4. I will probably fix hd to display an error message for this case to avoid confusion.

Thank you!

Aug 10 '22 19:08 ghaerr

Thank you @tkchia and @ghaerr - that's a lot of useful info and suggestions. I'll attack them one by one.

I did take out IDE_PROBE, but as expected it's called far to late in the boot process to affect the situation.
Slightly off topic but affecting my speed of progress: @ghaerr - I'm getting these (picture) every once in a while, and the only remedy is to clean out the file system (mkfs). Is there a better way? (the output after the second 'Linux Found' message may vary ...).
I did include the BIOS debug in a build, and got the following (booting from HD, error verified to be present) (DiskOnChip now installed which doesn't change anything other than adding a HD and reducing available memory):

Direct console, scan kbd 80x25 emulating ANSI (3 virtual consoles)
ttyS0 at 0x3f8, irq 4 is a 16550A
ttyS1 at 0x2f8, irq 3 is a 16550A
ttyS2 at 0x3e8, irq 5 is a 16550A
ttyS3 at 0x2e8, irq 2 is a 16550A
64 ext buffers, 65536 ram
eth: ne2k at 0x380, irq 9 (8bit), MAC 00:1f:11:02:60:2d, flags 0x80
eth: wd80x3 at 0x240, irq 2, ram 0xce00 not found
eth: 3c509 at 0x330, irq 11 not found
bioshd: hda BIOS CHS 992,16,63
bioshd: hdb BIOS CHS 728,2,1
/dev/hda: 992 cylinders, 16 heads, 63 sectors = 488.3 Mb
/dev/hdb: 728 cylinders, 2 heads, 1 sectors = 728.0 kb
/dev/fd0: 80 cylinders, 2 heads, 18 sectors = 1440.0 kb
bioshd: DDPT vector 8000:0 SPT 216
Partitions: hda:(0,999936) bioshd: lba 0 is CHS 0/0/1 remaining sectors 63
bioshd(128): track read CHS 0/0/1 count 18
bioshd(128): track read lba 0 to 17 count 18
bioshd(128): cache hit lba 0
bioshd(128): cache hit lba 1
 hda1:(63,696465)  hda2:(696528,132048)  hda3:(828576,132048)  hda4:(960624,41328) 
 hdb:(0,1456) bioshd: lba 0 is CHS 0/0/1 remaining sectors 1
bioshd(129): track read CHS 0/0/1 count 1
bioshd(129): track read lba 0 to 0 count 1
bioshd(129): cache hit lba 0
bioshd: lba 1 is CHS 0/1/1 remaining sectors 1
bioshd(129): track read CHS 0/1/1 count 1
bioshd(129): track read lba 1 to 1 count 1
bioshd(129): cache hit lba 1
 hdb1:(1,1455) 
device_setup: BIOS drive 0x80, root device 0x304
PC/AT class machine, syscaps 0xff, 608K base ram.
ELKS kernel 0.6.0 (57216 text, 11536 ftext, 8384 data, 42400 bss, 14750 heap)
Kernel text at 2d0:0000, ftext 10c8:0000, data 1399:0000, top 9800:0, 465K free
bioshd: lba 960626 is CHS 953/0/3 remaining sectors 61
bioshd(128): track read CHS 953/0/3 count 18
bioshd(128): track read lba 960626 to 960643 count 18
...

I noticed the DDPT vector being 8000:0, which according to hd looks like this: 080000: 8b 46 fc 83 e0 40 74 10 8b 5e 08 80 7f 01 40 75 .F...@t..^....@u

The bad data, if it's a pointer, points as suggested into what looks like BIOS code

# hd f000:efc7#e
0fefc7: df 02 25 02 24 1b ff 54  f6 0f 08 e9 b4 08 33 db  ..%.$..T......3.
0fefd7: b4 00 b9 b7 00 8a c1 e6  70 e3 00 e3 00 e4 71 e3  ........p.....q.
0fefe7: 00 03 d8 41 81 f9 be 00  75 02 41 41 2e 3b 0e ee  ...A....u.AA.;..

.... while the DDPT pointer and content looks like this now:

# hd 0:78#4
000078: e4 ba 99 13 e0 5a 00 c0  a0 87 00 f0 a0 87 00 f0  .....Z..........
# hd 1399:bae4
01bbf4: b8 20 01 8e 3f 81 3e ce  05 55 aa 75 08 81 02 00  . ..?.>..U.u....
01bc04: 01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
01bc14: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
*
#

And DDPT in the symbol table is 003bae4 so this looks good.

vec1E looks right:

# hd 1399:1fd0#4
013fd0: 78 00 00 00 80 81 82 83  00 01 02 03 28 00 09 00  x...........(...

The boot screens are the same when booting from HD and FD.

I'm continuing with the bios calls in gethdinfo(). The BIOS for the DiskOnChip is at d000:0 (jumper selectable), the docs do not mention a way to disable the DOC altogether, which would have been advantageous in this process.

--M

Aug 11 '22 12:08 Mellvik

Hello @Mellvik,

From the most recent tests, everything looks pretty normal. I'm thinking perhaps now that we know where the RAM corruption is, we might concentrate on exactly when it occurs. That is, seeing some of your boot error message, perhaps the corruption is occurring very early in the boot process, possibly just after the kernel is loaded into RAM and relocated.

We can call printk very early in the boot sequence now, perhaps displaying the contents of the corrupted RAM location at several places during startup, using something like the following would show corruption:

printk("RAM %lx\n", *(unsigned long __far *)0x00D06238L);

(Change the long constant to exactly the memory location error; in this case the long hi word is segment and lo word is offset).

This (or a procedure call wrapper around it) could then be placed in elks/init/main.c::kernel_init() in various places.

I'm getting these (picture) every once in a while, and the only remedy is to clean out the file system (mkfs). Is there a better way? (the output after the second 'Linux Found' message may vary ...).

I'm wondering whether these boot retry and multiple "Linux found" error messages are symptoms of the same problem (very early corruption) - there should never be multiple "Linux found" messages. This could indicate either a corrupted HD boot sector, something I have been considering to be the problem, and/or RAM corruption during the boot sector operation (it is loaded at 0:7C00, and then relocates itself several times, and the kernel could be getting corrupted before it even starts running).

IIRC, the boot sector also rewrites the DDPT vector. I will look further into that, although I'm not sure we have much evidence that DDPT is the problem.

Thank you!

Aug 11 '22 13:08 ghaerr

Hello @Mellvik, Hello @tkchia,

I just noticed the old DDPT vector is at 0x8000:0. That's right in the middle of memory, possibly where the boot sector code is operating and relocating the kernel.

I seem to remember that the boot code sets the DDPT vector while reading the additional boot sectors, then resets it just before exit. Should this reset to 0x8000:0, that might be in the middle of the kernel in RAM (depending on how much RAM is on this system), during the first kernel relocation, or afterwards, before it is jumped to. I'm wondering whether the boot code needs to reset the DDPT, and/or whether this DDPT location may be part of the problem seen in both the multiple "Linux Found" error messages, as well as the RAM corruption.

What I'm thinking is perhaps the BIOS is also trying to "reset" the DDPT vector somehow by writing its DDPT location into the low-core INT 1Eh vector ELKS is/was using, which still points into the RAM area ELKS is using, corrupting it. It is possible this DDPT "reset" rewrite occurs during the bioshd.c BIOSHD_RESET call, or much earlier in the boot process.

Thank you!

Aug 11 '22 14:08 ghaerr

@Mellvik,

To build a boot sector that does not reset the DDPT, the following line should be commented out of bootblocks/boot_sect.S:

boot_it:

    // Signify that /linux was loaded as 1 blob
    orb $(EF_AS_BLOB|EF_BIOS_DEV_NUM),elks_flags
    mov %dx,root_dev
    mov %cx,part_offset  // save sector offset of booted partition
    mov %bx,part_offset+2

   // RESTORE_DDPT // <-- comment this line out

    mov $LOADSEG,%ax
    mov %ax,%ds
    mov %ax,%es
    ljmp $LOADSEG+0x20,$0

However, if this is found to be the issue, NOT restoring the DDPT will still likely cause the BIOS to "reset" the DDPT using the boot sector's INT 1Eh address, which is still pointing into ELKS-used RAM, and just cause corruption at a different address. Thus, the RESTORE_DDPT perhaps needs to write an INT 1eh vector pointing to some unused RAM area for this BIOS?

We're all still guessing here - the BIOS should be "resetting" the DDPT by actually rewriting the 1Eh vector. I am only guessing that what is happening is that it is using the existing DDPT vector, but adding some bytes (which look like an address?) at the end of it? This may be a case of buggy BIOS, and we're going to have to work around it, since this only happens on one system.

Aug 11 '22 14:08 ghaerr

@Mellvik: I'm still trying to figure out why the INT 1Eh vector is supposedly set by default to 8000:0. (=512k). How much memory is on the SX? Also, it would be useful to display the low core area using hd 0:0#512.

Aug 11 '22 15:08 ghaerr

Hello @ghaerr,

Hmm... is the bug what I suspect it is? If my surmise is correct:

The BIOS did set int 0x1e to point to the "standard", IBM-compatible DDPT address, which is 0xf000:0xefc7.
ELKS's bootloader (elks/bootblocks/boot_sect.S) temporarily moved the DDPT to 0x8000:0.
After loading the kernel and setup.S code, the bootloader later tried to restore the 0xf000:0xefc7 vector by writing to 0:0x78, but failed. Instead 0xf000:0xefc7 was written to some other address, specifically 0x7c0:0x78.

More precisely: currently the elks/bootblocks/boot_sect.S code assumes it was loaded at 0:0x7c00 and the BIOS or previous bootloader transferred control to it at 0:0x7c00. It is possible that @Mellvik's hard disk MBR decided to be a bit ornery and transferred control to 0x7c0:0, rather than 0:0x7c00. If so, this would probably be quite a rare instance that we encounter such an intermediate bootloader.

@Mellvik, perhaps you can try this patch and see if it improves things?

diff --git a/bootblocks/boot_sect.S b/bootblocks/boot_sect.S
index 19829644..a9914fe7 100644
--- a/bootblocks/boot_sect.S
+++ b/bootblocks/boot_sect.S
@@ -65,7 +65,8 @@
 
        .text
 
-// Loaded by the BIOS at 0:7C00h (32K - 1K) with DL = boot drive number
+// Loaded by the BIOS at 0:7C00h (32K - 1K) or 7C0h:0
+// with DL = boot drive number
 
 entry:
 
@@ -129,7 +130,7 @@ entry1:
        lds (%bx),%si  // ds:si = BIOS original table
        push %ds
        push %si
-       push %cs
+       push %cx       // push address 0:78h (cx = 0)
        push %bx
 .if floppy_table == entry
        xor %di,%di

Thank you!

Aug 11 '22 15:08 tkchia

Thank you @tkchia,

I did rush to get this one teated, AND IT SEEMS TO WORK. Admittedly I need to sit down tomorrow and understand exactly ’the magic’, but it looks very promising.

I ran into a couple of snags - unrelated to the memory overwrite problem. First, my modified version of makeboot (to just copy the bootblack) had a couple of issues. Then it turns out that the makefile in bootblocks doesn’t pad minix.bin to 1024, which makeboot demands, so some bandaid was required.

Anyway - thanks @tkchia and @ghaerr, we may actually have nailed it - 30 days in.

—M

aug. 2022 kl. 17:33 skrev tkchia @.***>:

Hello @ghaerr https://github.com/ghaerr,

Hmm... is the bug what I suspect it is? If my surmise is correct:

The BIOS did set int 0x1e to point to the "standard", IBM-compatible DDPT address, which is 0xf000:0xefc7. ELKS's bootloader (elks/bootblocks/boot_sect.S) temporarily moved the DDPT to 0x8000:0. After loading the kernel and setup.S code, the bootloader later tried to restore the 0xf000:0xefc7 vector at 0:0x78, but failed, and instead 0xf000:0xefc7 was written to some other address, specifically 0x7c0:0x78. More precisely: currently the elks/bootblocks/boot_sect.S code assumes it was loaded at 0:0x7c00 and the BIOS or previous bootloader transferred control to it at 0:0x7c00. It is possible that @Mellvik https://github.com/Mellvik's hard disk MBR decided to be a bit ornery and transferred control to 0x7c0:0, rather than 0:0x7c00. If so, this would probably be quite a rare instance that we encounter such a bootloader.

@Mellvik https://github.com/Mellvik, perhaps you can try this patch and see if it improves things?

diff --git a/bootblocks/boot_sect.S b/bootblocks/boot_sect.S index 19829644..94291c38 100644 --- a/bootblocks/boot_sect.S +++ b/bootblocks/boot_sect.S @@ -129,7 +129,7 @@ entry1: lds (%bx),%si // ds:si = BIOS original table push %ds push %si
  push %cs
  push %cx       // push address 0:78h (cx = 0)
  push %bx
.if floppy_table == entry xor %di,%di Thank you!

— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1212152222, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOGHEAURCSIX6FWMZJLVYUMMNANCNFSM53KDGGZQ. You are receiving this because you were mentioned.

Aug 11 '22 17:08 Mellvik

Hello @tkchia,

Hmm... is the bug what I suspect it is? If my surmise is correct

Thank you for your help and fix. I was beginning to think it might be our own code in RESTORE_DDPT writing the longword after all, and you've nailed it. Nice catch :)

If so, this would probably be quite a rare instance that we encounter such an intermediate bootloader.

I'm almost certain this is the culprit, but we'll wait until @Mellvik tests a bit more before celebrating. What I don't yet understand is how this only fails on HD boots and never FD, but I suppose that all depends on exactly what code got corrupted.

Thank you!

Aug 11 '22 18:08 ghaerr

Hi @tkchia and @ghaerr -

while there is still some tests I'd like to do - time permitting, I'm quite confident that the bug has been found and fixed. I've run some old kernels (marked 'bad') and they work (rm works) fine and the memory location has no traces of the previous C7 sequence. I also installed the modified bootblock on the compaq HD and it boots fine.

It's indeed fascinating that a 4 week hunt for what initially seemed like something else, suddenly was fixed by changing one single letter in the bootblock code.

Happy to be out of the rabbit hole (again), I have some 'house cleaning' to do before submitting a PR with some updates, but this seems like a good time to close this issue.

Thanks again!

-M

Aug 12 '22 11:08 Mellvik

Hello @ghaerr, hello @Mellvik,

Thanks for the heads-up!

What I don't yet understand is how this only fails on HD boots and never FD, but I suppose that all depends on exactly what code got corrupted.

My current wild guess, is that the ROM BIOS behaves like a bog-standard IBM-compatible BIOS, and transfers control to the bootloader through 0:0x7c00, but the hard disk's master boot record has other ideas. Like so:

         ROM BIOS
0:0x7c00 │      │ 0:0x7c00
         ▾      ▾
      ELKS      HD MBR
bootloader      │ 0x7c0:0 (!?)
     on FD      ▾
                ELKS
 (FD boot)      bootloader
                on HD

                (hd boot)

Such a situation will also explain why boots from the floppy drive are not affected by the glitch.

If this DDPT weirdness happened on my machine, I would be quite tempted to disassemble the MBR code, to confirm if it is indeed doing what I think it is doing. :slightly_smiling_face:

Thank you!

Aug 12 '22 14:08 tkchia

Hello @Mellvik, Hello @tkchia,

I would be quite tempted to disassemble the MBR code, to confirm if it is indeed doing what I think it is doing.

Great idea @tkchia. @Mellvik, if you could dd the MBR off the SX HD and post it, I will add an ability for our new disasm to take a file argument for disassembly, test that it displays properly, and we can then take a look at the MBR jump code.

Thank you!

Aug 12 '22 16:08 ghaerr

It's indeed fascinating that a 4 week hunt for what initially seemed like something else, suddenly was fixed by changing one single letter in the bootblock code.

@Mellvik, that was some fantastic debugging work, coming up with the exact location of the data corruption. And during the bug hunt, you managed to discover what turned out to be a major MINIX filesystem problem, as well. Nice work!!

Aug 12 '22 17:08 ghaerr

It's indeed fascinating that a 4 week hunt for what initially seemed like something else, suddenly was fixed by changing one single letter in the bootblock code.

@Mellvik, that was some fantastic debugging work, coming up with the exact location of the data corruption. And during the bug hunt, you managed to discover what turned out to be a major MINIX filesystem problem, as well. Nice work!!

Thank you, @ghaerr. it ended up being a 'community effort' - and an educating experience. It never occurred to me that the MBR may have been part of it. Not the problem but the path to the problem. If I'd MBRd the drive we'd lost the chance to notice, find and fix the problem, :-)

If this DDPT weirdness happened on my machine, I would be quite tempted to disassemble the > MBR code, to confirm if it is indeed doing what I think it is doing. 🙂

Thanks @tkchia, I'll do just that and let you know (using the disasm utility). Turns out the MBR is the one originally delivered on the FAT formatted CF card (512mb).

-M

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

Aug 12 '22 21:08 Mellvik

OK, here's the file - and a partial disasm skipping the strings and the partition table. SX-MBR.bin.gz

Looking at this it's easy to want more, such as the ability to force the disassembler to 'sync' at jump/call destinations or specific addresses. Fortunately in this case it seems to sync at 0x1f anyway, after the cli byte.

# ./disasm SX-MBR.bin 
CS = 530f
0000  eb 1c             jmp	.+30 // 001e
0002  60                ???
0003  00 8e c0 8e       add	%cl,0x8ec0(%bp)
0007  d0 31             SHL	$1,(%bx,%di)
0009  e4 31             in 	$0x31,%al
000b  80 57 8e df       adcb	$0xdf,-0x72(%bx)
000f  be 00 7c          mov	$0x7c00,%si
0012  b9 00 01          mov	$0x100,%cx
0015  fc                cld
0016  f3                repz 
0017  a5                movsw	
0018  06                push	%es
0019  1f                pop	%ds
001a  07                pop	%es
001b  16                push	%ss
001c  b9 21 fa          mov	$0xfa21,%cx
001f  33 c0             xor	%ax,%ax
0021  8e d8             mov	%ax,%ds
0023  8e d0             mov	%ax,%ss
0025  bc 00 7c          mov	$0x7c00,%sp
0028  8b f4             mov	%sp,%si
002a  fb                sti
002b  c4 06 4c 00       les	(0x004c),%ax
002f  a3 07 7c          movw	%ax,(0x7c07)
0032  8c 06 09 7c       mov	%al,(0x7c09)
0036  a1 13 04          movw	(0x0413),%ax
0039  48                dec	%ax
003a  48                dec	%ax
003b  b1 06             mov	$0x6,%cl
003d  a3 13 04          movw	%ax,(0x0413)
0040  d3 e0             shl	%cl,%ax
0042  8e c0             mov	%ax,%es
0044  b9 00 02          mov	$0x200,%cx
0047  33 ff             xor	%di,%di
0049  fc                cld
004a  f3                repz 
004b  a4                movsb	
004c  b8 88 00          mov	$0x88,%ax
004f  06                push	%es
0050  50                push	%ax
0051  cb                retf
0052  1e                push	%ds
0053  50                push	%ax
0054  f6 c2 f0          test	$0xf0,%dl
0057  75 28             jnz	.+42 // 0081
0059  d0 ec             shr	$1,%ah
005b  fe cc             decb	%ah
005d  75 22             jnz	.+36 // 0081
005f  33 c0             xor	%ax,%ax
0061  8e d8             mov	%ax,%ds
0063  a1 6c 04          movw	(0x046c),%ax
0066  8a c4             mov	%ah,%al
0068  2e 2a 06 03 00    subb	%cs:(0x0003),%al
006d  3c 02             cmpb	$0x2,%al
006f  72 10             jb 	.+18 // 0081
0071  2e 88 26 03 00    movb	%cs:%ah,(0x0003)
0076  3d 02 00          cmpw	$0x2,%ax
0079  73 03             jnb	.+5 // 007e
007b  e8 cc 00          call	014a (014a)
007e  e8 e8 00          call	0169 (0169)
0081  58                pop	%ax
0082  1f                pop	%ds
0083  2e ff 2e 07 00 33 c0 jmp	%cs:c033:0007
008a  8e c0             mov	%ax,%es
008c  cd 13             int	$0x13
008e  0e                push	%cs
008f  1f                pop	%ds
0090  80 3e 0b 00 00    cmpb	$0x0,(0x000b)
0095  74 24             jz 	.+38 // 00bb
0097  be ae 01          mov	$0x1ae,%si
009a  83 c6 10          add	$0x10,%si
009d  80 3c 80          cmpb	$0x80,(%si)
00a0  75 f8             jnz	.-6 // 009a
00a2  8b 14             mov	(%si),%dx
00a4  8b 4c 02          mov	0x02(%si),%cx
00a7  bb 00 7c          mov	$0x7c00,%bx
00aa  b8 01 02          mov	$0x201,%ax
00ad  cd 13             int	$0x13
00af  72 d7             jb 	.-39 // 0088
00b1  2e c7 06 48 01 c0 07 movw	%cs:$0x7c0,(0x0148)
00b8  eb 7b             jmp	.+125 // 0135
00ba  90                xchg	%ax,%ax
00bb  b2 80             mov	$0x80,%dl
00bd  e8 a9 00          call	0169 (0169)
00c0  bf 18 00          mov	$0x18,%di
00c3  8b 45 f9          mov	-0x07(%di),%ax
00c6  b9 04 00          mov	$0x4,%cx
00c9  d3 e8             shr	%cl,%ax
00cb  8b e8             mov	%ax,%bp
00cd  8b 45 fe          mov	-0x02(%di),%ax
00d0  d1 e0             shl	$1,%ax
00d2  40                inc	%ax
00d3  03 e8             add	%ax,%bp
00d5  f6 35             idiv	(%di)
00d7  8a cc             mov	%ah,%cl
00d9  fe c1             incb	%cl
00db  33 d2             xor	%dx,%dx
00dd  8a f0             mov	%al,%dh
00df  bb 00 05          mov	$0x500,%bx
00e2  b8 01 02          mov	$0x201,%ax
00e5  cd 13             int	$0x13
00e7  72 f9             jb 	.-5 // 00e2
00e9  8b c5             mov	%bp,%ax
00eb  be 34 00          mov	$0x34,%si
00ee  b7 07             mov	$0x7,%bh
00f0  f6 35             idiv	(%di)
00f2  33 c9             xor	%cx,%cx
00f4  86 e1             xchg	%ah,%cl
00f6  2b f1             sub	%cx,%si
00f8  f6 75 02          idiv	0x02(%di)
00fb  8b d0             mov	%ax,%dx
00fd  86 d5             xchg	%dl,%ch
00ff  8a 05             mov	(%di),%al
0101  2a c1             sub	%cl,%al
0103  fe c1             incb	%cl
0105  b4 02             mov	$0x2,%ah
0107  50                push	%ax
0108  cd 13             int	$0x13
010a  58                pop	%ax
010b  72 f8             jb 	.-6 // 0105
010d  b1 01             mov	$0x1,%cl
010f  02 f8             add	%al,%bh
0111  02 f8             add	%al,%bh
0113  8b 05             mov	(%di),%ax
0115  fe c6             incb	%dh
0117  3a 75 02          cmp	0x02(%di),%dh
011a  72 04             jb 	.+6 // 0120
011c  b6 00             mov	$0x0,%dh
011e  fe c5             incb	%ch
0120  2b f0             sub	%ax,%si
0122  73 e1             jnb	.-29 // 0105
0124  03 c6             add	%si,%ax
0126  b4 02             mov	$0x2,%ah
0128  cd 13             int	$0x13
012a  8a 6d fd          mov	-0x03(%di),%ch
012d  8b dd             mov	%bp,%bx
012f  c7 06 48 01 70 00 movw	$0x70,(0x0148)
0135  33 c0             xor	%ax,%ax
0137  a2 0b 00          movb	%al,(0x000b)
013a  8e d8             mov	%ax,%ds
013c  b0 52             mov	$0x52,%al
013e  a3 4c 00          movw	%ax,(0x004c)
0141  8c 0e 4e 00       mov	%cl,(0x004e)
0145  ea 00 00 70 00    jmp	0070:0000
014a  bb ff ff          mov	$0xffff,%bx
014d  b0 b6             mov	$0xb6,%al
014f  e6 43             out	%al,$0x43
0151  89 d8             mov	%bx,%ax
0153  e6 42             out	%al,$0x42
0155  88 e0             mov	%ah,%al
0157  e6 42             out	%al,$0x42
0159  e4 61             in 	$0x61,%al
015b  0c 03             orb	$0x3,%al
015d  e6 61             out	%al,$0x61
015f  4b                dec	%bx
0160  b9 64 00          mov	$0x64,%cx
0163  e2 fe             loopz	.0 // 0163
0165  eb ea             jmp	.-20 // 0151
0167  5b                pop	%bx
0168  c3                ret
0169  53                push	%bx
016a  51                push	%cx
016b  52                push	%dx
016c  06                push	%es
016d  56                push	%si
016e  57                push	%di
016f  0e                push	%cs
0170  1f                pop	%ds
0171  0e                push	%cs
0172  07                pop	%es
0173  bb 00 02          mov	$0x200,%bx
0176  b9 01 00          mov	$0x1,%cx
0179  32 f6             xor	%dh,%dh
017b  b8 01 02          mov	$0x201,%ax
017e  9c                pushf
017f  ff 1e 07 00 72 22 call	2272:0007
0185  c7 07 eb 1c       movw	$0x1ceb,(%bx)
0189  80 fa 80          cmp	$0x80,%dl
018c  75 04             jnz	.+6 // 0192
018e  88 16 0b 02       movb	%dl,(0x020b)
0192  fc                cld
0193  b9 a0 01          mov	$0x1a0,%cx
0196  be 1e 00          mov	$0x1e,%si
0199  bf 1e 02          mov	$0x21e,%di
019c  f3                repz 
019d  a4                movsb	
019e  b8 01 03          mov	$0x301,%ax
01a1  41                inc	%cx
01a2  9c                pushf
01a3  ff 1e 07 00 5f 5e call	5e5f:0007
01a9  07                pop	%es
01aa  5a                pop	%dx
01ab  59                pop	%cx
01ac  5b                pop	%bx
01ad  c3                ret

Aug 13 '22 16:08 Mellvik

Hello @Mellvik,

Thanks! This MBR code is highly interesting, but also a bit worrying. I hope this particular MBR is not malware (and I am not kidding). More on that later.

Thank you!

Aug 13 '22 16:08 tkchia

Thank you @tkchia - actually I had the same thought when a saw the string towards the end.

Looking forward to your verdict :-)

-M

BTW, continuing on the same tangent, I pulled the MBR off of two other vintage CF cards. Interestingly they are all different. Then again, old stuff like this via eBay will in many cases have been used for something or other before.

aug. 2022 kl. 18:33 skrev tkchia @.***>:

Hello @Mellvik https://github.com/Mellvik,

Thanks! This MBR code is highly interesting, but also a bit worrying. I hope this particular MBR is not malware (and I am not kidding). More on that later.

Thank you!

— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1214186167, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGODW7DHP64FSOLZ4U5DVY7E6ZANCNFSM53KDGGZQ. You are receiving this because you were mentioned.

Aug 13 '22 16:08 Mellvik

Hello @Mellvik, hello @ghaerr,

Some quick observations:

Yes, it turns out that the MBR does transfer control to the boot partition's bootloader through 0x7c0:0. For some reason, the MBR uses self-modifying code to do that:

00a7  bb 00 7c          mov	$0x7c00,%bx
00aa  b8 01 02          mov	$0x201,%ax
00ad  cd 13             int	$0x13
00af  72 d7             jb 	.-39 // 0088
00b1  2e c7 06 48 01 c0 07 movw	$0x7c0,%cs:(0x0148) // [*] %cs: override applies to dest. operand
00b8  eb 7b             jmp	.+125 // 0135
...
0135  33 c0             xor	%ax,%ax
0137  a2 0b 00          movb	%al,(0x000b)
013a  8e d8             mov	%ax,%ds
013c  b0 52             mov	$0x52,%al
013e  a3 4c 00          movw	%ax,(0x004c)
0141  8c 0e 4e 00       movw	%cs,(0x004e) // [*] %cl should be %cs
0145  ea 00 00 70 00    ljmpw	$0x0070,$0x0000 // [*] this is the correct GNU as syntax

The instruction at %cs:0x00b1 alters the instruction %cs:0x0145 so that, by the time we reach it, it effectively becomes, well,

0145  ea 00 00 c0 07    ljmpw	$0x07c0,$0x0000

Also, you may notice that the instructions at %cs:0x13e—%cs:0x0144 redirect the int 0x13 vector (at 0:0x4c) to point at the MBR's own code (!). This is after the MBR has copied itself to upper conventional memory and then reserved the memory for itself (at %cs:0x3d).
@ghaerr: there are some glitches in the disassembly. I have indicated some of these above ([*]). Also, the indirect far jump and call opcodes (0xff 0x2e... / 0xff 0x1e ...) are misinterpreted, e.g.
```
0083  2e ff 2e 07 00 33 c0 jmp	%cs:c033:0007
```
should be
```
0083  2e ff 2e 07 00    ljmpw	*%cs:0x0007
```
and it really means to transfer control to the segment:offset value stored at %cs:0x0007. Likewise for %cs:0x17f and %cs:0x1a3.
And yep, this MBR is almost certainly malware. It is rather interesting piece of malware too (in my opinion). But perhaps there is some classical anti-virus program that actually knows about it. (Edit: this might be the same as the "AntiCMOS.B" virus that F-Secure is referring to: http://www.f-secure.com/v-descs/anticmos.shtml .)

Thank you!

Aug 13 '22 17:08 tkchia

actually I had the same thought when a saw the string towards the end.

@Mellvik, can you post the binary boot block? I would like to see the string towards the end you're talking about.

there are some glitches in the disassembly.

@tkchia: thank you, I was going to warn you about the fact that this the disassembler was written from scratch and has had very little outside testing. I will study your glitches and correct. Still not bad for 4k bytes! :)

Aug 13 '22 18:08 ghaerr

Hello @ghaerr,

can you post the binary boot block?

@Mellvik had provided a link: https://github.com/jbruchon/elks/files/9331781/SX-MBR.bin.gz . It goes without saying, but anyway: treat with extreme care. :astonished:

Still not bad for 4k bytes! :)

:slightly_smiling_face:

Thank you!

Aug 13 '22 18:08 tkchia

Hello @tkchia,

Wow, super job analyzing the SX malware boot sector! And thank you for the bug reports on disasm.

The instruction at %cs:0x00b1 alters the instruction %cs:0x0145 so that, by the time we reach it, it effectively becomes, well, 0145 ea 00 00 c0 07 ljmpw $0x07c0,$0x0000

This is most likely to stop virus protectors from directly recognizing an ljmpw with an 0x07c0 operand, I would assume.

This is after the MBR has copied itself to upper conventional memory and then reserved the memory for itself (at %cs:0x3d).

What is the mechanism used to reserve memory for itself? I am quickly looking but don't see how?

you may notice that the instructions at %cs:0x13e—%cs:0x0144 redirect the int 0x13 vector (at 0:0x4c) to point at the MBR's own code (!).

So effectively, this boot block allows a later program to execute code in a reserved block of memory, accessed via INT 13h? What does that code do, print "I am Li Xi"!?

Thank you!

Aug 13 '22 18:08 ghaerr

Hello @ghaerr,

What is the mechanism used to reserve memory for itself? I am quickly looking but don't see how?

0:0x0413 = 0x40:0x13 is the IBM BIOS variable that records the amount of base (conventional) memory available.

What does that code do, print "I am Li Xi"!?

It mainly intercepts BIOS disk I/O requests, and adds a few I/O requests of its own to write (and spread) itself onto disks.

Thank you!

Aug 13 '22 19:08 tkchia

Well, isn't that a catch! That's a great piece of work, @tkchia - much appreciated!

And yes, let's make sure we contain this one. Actually there is little danger of spreading, even between my systems, and the observation I had yesterday comes back: I might easily have ran fdisk /mbr on the drive and we'd missed the party. By now (as opposed to yesterday) I figure that might have been advantageous.

Anyway, now we know (with some probability) the reason for the boot screwup I occasionally get, reported yesterday. Virus on Elks. That's a new one indeed.

I'll certainly make a habit out of mbr'ing new (and old) cards from now on.

Finally, It seems to me that beyond wreaking havoc (random system problems) and attempting to spread itself, this virus doesn't do anyhing. So it ends up in the same class as the parity boot virus that I've seen a lot of these past few years - on old clunkers. @tkchia, do you agree.?

-M

aug. 2022 kl. 21:16 skrev tkchia @.***>:

Hello @ghaerr,

What is the mechanism used to reserve memory for itself? I am quickly looking but don't see how?

0:0x0413 = 0x40:0x13 is the IBM BIOS variable that records the amount of base (conventional) memory available.

What does that code do, print "I am Li Xi"!?

It mainly intercepts BIOS disk I/O requests, and adds a few I/O requests of its own to write (and spread) itself onto disks.

Thank you!

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

Aug 13 '22 21:08 Mellvik

And yep, this MBR is almost certainly malware. It is rather interesting piece of malware too (in my opinion). But perhaps there is some classical anti-virus program that actually knows about it. (Edit: this might be the same as the "AntiCMOS.B" virus that F-Secure is referring to: http://www.f-secure.com/v-descs/anticmos.shtml .)

Thanks again, @tkchia - that pretty much nails it. No question - this is it. AntiCMOS.B

And it may - in addition to explaining the apparent file system corruption we talked about, also explain some weird behaviour when configuring some interrupts on the machine (CMOS).

Booting DOS and doing a fdisk /MBR did the trick, and suddenly ELKS reports 2kb more available memory. Still not 640k, but close - 639k, the last K probably some DiskOnChip initialization code/data (if the DOC chip is installed, memtop drops by 29k).

In order to track down the source of the virus I installed the DOC that came with the system, which becomes the boot drive if the system is configured that way (and no device with higher priority is found). And sure enough, here's the virus, on the DOC boot sector. So this is not something that came with a CF card, but with the system!

--M

Aug 14 '22 11:08 Mellvik

Hello @Mellvik,

And sure enough, here's the virus, on the DOC boot sector. So this is not something that came with a CF card, but with the system!

Wow. Holy cow.

So it ends up in the same class as the parity boot virus that I've seen a lot of these past few years - on old clunkers. @tkchia, do you agree.?

Well, this AntiCMOS.B virus does seem to have one interesting property: instead of stashing away the original good MBR or floppy bootloader somewhere else on the boot medium, it actually tries to make itself be the bootloader, and take over the original bootloader's functions. (With less error checking, though.)

I have not completely analyzed the virus's floppy handling code, but I believe that it actually tries to parse the MS-DOS filesystem and load the first file in the root directory, and treat it as MS-DOS's io.sys.

Thank you!

Aug 14 '22 15:08 tkchia

Hello @Mellvik,

Booting DOS and doing a fdisk /MBR did the trick, and suddenly ELKS reports 2kb more available memory.

That is a bit strange, since ELKS doesn't use the BIOS Data Area (BDA) at 0x40:0013 to learn the amount of memory available. (It does use INT 12h, which the virus MBR could have attached to - I haven't yet had time to fully study its code).

Nonetheless, it is interesting to know that ELKS is susceptible to what is essentially a DOS virus - and that if the virus hooks INT 13h in order to write itself as another MBR, ELKS may also be able to host the virus' replication, since all ELKS disk I/O also uses INT 13h!

It's a case study of how ancient structures can revive ancient viruses - with ELKS being designed around MBR compatibility we leave ourselves open to some of the earliest viruses.

Thank you!

Aug 14 '22 18:08 ghaerr

Hi @Ghaerr,

Booting DOS and doing a fdisk /MBR did the trick, and suddenly ELKS reports 2kb more available memory.

That is a bit strange, since ELKS doesn't use the BIOS Data Area (BDA) at 0x40:0013 to learn the amount of memory available. (It does use INT 12h, which the virus MBR could have attached to - I haven't yet had time to fully study its code).

As it turns out these two are the same thing: INT 12 - BIOS - GET MEMORY SIZE Return: AX = kilobytes of contiguous memory starting at absolute address 00000h Note: this call returns the contents of the word at 0040h:0013h; in PC and XT, this value is set from the switches on the motherboard SeeAlso: INT 11"BIOS" http://fd.lod.bz/rbil/interrup/bios/11.html#sect-610,INT 2F/AX=4A06h http://fd.lod.bz/rbil/interrup/dos_kernel/2f4a06.html#sect-4865,INT 4C"Tandy 2000" http://fd.lod.bz/rbil/interrup/bios_vendor/4c.html#sect-6323,MEM 0040h:0013h http://fd.lod.bz/rbil/memory/bios/m00400013.html#sect-9584 [from http://fd.lod.bz/rbil/interrup/bios/12.html http://fd.lod.bz/rbil/interrup/bios/12.html ]

Nonetheless, it is interesting to know that ELKS is susceptible to what is essentially a DOS virus - and that if the virus hooks INT 13h in order to write itself as another MBR, ELKS may also be able to host the virus' replication, since all ELKS disk I/O also uses INT 13h!

It's a case study of how ancient structures can revive ancient viruses - with ELKS being designed around MBR compatibility we leave ourselves open to some of the earliest viruses.

Indeed. As long as we're using BIOS calls for I/O I guess we're susceptible. OTOH the fact that we're using floppy images a lot and have our own MBR and bootblocks provides a good immune system for ELKS. BTW - I was curious as to whether this one was able to prevent its own destruction (like writing a new MBR on an infected system): Booted the DOS partition and ran fdisk /mbr. It was not.

Now that we have this experience it's easier to think along those lines the next time something really weird is happening.

-M

Aug 15 '22 08:08 Mellvik

elks elks copied to clipboard

Buffer/fs bug

hd d0:6238#4

hd 2d0:6638#4

elks
elks copied to clipboard