elks
elks copied to clipboard
Buffer/fs bug
In short: Removing (rm
) a 'large' (as in several k bytes) cause the calling process to hang (cannot be interrupted). Shortly after, the system will hang, some times 'hard' (as in 'ctl-alt-del' not working).
This does NOT happen when XMS support is compiled in, whether enabled via /bootopts
or not. Presumably that's the reason we haven't seen this before.
elks17# cd /
elks17# ls
EXEC: '/bin/ls' env 134
OPEN '.' flags 0x0 = 4
CLOSE 4
bin bootopts dev etc fd linux
linux.buf linux.nobuf mnt mnt2 mnt3 mnt4
nlinux root tmp xx zz
elks17# touch x
BLK 11 read fc002180:3
EXEC: '/bin/touch' env 141
BLK 808 read b0002180:3
OPEN 'x' flags 0x241 = 4
CLOSE 4
elks17# rm x
EXEC: '/bin/rm' env 138
BLK 1325 read 6c002180:3
BLK 1326 read 2180:3
BLK 1327 read 40002180:3
BLK 1328 read 48002180:3
BLK 1329 read f0002180:3
BLK 1330 read 4c002180:3
UNLINK 'x'
elks17# ls -l zz
EXEC: '/bin/ls' env 144
OPEN '/etc/passwd' flags 0x0 = 4
BLK 1429 read e4002180:3
CLOSE 4
OPEN '/etc/group' flags 0x0 = 4
BLK 1422 read 50002180:3
CLOSE 4
-rw-r--r-- 1 root root 81472 Jul 12 10:46 zz
elks17# rm zz
EXEC: '/bin/rm' env 139
UNLINK 'zz' <---------- hang
Hello @Mellvik,
I saw this one years ago, and its on my list. The reason for the hang is that the MINIX filesystem in some cases requires more local L1 locked buffers than there are available (8 unless FAT linked in, then 12). The rm process hangs in the kernel waiting for another buffer to become free, and rm already has all the buffers locked, so deadlock.
It is somewhat hard to believe that the kernel needs more than 12K buffers to remove a file, but this happens when the inode is using a double-indirect block list, I think. We're very short on kernel data segment space, so merely increasing the L1 size isn't a good option.
This does NOT happen when XMS support is compiled in,
Interesting. Perhaps its an L2/L1 issue then, not just L1.
I'll continue to look into this.
Thank you!
Thank you @ghaerr.
It's strange I haven't run into this one before. After all, the XMS support is a fairly recent addition. Without XMS ELKS hangs every time I (say) 'rm linux'. Then again I may have a setting (this is new one, new machine etc) that just happen to make the bits in the puzzle fit exactly.
—M
- jul. 2022 kl. 19:05 skrev Gregory Haerr @.***>:
Hello @Mellvik https://github.com/Mellvik,
I saw this one years ago, and its on my list. The reason for the hang is that the MINIX filesystem in some cases requires more local L1 locked buffers than there are available (8 unless FAT linked in, then 12). The rm process hangs in the kernel waiting for another buffer to become free, and rm already has all the buffers locked, so deadlock.
It is somewhat hard to believe that the kernel needs more than 12K buffers to remove a file, but this happens when the inode is using a double-indirect block list, I think. We're very short on kernel data segment space, so merely increasing the L1 size isn't a good option.
This does NOT happen when XMS support is compiled in,
Interesting. Perhaps its an L2/L1 issue then, not just L1.
I'll continue to look into this.
Thank you!
— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1182020992, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOENET6UELJKTD7W6FLVTWQWLANCNFSM53KDGGZQ. You are receiving this because you were mentioned.
Without XMS ELKS hangs every time I (say) 'rm linux'.
Oh, good to know we have a larger file available to delete. I'll use this for a test case and will up the priority.
After all, the XMS support is a fairly recent addition
Yes, I got a bit spoiled myself with a default .config with XMS and 2880k disk support. I'm back to running 1440k and no XMS, so I'll see whether this repeats over here.
Yes, I got a bit spoiled myself with a default .config with XMS and 2880k disk support. I'm back to running 1440k and no XMS, so I'll see whether this repeats over here.
Welcome to the club :-) . Well, I guess you're on QEMU so it's fast anyway… I got really spoiled recently when I started to run of of HD instead of floppy. He.., it's fast.
Crazy as it sounds, 'rm linux' is probably the command I use the most these days, building and transferring 50-100 kernels per day. :-)
BTW - I still have sys problems on the compaqs (MBR boot doesn't find partition boot), while the same works great on this other machine (40MHz 386SX). To be investigated.
-M
Without XMS ELKS hangs every time I (say) 'rm linux'. Oh, good to know we have a larger file available to delete. I'll use this for a test case and will up the priority.
I just tested with our "standard" configuration ibmpc-1440.config
(used for the auto Github builds), and rm linux
works great, no hang. Can you try using that configuration, or diffing it with your .config, to see what might be happening? It would be worthwhile to see whether QEMU hangs for you as well, perhaps this problem is not the same as I described.
I still have sys problems on the compaqs (MBR boot doesn't find partition boot), while the same works great on this other machine (40MHz 386SX). To be investigated.
Definitely interested to hear more about that.
I still have sys problems on the compaqs (MBR boot doesn't find partition boot), while the same works great on this other machine (40MHz 386SX). To be investigated.
Definitely interested to hear more about that.
OK, this mystery has been resolved if not fixed.
It turns out (news to me, not to you) that sys assumes the source fs is an image created by the build_hd_image.sh script so that the partition boot block (aka floppy boot block) is present at a predictable place. Which is what happened on my newer system. But the old system wasn't created that way, it evolved over time, all 3 Minix FSs being created manually, sys used only recently. IOW there never was a valid boot block to copy.
What I suggest, in order to make this more failsafe, is that either minix mkfs or sys copiy the bootblock explicitly (from file).
-M
sys assumes the source fs is an image created by the build_hd_image.sh script
Wow - I've never seen that script before. It is extremely outdated and should probably be removed, as ELKS now creates all HD images (hd32mbr-minix.img and hd32-minix.img) using Makefiles under the images/ directory, and doesn't require loopback filesystems to build. Of course, these "standard" HD images are really only testing, as few people have a 32MB HD!
so that the partition boot block (aka floppy boot block) is present at a predictable place. IOW there never was a valid boot block to copy.
Which boot block exactly are you talking about here - the MBR, or the boot block which is the first sector of every partition? I don't quite yet understand why the Compaq is failing on sys.
What I suggest, in order to make this more failsafe, is that either minix mkfs or sys copiy the bootblock explicitly (from file).
I've thought about adding another option that allows copying a boot sector as a seperate file, but sys (or makeboot) doesn't have that file available to it on a standard boot FD or HD system. It could be possible to compile-in a boot sector into makeboot. Once I fully understand what is happening on your Compaq, we can work to come up with a better solution.
Back to the rm
hang problem:
This is repeatable only on one particular machine, so I'm going after it in that setting. I was hoping it had to do with the memcmp
problem just reported, it wasn't.
-M
Back to the rm hang problem This is repeatable only on one particular machine, so I'm going after it in that setting
If this only repeats with the same compiled kernel on a different machine, then that's very strange - suspect possible memory problem? Does that machine have different amount of main memory (assume no XMS)?
Otherwise, I suspect may be able to reproduce with different CONFIG_FS_NR_EXT_BUFFERS=
setting, as I have seen this problem before. Also possibly try recompiling with standard ibmpc-1440.config file for failing machine.
Thank you @ghaerr -
Back to the rm hang problem This is repeatable only on one particular machine, so I'm going after it in that setting
If this only repeats with the same compiled kernel on a different machine, then that's very strange - suspect possible memory problem? Does that machine have different amount of main memory (assume no XMS)?
Yes, it still happens only if XMS is not compiled in. And yes, the main memory is odd sized, 607K as reported by ELKS. The machine has 4M total, and the reason for the odd main memory size is the DiskOnChip, which is an interesting story by itself, for another time. I can actually remove the chip and increase main memory to 637k - I'll do that just for the heck of it.
Otherwise, I suspect may be able to reproduce with different CONFIG_FS_NR_EXT_BUFFERS= setting, as I have seen this problem before. Also possibly try recompiling with standard ibmpc-1440.config file for failing machine.
Thanks, will check out that too! It would be good to figure this one out now that I have a repeatable setting.
—M
Update: This turns out to be a touch cookie indeed.
Given that the problem appears on this particular hardware only, not on QEMU, not on the compaqs, it would be reasonable to guess 'race condition'. However, since the problem goes (reliably, predictably) away when adding the right debug printk
s in the code, that seems unlikely. Also, if those printk
s don't have arguments, just text, they have no effect. So we have a stack pollution issue, a very elusive one: The printk
s can be in any of a number of routines and will have slightly different effects but always hang the process or the system. And frequently destroy the root file system, making it a very slow process.
More later.
-M
Also, if those printks don't have arguments, just text, they have no effect. So we have a stack pollution issue, a very elusive one: The printks can be in any of a number of routines and will have slightly different effects but always hang the process or the system.
Agreed. Now that you describe this new behavior, I have seen this before, and the kernel stack was being corrupted due to some strange behavior with the LED lights and keyboard controller code. I can't remember the exact issue, I'll look it back up, but it was finally identified and fixed by @tkchia. Does this particular system have anything unusual about the keyboard or its controller? Does the issue occur when no Caps Lock or other LED keys have ever been pressed?
Also, if those printks don't have arguments, just text, they have no effect. So we have a stack pollution issue, a very elusive one: The printks can be in any of a number of routines and will have slightly different effects but always hang the process or the system.
Agreed. Now that you describe this new behavior, I have seen this before, and the kernel stack was being corrupted due to some strange behavior with the LED lights and keyboard controller code. I can't remember the exact issue, I'll look it back up, but it was finally identified and fixed by @tkchia https://github.com/tkchia. Does this particular system have anything unusual about the keyboard or its controller? Does the issue occur when no Caps Lock or other LED keys have ever been pressed?
Yes, the keyboard is hardly ever touched (not quite true since it comes to rescue every time the FS has been corrupted, but still it's rare). I ran into an interesting case this morning in which one printk is just ignored, the next one - 2 lines down, prints. This is in minix/truncate.c which is one of the suspects.
Are any of the 'stack protection' options to gcc included in our toolchain btw? And what about the register declarations, are the ignored or actually used? I noticed gcc complains if you try to get the address of a register variable, which makes sense but may not be a reliable signal...
-M
Reviewing this issue a bit, I am now guessing that the kernel corruption, now more firmly evidenced by printk's not working (which occurred on the last similar issue), is ultimately happening to scribble on EXT memory buffers, which is causing the root file system corruption. It doesn't occur on XMS likely because the memory buffers are in non-addressable XMS memory.
Although not a for sure, we likely need to continue to look at the potential hardware incompatibility on your one system as the base cause, since none of this is (yet) repeatable on any other system. Thus the keyboard controller mention, which was the cause of the last similar problem. Any other dissimilarities between the failing system and your other PCs would be good to know, including whether the problem persists without any NIC or serial cards enabled/inserted (as well as possibly memory configuration changes).
Are any of the 'stack protection' options to gcc included in our toolchain btw?
Not really, I think there is a stack check option that emits a call to a stack checker that could be turned on, but ELKS itself checks the kernel stack pointer on every clock interrupt and system call entry, as well as checking the SS register for validity.
And what about the register declarations, are the ignored or actually used? I noticed gcc complains if you try to get the address of a register variable
I'm pretty sure register declarations are ignored with regards to forcing any register usage, but they're not entirely ignored as I've also seen the address of register variable error. I doubt we're dealing with a compiler issue. I will research the last issue a bit more and report back.
Thank you @ghaerr -
Although not a for sure, we likely need to continue to look at the potential hardware incompatibility on your one system as the base cause, since none of this is (yet) repeatable on any other system. Thus the keyboard controller mention, which was the cause of the last similar problem. Any other dissimilarities between the failing system and your other PCs would be good to know, including whether the problem persists without any NIC or serial cards enabled/inserted (as well as possibly memory configuration changes).
Agreed, that may well be the case. The ALI 6117C is a 386sx based SOC with most components on chip - interrupt controllers, timer/counter, DMA controllers, memory mapper, keyboard/mouse i/f (!) and much more. As mentioned in https://github.com/jbruchon/elks/issues/1303#issuecomment-1192324726 https://github.com/jbruchon/elks/issues/1303#issuecomment-1192324726 I've had IRQ5 problems with the system. Also mentioned before (this thread) is the main memory mapping, creating some odd main memory sizes (such as 607K available). This has been tested for the case in question and found to not influence the situation.
I doubt we're dealing with a compiler issue. I will research the last issue a bit more and report back.
Agreed - and thank you!
—M
Hello @Mellvik,
I have reviewed the previous issues relating to the ELKS kernel crashing, or acting strangely, and haven't yet been able to correlate to this problem. In order to eliminate the potential IRQ 1/keyboard controller as a culprit, try configuring for BIOS console instead of Direct console for this machine. The BIOS console doesn't interact directly with IRQ 1 nor the LEDs.
Will the sequence on your initial post always cause this issue? If so, that's good: the commands could be put in a shell script and executed immediately after boot, to more rapidly eliminate variables. I would suggest that this be done without an serial or NIC cards in the system as well.
Thank you!
Thank you @ghaerr.
I have reviewed the previous issues relating to the ELKS kernel crashing, or acting strangely, and haven't yet been able to correlate to this problem. In order to eliminate the potential IRQ 1/keyboard controller as a culprit, try configuring for BIOS console instead of Direct console for this machine. The BIOS console doesn't interact directly with IRQ 1 nor the LEDs.
Yes, I will check that one in the morning. Agreed, eliminating possible cases is the only way forward.
Will the sequence on your initial post always cause this issue? If so, that's good: the commands could be put in a shell script and executed immediately after boot, to more rapidly eliminate variables. I would suggest that this be done without an serial or NIC cards in the system as well.
I've been making some progress this past week. The problem is entirely repeatable, slight changes in the code change behaviour. I've got it to fail on the compaq, not with rm but mount, and mount also fails (now) on the SX. So it's not just unlink only anymore. Also, open with truncate will fail like rm, so truncate is still in the crosshairs. Moving a delay loop around in the code has turned out to be useful - and a race condition is still a possibility, although I have a hard time seeing how any of the events I'm looking at can be interrupted. If they were, my printk's would show it.
To be continued...
-M
The actual issue itself seems to be evolving... we know there's a kernel problem where the system can appear to hang due to not enough buffers. This manifests itself by sleeping the process in a deadlock, but the system itself is fine. That is, logging in from another terminal is possible.
With regards to this only happening on one system, you're now saying it's happening on two others? I would assume that this is likely because the problem wasn't iterated enough times on the newer systems, but shows up once in a while? And that the original system (with a different memory configuration) always hangs immediately?
Then there's the additional reported issue of kernel printk's not working... not sure how that fits in with the rest of the reported problems. Do printk's always not work, or only on the original problem system?
It appears the unlink system call calls the kernel rescheduler (possibly to allow other processes to release buffers). This means that the answer to your comment above about how events can be interrupted is that, if any other process is running (particularly ktcp, as it has select timeouts), the process can definitely get interrupted and switched out. Thus, all testing should be done on a bare bones system with as few other processes running as possible (suggest single-user mode, or init=/bin/sh in /bootopts).
I'm looking a bit at unlink versus ftruncate, and they're a bit complicated. So it would be nice to further narrow down all the reported issues before chasing too deep. You can answer these questions in your next report.
Thank you!
The actual issue itself seems to be evolving...
It is - with the addition and movement of printk's.
we know there's a kernel problem where the system can appear to hang due to not enough buffers. This manifests itself by sleeping the process in a deadlock, but the system itself is fine. That is, logging in from another terminal is possible.
I don't think this is what we're looking at since the problem is immediate after boot, so plenty of buffers. The compaq actually hangs during boot because it's set up to mount 4 filesystems and stops in the 3rd. As to the type of hang, it varies - from silent hard hang via kernel messages about kernel stack corruption to process hangs and everything else working (and other variants) depending on the current printk's. Classic stack corruption indicators.
With regards to this only happening on one system, you're now saying it's happening on two others?
Just one other - the 3rd is in hiatus to leave room for what I now call the ALI system, which is a single board computer the size of a NIC .
I would assume that this is likely because the problem wasn't iterated enough times on the newer systems, but shows up once in a while?
I'm testing a kernel on the compaq only occasionally,
And that the original system (with a different memory configuration) always hangs immediately?
With the printk settings I have now, both fail immediately (if differently) given the right commands.
Then there's the additional reported issue of kernel printk's not working... not sure how that fits in with the rest of the reported problems. Do printk's always not work, or only on the original problem system?
That's 106 builds ago, but can be reproduced any time.
It appears the unlink system call calls the kernel rescheduler (possibly to allow other processes to release buffers). This means that the answer to your comment above about how events can be interrupted is that, if any other process is running (particularly ktcp, as it has select timeouts), the process can definitely get interrupted and switched out. Thus, all testing should be done on a bare bones system with as few other processes running as possible (suggest single-user mode, or init=/bin/sh in /bootopts).
True, but the scheduler gets called only if there is a lack of resources or contention. There is not AFAIK. Then again I may be missing something.
More later.
M
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.
The compaq actually hangs during boot because it's set up to mount 4 filesystems and stops in the 3rd
Three mounts is a lot of activity - introduces variables in perhaps FAT and MINIX filesystems. Recommend leaving all auto-mounts off for presentation of bug reproduction. If this only reproduces when FAT mounts are made, for instance, we could be getting kernel stack overflow... we saw this before on @tyama501's machine where BIOS disk I/O used tons of stack, and increasing KSTACK_BYTES by 100 in <limits.h> fixed the problem. You can also turn on CONFIG_STRACE on the failing system and it will display every system call and its associated kernel stack used.
but the scheduler gets called only if there is a lack of resources or contention. There is not AFAIK. Then again I may be missing something.
Yes - my comment was from looking at the kernel source for unlink: it explicitly calls schedule
, which will reschedule the current process immediately:
int minix_unlink(register struct inode *dir, char *name, size_t len)
{
int retval;
register struct inode *inode;
struct buffer_head *bh;
struct minix_dir_entry *de;
goto init_loop;
do {
iput(inode);
unmap_brelse(bh);
schedule(); <--- not sure why, but here it is
init_loop:
retval = -ENOENT;
inode = NULL;
bh = minix_find_entry(dir, name, len, &de);
if (!bh) goto end_unlink2;
if (!(inode = iget(dir->i_sb, (ino_t) de->inode))) goto end_unlink1;
retval = -EPERM;
if (S_ISDIR(inode->i_mode)) goto end_unlink;
} while (de->inode != inode->i_ino);
Just trying to help get your problem debugged. We need a reproduction with the barest boot conditions possible, no extra processes running, no auto-mounts, no extra cards inserted, if possible. If we can reproduce under those conditions, it will be a lot faster when I have to go searching the kernel code for problems...
I suppose the good news is that you're finding this on other systems, which means we should hopefully get a reproduction on QEMU at some point, which will help me debug it, since I have little access to real hardware.
Thank you!
Thank you @ghaerr, Being forced to describe the issues and observations is helpful in itself, and discussing them even more so. This rabbit hole turned out to be deep (again) - and no rabbit ...
The compaq actually hangs during boot because it's set up to mount 4 filesystems and stops in the 3rd
Three mounts is a lot of activity - introduces variables in perhaps FAT and MINIX filesystems.
Agreed. AAMF I was elated to see the problem on a second machine, so I kept the setup while verifying that mount triggers the problem on the first system too - in certain circumstances.
Recommend leaving all auto-mounts off for presentation of bug reproduction. If this only reproduces when FAT mounts are made, for instance, we could be getting kernel stack overflow... we saw this before on @tyama501's machine where BIOS disk I/O used tons of stack, and increasing KSTACK_BYTES by 100 in <limits.h> fixed the problem.
That's one I haven't tried yet, will check that.
You can also turn on CONFIG_STRACE on the failing system and it will display every system call and its associated kernel stack used.
Good idea, I've never used this one - being sceptical about its effect on timing and performance. No is the time to change that.
but the scheduler gets called only if there is a lack of resources or contention. There is not AFAIK. Then again I may be missing something.
Yes - my comment was from looking at the kernel source for unlink: it explicitly calls schedule, which will reschedule the current process immediately:
int minix_unlink(register struct inode *dir, char *name, size_t len) {
int retval; register struct inode *inode; struct buffer_head *bh; struct minix_dir_entry *de;goto init_loop; do { iput(inode); unmap_brelse(bh); schedule(); <--- not sure why, but here it is
init_loop: retval = -ENOENT; inode = NULL; bh = minix_find_entry(dir, name, len, &de); if (!bh) goto end_unlink2; if (!(inode = iget(dir->i_sb, (ino_t) de->inode))) goto end_unlink1; retval = -EPERM; if (S_ISDIR(inode->i_mode)) goto end_unlink; } while (de->inode != inode->i_ino); The thing is - the goto before the loop (this code is full of them, seems like a Torvalds speciality) ensures that schedule() is skipped in 'normal' cases. I believe the schedule call is there because another process may be waiting for the buffer just released (2nd pass or later). Just trying to help get your problem debugged. We need a reproduction with the barest boot conditions possible, no extra processes running, no auto-mounts, no extra cards inserted, if possible. If we can reproduce under those conditions, it will be a lot faster when I have to go searching the kernel code for problems...
Agreed and all feedback/dialog is appreciated. I've been using QEMU for comparing behaviour and in tests for reproducibility. No luck in that dept so far, not even when taking the QEMU performance setting down to 10 or 1%.
I suppose the good news is that you're finding this on other systems, which means we should hopefully get a reproduction on QEMU at some point, which will help me debug it, since I have little access to real hardware.
It's indeed good. Further, the fact that I'm seeing it in situations other than truncate() is good or bad depending on perspective. In any case its part of the digging. The rabbit is still in the wind.
-M
I haven't tried personally running ELKS in pcem or 86box, but they both provide more accurate cpu timing or even hdd timing (in 86box git) emulation ..
Thank you Andrew,
I didn't realize 86box can run on MacOs. It does and it even has a setting for emulating the ALI processor I'm using.
Very encouraging.
—M
- jul. 2022 kl. 00:47 skrev Andrew Randrianasulu @.***>:
I haven't tried personally running ELKS in pcem or 86box, but they both provide more accurate cpu timing or even hdd timing (in 86box git) emulation ..
— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1196055227, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOAXGTFICMVL7JB3MMDVWBTJPANCNFSM53KDGGZQ. You are receiving this because you were mentioned.
@ghaerr, the buffer bug is still elusive and progress so far is more eliminating what isn't the problem than getting really closer to it. A couple of possible bugs but no cigar.
While chasing down the differences between cases that work (XMS buffers) and cases that don't (ext buffers), here is some output I was hoping could provide a clue. The first sample (ignore the extra stuff, it's from printks
indicating where we're at) is the ext64 case which fails in an endless loop in truncate_direct (td0). The numbers after the colon on the UNMAP: line are the parameters to xms_fmemcpyw
(in map_buffer
, fs/buffer.c
). They are very different between the two cases - which is expected, but is there anything that stands out to you?
The file being unlinked is named 'pp', a copy of /bootopts .... < 500 bytes.
EXEC: '/bin/rm' env 139
REMAP: 14
REMAP: 19
REMAP: 13
BLK 1593 read 400023c0:3
BLK 1594 read 23c0:3
BLK 1597 read 600023c0:3
BLK 1598 read 580023c0:3
BLK 1599 read 3c0023c0:3
BLK 1600 read 5c0023c0:3
REMAP: 14
BLK 20 read 440023c0:3
map: 17 try 8
UNMAP: 25 <- 8:6400:23c0:8826:13a0
REMAP: 14
UNLINK 'pp'
REMAP: 14
trEtd0map: 5 try 9
UNMAP: 18 <- 9:4800:23c0:8c26:13a0
td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0td0 etc. forever.
The second case is with XMS buffers active and works fine:
EXEC: '/bin/rm' env 139
REMAP: 14
REMAP: 19
REMAP: 13
BLK 1593 read 100000:4000
BLK 1594 read 100000:0
BLK 1597 read 100000:6000
BLK 1598 read 100000:5800
BLK 1599 read 100000:3c00
BLK 1600 read 100000:5c00
REMAP: 14
BLK 20 read 100000:4400
map: 17 try 8
UNMAP: 25 <- 8:6400:0:10:8876
REMAP: 14
UNLINK 'pp'
REMAP: 14
trEtd0map: 5 try 9
UNMAP: 18 <- 9:4800:0:10:8c76
td0td0td0td0td0td0trDtrItrXmfi:mb;map: 2 try 10
UNMAP: 19 <- 10:4c00:0:10:9076
Any input would be welcome.
Thank you -M
Hello @Mellvik,
Actually these two logs look remarkably similar - so much so that the block reads and buffer map/unmaps are identical, right until the moment where the system hangs (or rather, loops indefinitely in truncate, it seems).
I haven't had time to properly analyze MINIX truncate, to see 1) why it loops with a possible reschedule in the first place, and 2) how it might be hanging. However, in general ,the L1 buffer map/unmap is called (using map_buffer
and unmap_buffer
or unmap_brelse
), to grab a disk buffer (in this case, looks like one of the disk mapping blocks for the inode being truncated), and then copies that external or XMS buffer into a local (DS-addressable) L1 buffer. The map_buffer
routine itself, since we're limited in the kernel to only 12 or so of them) will sleep the process if there aren't enough free buffers, until unmap_buffer frees a buffer. So, in general, if any kernel process requires more than 12 buffers, and only a single process is using them, the system will hang.
I don't understand truncate well enough to see why it might need > 12 buffers: certainly there's buffers needed for each open directory in the namei
directory filename processing routines, then another one for the inode itself, then probably one for each of the inode indirect block buffers, etc. But with a < 512 byte file, there's no indirect blocks, since MINIX keeps the first seven in the inode itself.
Also, there is no difference in the number of L1 buffers when running EXT vs XMS external buffers. It is possible, for some reason, that we have a kernel bug where an external buffer is not being released properly. You might test this be setting the EXT buffers to 128, instead of 64, and see whether that changes anything. I think ELKS will run with even more EXT buffers, for testing (all will be in main memory).
Finally, there may be a bug in truncate itself, where for some reason it loops thinking that it needs more L1 (mapped) buffers than it actually needs, and this is causing the busy loop.
The numbers after the colon on the UNMAP: line are the parameters to xms_fmemcpyw (in map_buffer, fs/buffer.c).
Instead of displaying the buffer addresses, suggest putting more printks in truncate, especially around the buffers it seems to want. The buffer block number can be printed using bh->b_blocknr, to better understand exactly which inode block truncate is looking at.
I assume there are NO other running processes on the system (no network, etc), right? This will matter with the rescheduling, which I discussed previously. If you have various filesystems mounted, it is possible that could be a problem also, as they could possibly be using an L1 buffer for each of them. It is entirely possible we are just running out of L1 buffers due to too many kernel resources being used (like active mounts, for instance). (If that's the case, and the problem goes away without the 4 mounted filesystems, then we can possibly correct that by having the filesystem re-grab the superblock for each, rather than keeping them mapped into L1). I haven't actually checked that they are locked in L1 space, just a guess.
I'll look more into this tomorrow.
Thank you!
Thank you @ghaerr - it seems I need a new angle on this one - including clearing out xms_fmemcpyw as a potential suspect.
Actually these two logs look remarkably similar - so much so that the block reads and buffer map/unmaps are identical, right until the moment where the system hangs (or rather, loops indefinitely in truncate, it seems).
Yes, the elusiveness of this problem is frustrating. It still seems that the problem is stack pollution (see below) so the symptoms appear in other places than the problem.
I haven't had time to properly analyze MINIX truncate, to see 1) why it loops with a possible reschedule in the first place, and 2) how it might be hanging. However, in general ,the L1 buffer map/unmap is called (using map_buffer and unmap_buffer or unmap_brelse), to grab a disk buffer (in this case, looks like one of the disk mapping blocks for the inode being truncated), and then copies that external or XMS buffer into a local (DS-addressable) L1 buffer. The map_buffer routine itself, since we're limited in the kernel to only 12 or so of them) will sleep the process if there aren't enough free buffers, until unmap_buffer frees a buffer. So, in general, if any kernel process requires more than 12 buffers, and only a single process is using them, the system will hang.
I don't understand truncate well enough to see why it might need > 12 buffers: certainly there's buffers needed for each open directory in the namei directory filename processing routines, then another one for the inode itself, then probably one for each of the inode indirect block buffers, etc. But with a < 512 byte file, there's no indirect blocks, since MINIX keeps the first seven in the inode itself.
I've spent what feels like aeons in truncate by now and while it seems to be doing some strange things (and missing a cast to int in V1_trunc_direct, for the DIRECT_BLOCK macro, which I at one point figured was the problem), truncate is in the clear (btw truncate is called from only one place in the code and '__u32 i_size' is always 0, so the DIRECT_BLOCK macro is superfluous for now). The thing with truncate is that it turns out to be the place where the symptoms are easiest to trigger and reproduce, but its not the only place (mount is another, not as predictable, see below). V1_trunc_direct can fail at any point in the code depending on where the printks are placed. It can also be made to work reliably if the printks are place right - and some of the printks will be mysteriously skipped. BTW - while indirect and double indirect blocks are not involved (small file), they are always called anyway.
Also, there is no difference in the number of L1 buffers when running EXT vs XMS external buffers. It is possible, for some reason, that we have a kernel bug where an external buffer is not being released properly. You might test this be setting the EXT buffers to 128, instead of 64, and see whether that changes anything. I think ELKS will run with even more EXT buffers, for testing (all will be in main memory).
I will check that. I've been using 32,64 and 80 EXT buffers so far, no difference.
Finally, there may be a bug in truncate itself, where for some reason it loops thinking that it needs more L1 (mapped) buffers than it actually needs, and this is causing the busy loop.
The numbers after the colon on the UNMAP: line are the parameters to xms_fmemcpyw (in map_buffer, fs/buffer.c).
Instead of displaying the buffer addresses, suggest putting more printks in truncate, especially around the buffers it seems to want. The buffer block number can be printed using bh->b_blocknr, to better understand exactly which inode block truncate is looking at.
See above. Actually, buffer.c emits pretty much the same symptoms. Enabling the debug() statements in the code, and the problem goes away, then disabling/enabling individual statements change behaviour, even if the changes are made in parts of the code that isn't being executed.
I assume there are NO other running processes on the system (no network, etc), right? This will matter with the rescheduling, which I discussed previously. If you have various filesystems mounted, it is possible that could be a problem also, as they could possibly be using an L1 buffer for each of them. It is entirely possible we are just running out of L1 buffers due to too many kernel resources being used (like active mounts, for instance). (If that's the case, and the problem goes away without the 4 mounted filesystems, then we can possibly correct that by having the filesystem re-grab the superblock for each, rather than keeping them mapped into L1). I haven't actually checked that they are locked in L1 space, just a guess.
The system is idle and other processes don't affect the problem. I have networking running most of the time, stopping the network and all gettys/logins don't change anything. Neither do the mounts, but the mount operation will sometimes hang the system. I haven't been able to make this one entirely predictable.- At first it seemed like the 3rd mount would always hang, then it was a specific mount (i.e. a file system), but the path is an unreliable (and slow) source because the crashes frequently screw up either the file system or the bootblock or both. BTW I have an updated version of make boot that helps - checks for the magic number at the end of the source bootblock and copies only the bootblock if requested. I don't think we're running out of L1 buffers since the system is idle most of the time and sometimes running only 5 processes. Also, a potential shortage of L1 buffers should not be affected by the choice of XMS or EXT L2 buffers, right?
I keep looking for local (stack allocated) pointers begin accidentally used as variables (frequent casting make them hard to catch for the compiler) and long/int mixups.
Thanks for looking into this.
—M
Hi @Mellvik,
I'm getting confused again on this issue, we might possibly be talking about different things - is the problem that rm
hangs, or something else? My long response above is only to do with why a hang may be occurring, nothing else.
the mount operation will sometimes hang the system.
You're saying running mount
hangs the system? Is this a separate problem, or to do with the issue of rm hanging?
Are all these duplicating regularly only on the one "bad" machine you have?
I understand you're saying that there is a separate problem relating to adding printk's that result in displaying garbage. When that happens, can you run that version on QEMU with the same result, or does this also only happen on the "bad" system?
a potential shortage of L1 buffers should not be affected by the choice of XMS or EXT L2 buffers, right?
Correct.
I don't think we're running out of L1 buffers since the system is idle most of the time and sometimes running only 5 processes.
From your log above - we are most definitely running out of L1 buffers, that is the reason truncate is looping. I explained that map_buffer will sleep the process, and some other process is issuing wakeup. This is the reason why we need to eliminate ALL other processes to track this down. (That is, if the problem is that of rm hanging, versus the other 3 problems mentioned in this issue).
Thank you @ghaerr -
yes, there is plenty of room for confusion and I've been at this one for so long that a lot of the parameters are easily taken for granted. So a few things (my perspective):
- There is only one problem - stack corruption
- rm hang is a symptom, a consequence of the above. There seem to be no problems in truncate or unlink.
- the printks demonstrate this: Their presence, content (# of parameters) and placement change the layout of the stack in (say) truncate and thus the behaviour.
- this can be repeated in buffer.c too. Leaving truncate out of the equation (i.e. no debug printk's in truncate routines), instead adding DEBUG to buffer.c and rm works just fine. Until we remove a couple of the debug statements - and we have a new variant of symptoms.
- IOW, the nature of rm's failure varies with the placement of the debug statements: Some times small files can be deleted, some times all files can be deleted, some times the process loops, some times the kernel hangs silently, some times it panics (say, in brelse) - or like this:
elks17# rm /linux; mv linux /
panic: Process -31913 exceeded kernel stack limit! magic c483
SYSTEM HALTED - Press CTRL-ALT-DEL to reboot:
- this behaviour is on the SX (ALI) system only, and goes away when XMS buffers are used.
- The mount problem has (AFAICT) the same origin: A result of the stack corruption occurring if the 'right' debug statements are in the 'right' places.
- The mount problem is not affected by XMS buffers, it's always the same and reproducible on the Compaq too, but not on QEMU.
- Like with rm, the nature of the mount problem changes with the amount of debug stuff active - and placement/params etc. Initially, mount would hang predictably when mounting (say) /dev/hda3 but not /dev/hda2 (both minix file systems). Some changes later it will not hang at all when mounting any file system, but hang when running the command without arguments to show available space - iff a FAT file systems is mounted.
At one point I was thinking the reason might be related to A20, memory mapping and the 1M wraparound 'mechanism' - since it seems to be somewhat hw/system dependent. But since ELKS is not using the HMA, that cannot be it. I did notice though, that while both the compaq and the SX, when running with XMS buffers, report 'A20 was off, now on' while QEMU reports 'A20 was on, now on'. A curiosity - and I haven't pulled out the 286 to get a 3rd platform to test on - yet.
So, having chased the problem through every statement the truncate routines and seen some really head scratching behaviour, I'm back to buffer.c and the buffer management routines - not really expecting to find the problem, but to find a pointer to it.
Thanks for clarifying the L1 mechanism. You're right - wait/schedule may indeed have to do with it.
BTW - truncate is looping only when its stack is corrupted. When working normally, V1_trunc_direct loops through all (7) blocks of the zone even when the file size is smaller than the first block, then exits. Room for improvement, but then again, unlink or truncate performance was never an issue.
—M
- aug. 2022 kl. 17:02 skrev Gregory Haerr @.***>:
Hi @Mellvik https://github.com/Mellvik,
I'm getting confused again on this issue, we might possibly be talking about different things - is the problem that rm hangs, or something else? My long response above is only to do with why a hang may be occurring, nothing else.
the mount operation will sometimes hang the system.
You're saying running mount hangs the system? Is this a separate problem, or to do with the issue of rm hanging?
Are all these duplicating regularly only on the one "bad" machine you have?
I understand you're saying that there is a separate problem relating to adding printk's that result in displaying garbage. When that happens, can you run that version on QEMU with the same result, or does this also only happen on the "bad" system?
a potential shortage of L1 buffers should not be affected by the choice of XMS or EXT L2 buffers, right?
Correct.
I don't think we're running out of L1 buffers since the system is idle most of the time and sometimes running only 5 processes.
From your log above - we are most definitely running out of L1 buffers, that is the reason truncate is looping. I explained that map_buffer will sleep the process, and some other process is issuing wakeup. This is the reason why we need to eliminate ALL other processes to track this down. (That is, if the problem is that of rm hanging, versus the other 3 problems mentioned in this issue).
— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1202764132, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOGXSQN26BH3RXCSOHDVXEZ6RANCNFSM53KDGGZQ. You are receiving this because you were mentioned.
So basically, all of these issues (except mount) occur only on the SX/ALI system, right?
If so, we need to consider BIOS or hardware issues (including possible I/O delays only seen on that computer). It is possible that the SX BIOS is using too much stack, or that the stack is being trashed, but that this is seen only under the following circumstances:
Because truncate will call reschedule, and that L1 buffers can cause a hang (remember, I can duplicate the rm hang issue on QEMU), then what possibly is happening is reschedule is trashing the stack somehow. The scenario we are discussing is one of the few cases coded in the kernel where a reschedule is coded, that may return directly to the calling application, rather than sleeping which is the result in almost all other reschedules. The reason for this on the SX system could be the speed of its I/O (rather than a BIOS stack issue), that just happens to cause a reschedule() bug to show.
To test more along these lines, perhaps include more (or less) schedule() calls in various places. Technically, there should be no issue calling schedule() from the kernel at any point, but perhaps that is the problem here: schedule will switch stacks, then switch back, corrupted. All other applications end up calling wait() which will cause a call to schedule AFTER that process tries to return from the system call. See what I mean?
Another possibility would be to comment out the schedule() call and see if that changes anything. Worst case, we might end up with a busy-loop or hang problem, but there would not be any stack corruption, and printk's should work.
Thank you @ghaerr - That's a couple of very interesting angles! I didn't realize that schedule() would switch stacks in a case like this, thanks - that's very useful.
And I was under the impression that the rm hang situation on QEMU wasn't reproducible- is that right?
So basically, all of these issues (except mount) occur only on the SX/ALI system, right?
If so, we need to consider BIOS or hardware issues (including possible I/O delays only seen on that computer). It is possible that the SX BIOS is using too much stack, or that the stack is being trashed, but that this is seen only under the following circumstances:
Because truncate will call reschedule, and that L1 buffers can cause a hang (remember, I can duplicate the rm hang issue on QEMU), then what possibly is happening is reschedule is trashing the stack somehow. The scenario we are discussing is one of the few cases coded in the kernel where a reschedule is coded, that may return directly to the calling application, rather than sleeping which is the result in almost all other reschedules. The reason for this on the SX system could be the speed of its I/O (rather than a BIOS stack issue), that just happens to cause a reschedule() bug to show.
To test more along these lines, perhaps include more (or less) schedule() calls in various places. Technically, there should be no issue calling schedule() from the kernel at any point, but perhaps that is the problem here: schedule will switch stacks, then switch back, corrupted. All other applications end up calling wait() which will cause a call to schedule AFTER that process tries to return from the system call. See what I mean?
Another possibility would be to comment out the schedule() call and see if that changes anything. Worst case, we might end up with a busy-loop or hang problem, but there would not be any stack corruption, and printk's should work.
Yes, this is a fresh start, I'll start off with the latter and take it from there. BTW: the speed is indeed very different between the SX and the compaq. The SX feels an order of magnitude or more faster, almost like QEMU except when booting from floppy.
Thanks.
--M
panic: Process -31913 exceeded kernel stack limit! magic c483
This message could mean old-fashioned kernel stack overflow.
Lets increase the kernel stack size, just in case the SX system BIOS happens to be using a lot. Change KSTACK_BYTES in include/linuxmt/limits.h from 640 to 740, and lets see what happens.
I've been tracing through the truncate code, and its quite complicated, with all sorts of routines called, many of which grab more L1 buffers.
(btw truncate is called from only one place in the code and '__u32 i_size' is always 0, so the DIRECT_BLOCK macro is superfluous for now).
Yes, inode->i_size is set to 0 by iput
which indicates that the kernel wants the size truncated to 0 (this is the path taken when minix_unlink is called, after decrementing the nlink count).
However, minix_truncate can also be (indirectly) called through the iop->truncate function pointer. This occurs in a couple places, in the truncate
system call, where the i_size value is set to the end result truncated size requested, and also on the open
system call when O_TRUNC is specified (which happens during creat
). Thus, the DIRECT_BLOCK macro is correct in its use.
I didn't realize that schedule() would switch stacks in a case like this, thanks - that's very useful.
It may not switch stacks, depends on what else is running. tswtch
is the function that actually switches stacks, called from within schedule().
I was under the impression that the rm hang situation on QEMU wasn't reproducible- is that right?
None of us can get a "basic" (removing a single file) rm hang on QEMU, but I have an old TODO list on a more complex rm -rf
hang which, because it uses up all the L1 buffers, and there are no other processes running, will produce a hang every time in QEMU.
panic: Process -31913 exceeded kernel stack limit! magic c483
This message could mean old-fashioned kernel stack overflow.
Lets increase the kernel stack size, just in case the SX system BIOS happens to be using a lot. Change KSTACK_BYTES in include/linuxmt/limits.h from 640 to 740, and lets see what happens.
Good idea. I don't think we really have a stack problem since this happens only when there is an erroneous loop (presumably) eating the stack, and it's good to be able to rule it in or out.
I've been tracing through the truncate code, and its quite complicated, with all sorts of routines called, many of which grab more L1 buffers.
Indeed it is. And this may be a good time to take a close look for possible improvements - not the least reducing the pressure on L1 buffers. After all, when deleting, the only need for buffers is for the inodes - and for small files in particluar ... etc etc. Let's have a close look after we get the stack issue resolved.
Thank you.
-M
(btw truncate is called from only one place in the code and '__u32 i_size' is always 0, so the DIRECT_BLOCK macro is superfluous for now).
Yes, inode->i_size is set to 0 by iput which indicates that the kernel wants the size truncated to 0 (this is the path taken when minix_unlink is called, after decrementing the nlink count).
However, minix_truncate can also be (indirectly) called through the iop->truncate function pointer. This occurs in a couple places, in the truncate system call, where the i_size value is set to the end result truncated size requested, and also on the open system call when O_TRUNC is specified (which happens during creat). Thus, the DIRECT_BLOCK macro is correct in its use.
Aaaah, of course. Thanks.
Now, here's the thing with the DIRECT_BLOCK macro: it returns an __u32 - which in the subsequent 2 macros is being cast to int. However, in V1_trunc_direct() it's being used twice as if it were an int. This seems to me like a bug that will go unnoticed as long as we're deleting files (value 0), but kick in when truncating a file.
—M
I was under the impression that the rm hang situation on QEMU wasn't reproducible- is that right?
None of us can get a "basic" (removing a single file) rm hang on QEMU, but I have an old TODO list on a more complex rm -rf hang which, because it uses up all the L1 buffers, and there are no other processes running, will produce a hang every time in QEMU.
OK; that's something to look into while were at it. If this hangs in QEMU it should do so on the real thing too (somewhat depending on what else is going on obviously).
Talking about QEMU - I'm seeing interesting errors that are different from the ones on other platforms, but likely related, such as on the attached screenshot. This is from a freshly mounted (and fsck'd) HD file system, actually an identical copy of the one I'm using on the SX. It may be interesting to enable DEBUG_BLK and see what's going on. This is repeatable.
—M
data:image/s3,"s3://crabby-images/564f9/564f90b0a2d04ae3ee59240811ccd9f532c78a2c" alt="Skjermbilde 2022-08-03 kl 11 23 31"
Hello @Mellvik, can you (re)post the QEMU screenshot showing the associated error?
Now, here's the thing with the DIRECT_BLOCK macro: However, in V1_trunc_direct() it's being used twice as if it were an int
The macro, which converts a disk file size into a MINIX block number by shifting right 10 bits, will be automatically converted to an 'int' if stored, or, when compared to an int, the other operand will be automatically converted to a long. In the latter case, the gcc compiler is very good about generating code, and likely doesn't actually compare the upper 16 bits. I see your point that this macro could likely use an (int) cast, however. We can look at the code generated and see whether an (int) casting helps, after solving this issue. The reason it isn't buggy is that the max block number for any MINIX file will always fit in 16 bits (65K), which, when shift left back to a long, is the max filesystem size: 65M.
- aug. 2022 kl. 16:14 skrev Gregory Haerr @.***>:
Hello @Mellvik https://github.com/Mellvik, can you (re)post the QEMU screenshot showing the associated error?
I edited the post … let me know if it doesn't work.
—m
Hello @Mellvik,
I have duplicated the QEMU-staged "mnb: still zero bit!8192" bug on my system. I'm not sure if this is a relic of just having run out of disk space, or whether this is related to a problem keeping up with the MINIX allocated-block bitmap. I have noted this, and will treat this bug separately from this issue. I plan on looking further into it shortly. This is very likely not related to this open issue at all, but I'll test a bit more and report back.
In the meantime, definitely lets plan on continued testing as you've described above, separate fro this new issue, thanks!
Thank you @ghaerr.
Agreed - and as the mount commands show, there is ample space on the fs.
Also I agree it's likely a separate issue, it would be interesting to see if it goes away if XMS buffers are compiled in. I haven't checked that. When we find the bug this thread is focusing on, we'll know whether they are related.
Apropos - I've looked at calls to schedule() as a possible part in the scenario and realised I've been down that path before, after we talked about it a month or so ago. The thing is - and as far as I can tell from the tracings, schedule() is never called in the regular path of an unlink, neither is sleep_on. wait_on_buffer() is called a lot but the buffer is never locked in any of my cases. I still have some work to do to understand the L1 operation in this.
—M
- aug. 2022 kl. 19:06 skrev Gregory Haerr @.***>:
Hello @Mellvik https://github.com/Mellvik,
I have duplicated the QEMU-staged "mnb: still zero bit!8192" bug on my system. I'm not sure if this is a relic of just having run out of disk space, or whether this is related to a problem keeping up with the MINIX allocated-block bitmap. I have noted this, and will treat this bug separately from this issue. I plan on looking further into it shortly. This is very likely not related to this open issue at all, but I'll test a bit more and report back.
In the meantime, definitely lets plan on continued testing as you've described above, separate fro this new issue, thanks!
— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1204238608, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOHAUHJUMYASVHNAWE3VXKRH7ANCNFSM53KDGGZQ. You are receiving this because you were mentioned.
With regards to the "new" bug, lets call it "out of disk space" bug:
Also I agree it's likely a separate issue, it would be interesting to see if it goes away if XMS buffers are compiled in. I haven't checked that.
I'm testing with XMS, and the bug is present. I know what it is: MINIX keeps an 8192-byte bitmap, which equals 8192*8 = 64K bits, where each bit is a 1K block in the max 64K minix V1 filesystem size. There are actually two bugs here: the first bug is that the system isn't checking when the bitmap is >= 8192 properly, and gives the incorrect error "mnb: bit still set!8192" in that case. I am working on fixing that now.
as the mount commands show, there is ample space on the fs.
That's the second bug, I have just realized. I am testing on a 32Mb filesystem, and it is running out of space and giving the first error message. I have not been able to (yet) duplicate the second bug, which is saying there's no more disk space when there should be.
I will fix the "first" portion of the out of disk space bug in a PR, then you can try to repeat the "second" out of disk space bug.
as far as I can tell from the tracings, schedule() is never called in the regular path of an unlink,
Please comment out schedule() in the truncate code, and let me know what happens.
neither is sleep_on
Please add a printk before the "sleep_on" in map_buffer to ensure the process is never put to sleep. This will let us know for sure whether you are correct in the process never sleeping (which means never running out of L1 buffers).
I still have some work to do to understand the L1 operation in this.
L1 operation is actually quite simple - map_buffer and unmap_buffer take the buffer, and copy its data contents to an L1 region in the kernel data segment. The bh_data is then directly addressable from the other kernel routines, without using a far pointer. If there are no L1 buffers left, the process is put to sleep. Its about that simple. We just need to prove that we're not ever sleeping, to eliminate a bunch of variables for your issue. (When the buffer is later unmapped, the buffer contents are copied back to the "real" L2 buffer).
If you comment out schedule and printk sleep_on, we'll know much more and be able to more rapidly squash this bug!
Finally some (real) progress on this issue:
- The problem is a ‘memory leak’. 4 bytes at a fixed address in the kernel text segment are being overwritten at some point during the boot & startup process. Thus the erratic symptoms. Changing one line of code somewhere (before that address) changes the behaviour completely. (BTW - memory error has been ruled out - again, via extensive testing).
- system-nm.map and the
hd
command were instrumental in finally locating the problem. - The location and content are always the same,
c7 ef 00 f0
@ 0x16238 in the kernel text segment. The following example shows uncorrupted (first) then corrupted data.
elks17# hd d0:16238#4 <— floppy booted
006f38: c2 b8 fe ff eb cc 56 57 55 89 e5 83 ec 06 8d 46 ......VWU......F
elks17# hd d0:16238#4. <—— HD booted
006f38: c7 ef 00 f0 eb cc 56 57 55 89 e5 83 ec 06 8d 46 ......VWU......F
- This happens only on the SX machine, never on Compaq (386dx) or qemu. The Compaq has a hw config that is very similar to the SX.
- Changing from EXT to XMS buffers only changes the layout of the code. The overwrite is at exactly the same place, now corrupting some code I haven’t triggered yet.
Now the good news, that may explain why the problem occurs only on this particular piece of hardware:
- When booting from floppy, the problem is gone. Mounting hd file systems thereafter is no problem. Booting from floppy with root on HD (via
bootopts
) is no problem either. - The size of the hd minix fs doesn’t matter (I’ve tried only 2)
So the problem appears ONLY when booting the SX from HD, which should limit the scope of the continued hunt. So the boot device matters, the root device does not. Even at this level, the difference between the Compaq and the SX is limited. Both use an IDE disk channel, both use the same IDEtoCF adapter and the same CF card (512MB). Possibly something DPT/DPB related or data returned from BIOS calls?
I’m looking at that right now. In any case - as it turns out, the name of this issue is wrong - this is not a buffer
problem, and most unlikely a fs
problem. We're getting there.
To be continued.
-M
Hello @Mellvik,
Wow, you really have made some progress on this!! I'm interested in how you were able to pinpoint the memory locations that are being overwritten?
I'd like to understand a bit more about exactly what this location is in the kernel, and whether it has any special meaning. In doing so, we need to "normalize" our segment:offset pairs for use with hd
, and in testing, I've already ran into a possible problem with hd
that I'm going to look into.
With regards to the normalizing, let's look at the special address "d0:16238". The offset value is 64k (=10000) greater than will fit in an 8086 offset, so this is the same as "d0:6238". What we want to do is look at the kernel startup banner and use the kernel code segment value from the following line:
Kernel text at 2d0:0000, ftext 1210:0000, data 1512:0000, top 9fc0:0, 490K free
So, using 2d0, we should be able to (re)write the d0:6238 value as 2d0:<something>. (Technically, to get the 20-bit memory address, the segment value is shifted left 4 bits and the offset added: d0:6238 = D00 + 6238 = 0:6F38). Thus, d0:6238 is the same as 2d0:4238.
If I run the following to get a sorted kernel map file:
cat system-nm.map | sed -e '/&/d; /!/d' | sort > file
Looking in the file for somewhere near '00014238' shows the affected location as in the middle of the sock_read
code:
0001418d t sock_write
000141eb t sock_read
00014249 t sock_release
0001427b t sock_close
Can you check with the normalized (likely 2D0) segment and see what the normalized offset is, and its position in the kernel map file?
If you find the actual memory leak is at 2D0:6238, which corresponds to the kernel map file address 00016238 (which may be hard to do given the possible hd
bug below), then, looking at the kernel map file, 00016238 is (guess what?) right in the middle of minix_unlink
, which fits with why the kernel became unstable after an unlink
syscall:
00015fba T minix_rmdir
0001613a T minix_unlink
0001629d T minix_symlink
On another note, when trying hd
, I try to rewrite the d0:6238 address with segment 2d0: I find that hd
seems to have a bug:
# hd d0:6238#4
006f38: 12 92 83 c4 08 eb de ba f7 ff eb d9 ba ea ff eb ................
# hd 2d0:6638#4
006f38: 04 ff 76 fc 8b 5e 08 ff 77 20 e8 a2 ed 83 c4 04 ..v..^..w ......
Note that both addresses passed to hd give the same 20-bit 006f38 memory address (which is incorrect), but it is showing the contents differently, as expected. I'll report on this bug shortly.
Thank you!
Hi @ghaerr, I just noticed the omission in my report, please ignore the '2d0' vs 'd0' segment difference.
I forgot to correct that - they’re from two different runs, different builds. Some times the kernel txt segment is 2d0, sometimes it’s d0.
So I did this the hard way. The address is in the middle of kernel code, more specifically (depends on the build) in V1_trunc_direct unless we build with XMS buffers, then it’s a bit further out if I remember correctly. The hard way is as follows: V1_trunc_direct() is acting up so get the address from system-nm.map (I’ve been studying disassembly listings of truncate.o for weeks, changing one statement at a time to try to understand what’s been going on, so I’m more familiar with the low level code than I ever wanted to. BTW - the complier does an incredible job optimizing, that’s different story) Dump the memory region using hd, compare with the obj dump output (mind you, on paper - yes, I know …) Find the same area in the kernel file to make sure we’re not looking at relocation and linking differences. Identify differences. After rigorous verification we have the verdict, which corresponds to the behavior. Including the weird bitmap errors we got the other day.
What I was on my way to look at when your mail arrived, was bioshd.c and how it probes drives, in order to see if the BIOS returns odd values for the DiskOnChip which the system supports. The DOC has not been installed during testing, but it may affect how the BIOS responds to probes.
—M
- aug. 2022 kl. 18:01 skrev Gregory Haerr @.***>:
Hello @Mellvik https://github.com/Mellvik,
Wow, you really have made some progress on this!! I'm interested in how you were able to pinpoint the memory locations that are being overwritten?
I'd like to understand a bit more about exactly what this location is in the kernel, and whether it has any special meaning. In doing so, we need to "normalize" our segment:offset pairs for use with hd, and in testing, I've already ran into a possible problem with hd that I'm going to look into.
With regards to the normalizing, let's look at the special address "d0:16238". The offset value is 64k (=10000) greater than will fit in an 8086 offset, so this is the same as "d0:6238". What we want to do is look at the kernel startup banner and use the kernel code segment value from the following line:
Kernel text at 2d0:0000, ftext 1210:0000, data 1512:0000, top 9fc0:0, 490K free So, using 2d0, we should be able to (re)write the d0:6238 value as 2d0:
. (Technically, to get the 20-bit memory address, the segment value is shifted left 4 bits and the offset added: d0:6238 = D00 + 6238 = 0:6F38). Thus, d0:6238 is the same as 2d0:4238. If I run the following to get a sorted kernel map file:
cat system-nm.map | sed -e '/&/d; /!/d' | sort > file Looking in the file for somewhere near '00014238' shows the affected location as in the middle of the sock_read code:
0001418d t sock_write 000141eb t sock_read 00014249 t sock_release 0001427b t sock_close Can you check with the normalized (likely 2D0) segment and see what the normalized offset is, and its position in the kernel map file?
If you find the actual memory leak is at 2D0:6238, which corresponds to the kernel map file address 00016238 (which may be hard to do given the possible hd bug below), then, looking at the kernel map file, 00016238 is (guess what?) right in the middle of minix_unlink, which fits with why the kernel became unstable after an unlink syscall:
00015fba T minix_rmdir 0001613a T minix_unlink 0001629d T minix_symlink On another note, when trying hd, I try to rewrite the d0:6238 address with segment 2d0: I find that hd seems to have a bug:
hd d0:6238#4
006f38: 12 92 83 c4 08 eb de ba f7 ff eb d9 ba ea ff eb ................
hd 2d0:6638#4
006f38: 04 ff 76 fc 8b 5e 08 ff 77 20 e8 a2 ed 83 c4 04 ..v..^..w ...... Note that both addresses passed to hd give the same 20-bit 006f38 memory address (which is incorrect), but it is showing the contents differently, as expected. I'll report on this bug shortly.
Thank you!
— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1210913480, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOFU626DNTW266NVKPTVYPG4RANCNFSM53KDGGZQ. You are receiving this because you were mentioned.
With regards to the origin of the memory corruption:
elks17# hd d0:16238#4. <—— HD booted
006f38: c7 ef 00 f0 eb cc 56 57 55 89 e5 83 ec 06 8d 46 ......VWU......F
The four-byte value looks suspiciously like a possible BIOS ROM address (retrieved from memory would be): F0 00 EF C7. If its an address, this would be BIOS ROM at segment F000, address EFC7. I'm wondering whether it is a strange BIOS bug possibly to do with our bioshd driver setting a custom DDPT, or something like that.
I am just about to release my next-level debug tools, which feature a stack tracer and disassembler, both translating addresses to symbol names from the application's symbol file. I don't yet have it working inside the kernel. It currently allows for an application to disassemble portions of itself or perform a stack backtrace.
I think I'll add a seperate disasm
utility that works like hd
, where any segment:offset address can be given, and have it able to take a .map file, which could then allow for arbitrary disassembly of (kernel) memory, complete with kernel symbols. This should be able to help with this problem. I'll also fix the hd
bug above.
Back the the corruption problem: I'm wondering if perhaps the low-core DDPT vector isn't being set properly for HD, but is for FD, for instance. This is the 1Eh vector, so its low core address would be 1E<<2 = 0:0078. Perhaps you should check to see what might be happening at that address. Also, in bioshd.c, you might turn on DEBUG_BIOS in debug.h, which will turn on some printks for DDPT copying (see copy_ddpt). Looking at this code, however, shows ELKS only sets the 1E low core vector to point at a new DDPT.
was bioshd.c and how it probes drives, in order to see if the BIOS returns odd values for the DiskOnChip which the system supports.
You may be on to something here: lets take a closer look at the probing routines used by bioshd.c for the devices, in particular bioshd_gethdinfo (and possibly bioshd_getfdinfo). It's a little ugly in there, since we're temporarily sharing code for IBM and PC98, but there are notes like the following, which are a bit suspicious:
BD_AX = BIOSHD_DRIVE_PARMS;
BD_DX = 0x80; /* query hard drives only*/
BD_ES = BD_DI = BD_SI = 0; /* guard against BIOS bugs*/ // <---!!!
if (!call_bios(&bdt))
ndrives = BD_DX & 0xff;
else
debug_bios("bioshd: get_drive_parms fail on hd\n");
I'm thinking perhaps just commenting out the call_bios
entirely (possibly just fake filling in the results, but not making the BIOS call). It is definitely possible that the SX BIOS is trashing a location as a result of one of these calls.
Probably turn off CONFIG_IDE_PROBE as well.
Another possibility is to look very closely at the boot screen display differences for bioshd: output when booting from FD vs HD.
Thank you @ghaerr -
With regards to the origin of the memory corruption:
elks17# hd d0:16238#4. <—— HD booted 006f38: c7 ef 00 f0 eb cc 56 57 55 89 e5 83 ec 06 8d 46 ......VWU......F The four-byte value looks suspiciously like a possible BIOS ROM address (retrieved from memory would be): F0 00 EF C7. If its an address, this would be BIOS ROM at segment F000, address EFC7. I'm wondering whether it is a strange BIOS bug possibly to do with our bioshd driver setting a custom DDPT, or something like that.
I was thinking along those lines too, but I didn’t follow the discussions closely when this stuff was updated, so I’m not familiar with it. I believe it’s important to keep in mind that the corruption doesn’t happen if we boot from floppy, then use HD as root. So whatever happens, it’s very early.
I am just about to release my next-level debug tools, which feature a stack tracer and disassembler, both translating addresses to symbol names from the application's symbol file. I don't yet have it working inside the kernel. It currently allows for an application to disassemble portions of itself or perform a stack backtrace.
I think I'll add a seperate disasm utility that works like hd, where any segment:offset address can be given, and have it able to take a .map file, which could then allow for arbitrary disassembly of (kernel) memory, complete with kernel symbols. This should be able to help with this problem. I'll also fix the hd bug above.
Great idea - that would be very welcome. Hd-bug, thanks - it’s on my list too, I’m deleting it now that you’re on it.
Back the the corruption problem: I'm wondering if perhaps the low-core DDPT vector isn't being set properly for HD, but is for FD, for instance. This is the 1Eh vector, so its low core address would be 1E<<2 = 0:0078. Perhaps you should check to see what might be happening at that address.
I don’t know what to look for yet, but I’ll compare the values on floppy booted and HD booted systems. Here’s the HD booted: elks17# hd 2d0:16238#4 006f38: c7 ef 00 f0 fa 52 ff 74 20 e8 a9 ed 83 c4 04 eb .....R.t ....... elks17# elks17# hd 0:78#10 000078: b4 b9 9c 13 e0 5a 00 c0 a0 87 00 f0 a0 87 00 f0 .....Z.......... elks17#
Also, in bioshd.c, you might turn on DEBUG_BIOS in debug.h, which will turn on some printks for DDPT copying (see copy_ddpt). Looking at this code, however, shows ELKS only sets the 1E low core vector to point at a new DDPT.
OK, I’ll be back with more …
—M
Hello @ghaerr, hello @Mellvik,
You may be on to something here: lets take a closer look at the probing routines used by bioshd.c for the devices, in particular bioshd_gethdinfo (and possibly bioshd_getfdinfo). It's a little ugly in there, since we're temporarily sharing code for IBM and PC98, but there are notes like the following, which are a bit suspicious:
BD_AX = BIOSHD_DRIVE_PARMS; BD_DX = 0x80; /* query hard drives only*/ BD_ES = BD_DI = BD_SI = 0; /* guard against BIOS bugs*/ // <---!!! if (!call_bios(&bdt)) ndrives = BD_DX & 0xff; else debug_bios("bioshd: get_drive_parms fail on hd\n");
The "fix" follows a suggestion from Ralf Brown's Interrupt List:
--------B-1308-------------------------------
INT 13 - DISK - GET DRIVE PARAMETERS (PC,XT286,CONV,PS,ESDI,SCSI)
AH = 08h
DL = drive (bit 7 set for hard disk)
ES:DI = 0000h:0000h to guard against BIOS bugs
Return: CF set on error
AH = status (07h) (see #00234)
CF clear if successful
AH = 00h
AL = 00h on at least some BIOSes
BL = drive type (AT/PS2 floppies only) (see #00242)
CH = low eight bits of maximum cylinder number
CL = maximum sector number (bits 5-0)
high two bits of maximum cylinder number (bits 7-6)
DH = maximum head number
DL = number of drives
ES:DI -> drive parameter table (floppies only)
...
some Toshiba BIOSes (at least before 1995, maybe some laptops???
with 1.44 MB floppies) have a bug where they do not set the ES:DI
vector even for floppy drives. Hence these registers should be
preset with zero before the call and checked to be non-zero on
return before using them. Also it seems these BIOSes can return
wrong info in BL and CX, as S/DOS 1.0 can be configured to preset
these registers as for an 1.44 MB floppy.
Also, the DDPT — as returned via es:di
, and which should be pointed to by the int 0x1e
vector — is only relevant for floppy drives, not for hard drives. My understanding is that parameters that govern hard drive I/O are specified by a different set of vectors, namely int 0x41
, int 0x46
, etc.
Thank you!
Hello @Mellvik,
I don’t know what to look for yet, but I’ll compare the values on floppy booted and HD booted systems. Here’s the HD booted: elks17# hd 2d0:16238#4 006f38: c7 ef 00 f0 fa 52 ff 74 20 e8 a9 ed 83 c4 04 eb .....R.t ....... elks17# elks17# hd 0:78#10 000078: b4 b9 9c 13 e0 5a 00 c0 a0 87 00 f0 a0 87 00 f0 .....Z.......... elks17#
As a sanity check, perhaps try dumping the bytes that these pointers are pointing to? c7 ef 00 f0
would be 0xf000:0xefc7
(as @ghaerr pointed out). And b4 b9 9c 13
would be 0x139c:0xb9b4
. These should point to 11-byte (or 14-byte) DDPTs and should be mostly identical. So... maybe try
# hd f000:efc7#e
# hd 139c:b9b4#e # or whatever address you read off from 0:78
(And some other possible sanity checks:
- Is e.g.
0x139c:0xb9b4
the runtime address of theDDPT[]
variable in the ELKS kernel? - Does the
vec1E
variable always read78 00 00 00
?)
Thank you!
Hello @Mellvik, Hello @tkchia,
I agree with all of @tkchia's suggestions, they are very good ideas. I think we can get to the bottom of this, but I want to point out that with all the segment:offset addresses being presented, it is important that two things be done after each new kernel build, so that we can more easily understand the results of each test (and yes, I am working to automate this):
- Run
cat system-nm.map | sed -e '/&/d; /!/d' | sort > linux.map
to generate a readable kernel symbol table. - Record the text, fartext, and data segment values from the kernel boot line:
Kernel text at 2d0:0000, ftext 1210:0000, data 1512:0000
The above would show 2d0: mapping to symbols in the linux.map 0x0001XXXX section (.text) of the kernel.map file, 1210: to 0x0002XXXX section (.fartext), and 1512: mapping to 0x0003XXXX (.data/.bss) section.
Then, only use one of the three segment values in hd
for display, and rewrite any longword values into segment:offset. For instance, with @tkchia's suggestion above to look at 0x139c:0xb9b4
, this makes sense when the kernel .data segment is 0x139c, and would then allow to search the kernel map file for 0x0003b9b4. (In my mapfile, this is not the DDPT[] variable, but much higher, in the .bss area).
I plan on adding a couple more capabilities and files to the distribution: the ability to generate a human readable sorted .map file for the kernel or any application, as well as a .sym file (binary format) of the same, for use in the hd
and disasm
tool enhancements. The kernel files could be called linux.map and linux.sym and could be placed in the ELKS /lib directory for use by the tools or humans for real-time debugging. (There is still the problem of determining the runtime kernel .text, .fartext and .data segment values to map to the appropriate sections, any thoughts on how to do that?)
Thank you!
Hello @Mellvik,
Hd-bug, thanks - it’s on my list too, I’m deleting it now that you’re on it.
I've found it, but am in the middle of the disasm
enhancement, and I can't post it immediately. The 20-bit address to the left is sometimes displayed incorrectly, but the contents are correct. Apply this patch to fix it:
diff --git a/elkscmd/misc_utils/hd.c b/elkscmd/misc_utils/hd.c
index 53cfdb3d..f6653cc6 100644
--- a/elkscmd/misc_utils/hd.c
+++ b/elkscmd/misc_utils/hd.c
@@ -112,7 +112,7 @@ void do_mem(char *spec)
//printf("SEG %x OFF %x CNT %ld\n", seg, off, count);
addr = (unsigned char __far *)(((unsigned long)seg << 16) | off);
- offset = (((unsigned long)seg << 4) | off);
+ offset = (((unsigned long)seg << 4) + off);
for ( ; count > 0; count -= 16, offset += 16) {
int j, ch;
memset(buf, '\0', 16);
elks17# hd 2d0:16238#4
Note that "16238" is not a valid offset, as its > 64K. For now, hd
just wraps and ignores it, but a future version may not. You should just be using hd 2d0:6238#4
. I will probably fix hd
to display an error message for this case to avoid confusion.
Thank you!
Thank you @tkchia and @ghaerr - that's a lot of useful info and suggestions. I'll attack them one by one.
-
I did take out IDE_PROBE, but as expected it's called far to late in the boot process to affect the situation.
-
Slightly off topic but affecting my speed of progress: @ghaerr - I'm getting these (picture) every once in a while, and the only remedy is to clean out the file system (
mkfs
). Is there a better way? (the output after the second 'Linux Found' message may vary ...). -
I did include the BIOS debug in a build, and got the following (booting from HD, error verified to be present) (DiskOnChip now installed which doesn't change anything other than adding a HD and reducing available memory):
Direct console, scan kbd 80x25 emulating ANSI (3 virtual consoles)
ttyS0 at 0x3f8, irq 4 is a 16550A
ttyS1 at 0x2f8, irq 3 is a 16550A
ttyS2 at 0x3e8, irq 5 is a 16550A
ttyS3 at 0x2e8, irq 2 is a 16550A
64 ext buffers, 65536 ram
eth: ne2k at 0x380, irq 9 (8bit), MAC 00:1f:11:02:60:2d, flags 0x80
eth: wd80x3 at 0x240, irq 2, ram 0xce00 not found
eth: 3c509 at 0x330, irq 11 not found
bioshd: hda BIOS CHS 992,16,63
bioshd: hdb BIOS CHS 728,2,1
/dev/hda: 992 cylinders, 16 heads, 63 sectors = 488.3 Mb
/dev/hdb: 728 cylinders, 2 heads, 1 sectors = 728.0 kb
/dev/fd0: 80 cylinders, 2 heads, 18 sectors = 1440.0 kb
bioshd: DDPT vector 8000:0 SPT 216
Partitions: hda:(0,999936) bioshd: lba 0 is CHS 0/0/1 remaining sectors 63
bioshd(128): track read CHS 0/0/1 count 18
bioshd(128): track read lba 0 to 17 count 18
bioshd(128): cache hit lba 0
bioshd(128): cache hit lba 1
hda1:(63,696465) hda2:(696528,132048) hda3:(828576,132048) hda4:(960624,41328)
hdb:(0,1456) bioshd: lba 0 is CHS 0/0/1 remaining sectors 1
bioshd(129): track read CHS 0/0/1 count 1
bioshd(129): track read lba 0 to 0 count 1
bioshd(129): cache hit lba 0
bioshd: lba 1 is CHS 0/1/1 remaining sectors 1
bioshd(129): track read CHS 0/1/1 count 1
bioshd(129): track read lba 1 to 1 count 1
bioshd(129): cache hit lba 1
hdb1:(1,1455)
device_setup: BIOS drive 0x80, root device 0x304
PC/AT class machine, syscaps 0xff, 608K base ram.
ELKS kernel 0.6.0 (57216 text, 11536 ftext, 8384 data, 42400 bss, 14750 heap)
Kernel text at 2d0:0000, ftext 10c8:0000, data 1399:0000, top 9800:0, 465K free
bioshd: lba 960626 is CHS 953/0/3 remaining sectors 61
bioshd(128): track read CHS 953/0/3 count 18
bioshd(128): track read lba 960626 to 960643 count 18
...
I noticed the DDPT vector being 8000:0, which according to hd
looks like this:
080000: 8b 46 fc 83 e0 40 74 10 8b 5e 08 80 7f 01 40 75 .F...@t..^....@u
- The bad data, if it's a pointer, points as suggested into what looks like BIOS code
# hd f000:efc7#e
0fefc7: df 02 25 02 24 1b ff 54 f6 0f 08 e9 b4 08 33 db ..%.$..T......3.
0fefd7: b4 00 b9 b7 00 8a c1 e6 70 e3 00 e3 00 e4 71 e3 ........p.....q.
0fefe7: 00 03 d8 41 81 f9 be 00 75 02 41 41 2e 3b 0e ee ...A....u.AA.;..
.... while the DDPT pointer and content looks like this now:
# hd 0:78#4
000078: e4 ba 99 13 e0 5a 00 c0 a0 87 00 f0 a0 87 00 f0 .....Z..........
# hd 1399:bae4
01bbf4: b8 20 01 8e 3f 81 3e ce 05 55 aa 75 08 81 02 00 . ..?.>..U.u....
01bc04: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01bc14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
*
#
And DDPT in the symbol table is 003bae4
so this looks good.
-
vec1E
looks right:
# hd 1399:1fd0#4
013fd0: 78 00 00 00 80 81 82 83 00 01 02 03 28 00 09 00 x...........(...
- The boot screens are the same when booting from HD and FD.
I'm continuing with the bios calls in gethdinfo()
.
The BIOS for the DiskOnChip is at d000:0 (jumper selectable), the docs do not mention a way to disable the DOC altogether, which would have been advantageous in this process.
--M
Hello @Mellvik,
From the most recent tests, everything looks pretty normal. I'm thinking perhaps now that we know where the RAM corruption is, we might concentrate on exactly when it occurs. That is, seeing some of your boot error message, perhaps the corruption is occurring very early in the boot process, possibly just after the kernel is loaded into RAM and relocated.
We can call printk very early in the boot sequence now, perhaps displaying the contents of the corrupted RAM location at several places during startup, using something like the following would show corruption:
printk("RAM %lx\n", *(unsigned long __far *)0x00D06238L);
(Change the long constant to exactly the memory location error; in this case the long hi word is segment and lo word is offset).
This (or a procedure call wrapper around it) could then be placed in elks/init/main.c::kernel_init() in various places.
I'm getting these (picture) every once in a while, and the only remedy is to clean out the file system (mkfs). Is there a better way? (the output after the second 'Linux Found' message may vary ...).
I'm wondering whether these boot retry and multiple "Linux found" error messages are symptoms of the same problem (very early corruption) - there should never be multiple "Linux found" messages. This could indicate either a corrupted HD boot sector, something I have been considering to be the problem, and/or RAM corruption during the boot sector operation (it is loaded at 0:7C00, and then relocates itself several times, and the kernel could be getting corrupted before it even starts running).
IIRC, the boot sector also rewrites the DDPT vector. I will look further into that, although I'm not sure we have much evidence that DDPT is the problem.
Thank you!
Hello @Mellvik, Hello @tkchia,
I just noticed the old DDPT vector is at 0x8000:0. That's right in the middle of memory, possibly where the boot sector code is operating and relocating the kernel.
I seem to remember that the boot code sets the DDPT vector while reading the additional boot sectors, then resets it just before exit. Should this reset to 0x8000:0, that might be in the middle of the kernel in RAM (depending on how much RAM is on this system), during the first kernel relocation, or afterwards, before it is jumped to. I'm wondering whether the boot code needs to reset the DDPT, and/or whether this DDPT location may be part of the problem seen in both the multiple "Linux Found" error messages, as well as the RAM corruption.
What I'm thinking is perhaps the BIOS is also trying to "reset" the DDPT vector somehow by writing its DDPT location into the low-core INT 1Eh vector ELKS is/was using, which still points into the RAM area ELKS is using, corrupting it. It is possible this DDPT "reset" rewrite occurs during the bioshd.c BIOSHD_RESET call, or much earlier in the boot process.
Thank you!
@Mellvik,
To build a boot sector that does not reset the DDPT, the following line should be commented out of bootblocks/boot_sect.S:
boot_it:
// Signify that /linux was loaded as 1 blob
orb $(EF_AS_BLOB|EF_BIOS_DEV_NUM),elks_flags
mov %dx,root_dev
mov %cx,part_offset // save sector offset of booted partition
mov %bx,part_offset+2
// RESTORE_DDPT // <-- comment this line out
mov $LOADSEG,%ax
mov %ax,%ds
mov %ax,%es
ljmp $LOADSEG+0x20,$0
However, if this is found to be the issue, NOT restoring the DDPT will still likely cause the BIOS to "reset" the DDPT using the boot sector's INT 1Eh address, which is still pointing into ELKS-used RAM, and just cause corruption at a different address. Thus, the RESTORE_DDPT perhaps needs to write an INT 1eh vector pointing to some unused RAM area for this BIOS?
We're all still guessing here - the BIOS should be "resetting" the DDPT by actually rewriting the 1Eh vector. I am only guessing that what is happening is that it is using the existing DDPT vector, but adding some bytes (which look like an address?) at the end of it? This may be a case of buggy BIOS, and we're going to have to work around it, since this only happens on one system.
@Mellvik: I'm still trying to figure out why the INT 1Eh vector is supposedly set by default to 8000:0. (=512k). How much memory is on the SX? Also, it would be useful to display the low core area using hd 0:0#512
.
Hello @ghaerr,
Hmm... is the bug what I suspect it is? If my surmise is correct:
- The BIOS did set
int 0x1e
to point to the "standard", IBM-compatible DDPT address, which is0xf000:0xefc7
. - ELKS's bootloader (
elks/bootblocks/boot_sect.S
) temporarily moved the DDPT to0x8000:0
. - After loading the kernel and
setup.S
code, the bootloader later tried to restore the0xf000:0xefc7
vector by writing to0:0x78
, but failed. Instead0xf000:0xefc7
was written to some other address, specifically0x7c0:0x78
.
More precisely: currently the elks/bootblocks/boot_sect.S
code assumes it was loaded at 0:0x7c00
and the BIOS or previous bootloader transferred control to it at 0:0x7c00
. It is possible that @Mellvik's hard disk MBR decided to be a bit ornery and transferred control to 0x7c0:0
, rather than 0:0x7c00
. If so, this would probably be quite a rare instance that we encounter such an intermediate bootloader.
@Mellvik, perhaps you can try this patch and see if it improves things?
diff --git a/bootblocks/boot_sect.S b/bootblocks/boot_sect.S
index 19829644..a9914fe7 100644
--- a/bootblocks/boot_sect.S
+++ b/bootblocks/boot_sect.S
@@ -65,7 +65,8 @@
.text
-// Loaded by the BIOS at 0:7C00h (32K - 1K) with DL = boot drive number
+// Loaded by the BIOS at 0:7C00h (32K - 1K) or 7C0h:0
+// with DL = boot drive number
entry:
@@ -129,7 +130,7 @@ entry1:
lds (%bx),%si // ds:si = BIOS original table
push %ds
push %si
- push %cs
+ push %cx // push address 0:78h (cx = 0)
push %bx
.if floppy_table == entry
xor %di,%di
Thank you!
Thank you @tkchia,
I did rush to get this one teated, AND IT SEEMS TO WORK. Admittedly I need to sit down tomorrow and understand exactly ’the magic’, but it looks very promising.
I ran into a couple of snags - unrelated to the memory overwrite problem. First, my modified version of makeboot (to just copy the bootblack) had a couple of issues. Then it turns out that the makefile in bootblocks doesn’t pad minix.bin to 1024, which makeboot demands, so some bandaid was required.
Anyway - thanks @tkchia and @ghaerr, we may actually have nailed it - 30 days in.
—M
- aug. 2022 kl. 17:33 skrev tkchia @.***>:
Hello @ghaerr https://github.com/ghaerr,
Hmm... is the bug what I suspect it is? If my surmise is correct:
The BIOS did set int 0x1e to point to the "standard", IBM-compatible DDPT address, which is 0xf000:0xefc7. ELKS's bootloader (elks/bootblocks/boot_sect.S) temporarily moved the DDPT to 0x8000:0. After loading the kernel and setup.S code, the bootloader later tried to restore the 0xf000:0xefc7 vector at 0:0x78, but failed, and instead 0xf000:0xefc7 was written to some other address, specifically 0x7c0:0x78. More precisely: currently the elks/bootblocks/boot_sect.S code assumes it was loaded at 0:0x7c00 and the BIOS or previous bootloader transferred control to it at 0:0x7c00. It is possible that @Mellvik https://github.com/Mellvik's hard disk MBR decided to be a bit ornery and transferred control to 0x7c0:0, rather than 0:0x7c00. If so, this would probably be quite a rare instance that we encounter such a bootloader.
@Mellvik https://github.com/Mellvik, perhaps you can try this patch and see if it improves things?
diff --git a/bootblocks/boot_sect.S b/bootblocks/boot_sect.S index 19829644..94291c38 100644 --- a/bootblocks/boot_sect.S +++ b/bootblocks/boot_sect.S @@ -129,7 +129,7 @@ entry1: lds (%bx),%si // ds:si = BIOS original table push %ds push %si
push %cs
push %cx // push address 0:78h (cx = 0) push %bx
.if floppy_table == entry xor %di,%di Thank you!
— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1212152222, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOGHEAURCSIX6FWMZJLVYUMMNANCNFSM53KDGGZQ. You are receiving this because you were mentioned.
Hello @tkchia,
Hmm... is the bug what I suspect it is? If my surmise is correct
Thank you for your help and fix. I was beginning to think it might be our own code in RESTORE_DDPT writing the longword after all, and you've nailed it. Nice catch :)
If so, this would probably be quite a rare instance that we encounter such an intermediate bootloader.
I'm almost certain this is the culprit, but we'll wait until @Mellvik tests a bit more before celebrating. What I don't yet understand is how this only fails on HD boots and never FD, but I suppose that all depends on exactly what code got corrupted.
Thank you!
Hi @tkchia and @ghaerr -
while there is still some tests I'd like to do - time permitting, I'm quite confident that the bug has been found and fixed. I've run some old kernels (marked 'bad') and they work (rm works) fine and the memory location has no traces of the previous C7 sequence. I also installed the modified bootblock on the compaq HD and it boots fine.
It's indeed fascinating that a 4 week hunt for what initially seemed like something else, suddenly was fixed by changing one single letter in the bootblock code.
Happy to be out of the rabbit hole (again), I have some 'house cleaning' to do before submitting a PR with some updates, but this seems like a good time to close this issue.
Thanks again!
-M
Hello @ghaerr, hello @Mellvik,
Thanks for the heads-up!
What I don't yet understand is how this only fails on HD boots and never FD, but I suppose that all depends on exactly what code got corrupted.
My current wild guess, is that the ROM BIOS behaves like a bog-standard IBM-compatible BIOS, and transfers control to the bootloader through 0:0x7c00
, but the hard disk's master boot record has other ideas. Like so:
ROM BIOS
0:0x7c00 │ │ 0:0x7c00
▾ ▾
ELKS HD MBR
bootloader │ 0x7c0:0 (!?)
on FD ▾
ELKS
(FD boot) bootloader
on HD
(hd boot)
Such a situation will also explain why boots from the floppy drive are not affected by the glitch.
If this DDPT weirdness happened on my machine, I would be quite tempted to disassemble the MBR code, to confirm if it is indeed doing what I think it is doing. :slightly_smiling_face:
Thank you!
Hello @Mellvik, Hello @tkchia,
I would be quite tempted to disassemble the MBR code, to confirm if it is indeed doing what I think it is doing.
Great idea @tkchia. @Mellvik, if you could dd
the MBR off the SX HD and post it, I will add an ability for our new disasm
to take a file argument for disassembly, test that it displays properly, and we can then take a look at the MBR jump code.
Thank you!
It's indeed fascinating that a 4 week hunt for what initially seemed like something else, suddenly was fixed by changing one single letter in the bootblock code.
@Mellvik, that was some fantastic debugging work, coming up with the exact location of the data corruption. And during the bug hunt, you managed to discover what turned out to be a major MINIX filesystem problem, as well. Nice work!!
It's indeed fascinating that a 4 week hunt for what initially seemed like something else, suddenly was fixed by changing one single letter in the bootblock code.
@Mellvik, that was some fantastic debugging work, coming up with the exact location of the data corruption. And during the bug hunt, you managed to discover what turned out to be a major MINIX filesystem problem, as well. Nice work!!
Thank you, @ghaerr. it ended up being a 'community effort' - and an educating experience. It never occurred to me that the MBR may have been part of it. Not the problem but the path to the problem. If I'd MBRd the drive we'd lost the chance to notice, find and fix the problem, :-)
If this DDPT weirdness happened on my machine, I would be quite tempted to disassemble the > MBR code, to confirm if it is indeed doing what I think it is doing. 🙂
Thanks @tkchia, I'll do just that and let you know (using the disasm utility). Turns out the MBR is the one originally delivered on the FAT formatted CF card (512mb).
-M
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.
OK, here's the file - and a partial disasm skipping the strings and the partition table. SX-MBR.bin.gz
Looking at this it's easy to want more, such as the ability to force the disassembler to 'sync' at jump/call destinations or specific addresses. Fortunately in this case it seems to sync at 0x1f anyway, after the cli
byte.
# ./disasm SX-MBR.bin
CS = 530f
0000 eb 1c jmp .+30 // 001e
0002 60 ???
0003 00 8e c0 8e add %cl,0x8ec0(%bp)
0007 d0 31 SHL $1,(%bx,%di)
0009 e4 31 in $0x31,%al
000b 80 57 8e df adcb $0xdf,-0x72(%bx)
000f be 00 7c mov $0x7c00,%si
0012 b9 00 01 mov $0x100,%cx
0015 fc cld
0016 f3 repz
0017 a5 movsw
0018 06 push %es
0019 1f pop %ds
001a 07 pop %es
001b 16 push %ss
001c b9 21 fa mov $0xfa21,%cx
001f 33 c0 xor %ax,%ax
0021 8e d8 mov %ax,%ds
0023 8e d0 mov %ax,%ss
0025 bc 00 7c mov $0x7c00,%sp
0028 8b f4 mov %sp,%si
002a fb sti
002b c4 06 4c 00 les (0x004c),%ax
002f a3 07 7c movw %ax,(0x7c07)
0032 8c 06 09 7c mov %al,(0x7c09)
0036 a1 13 04 movw (0x0413),%ax
0039 48 dec %ax
003a 48 dec %ax
003b b1 06 mov $0x6,%cl
003d a3 13 04 movw %ax,(0x0413)
0040 d3 e0 shl %cl,%ax
0042 8e c0 mov %ax,%es
0044 b9 00 02 mov $0x200,%cx
0047 33 ff xor %di,%di
0049 fc cld
004a f3 repz
004b a4 movsb
004c b8 88 00 mov $0x88,%ax
004f 06 push %es
0050 50 push %ax
0051 cb retf
0052 1e push %ds
0053 50 push %ax
0054 f6 c2 f0 test $0xf0,%dl
0057 75 28 jnz .+42 // 0081
0059 d0 ec shr $1,%ah
005b fe cc decb %ah
005d 75 22 jnz .+36 // 0081
005f 33 c0 xor %ax,%ax
0061 8e d8 mov %ax,%ds
0063 a1 6c 04 movw (0x046c),%ax
0066 8a c4 mov %ah,%al
0068 2e 2a 06 03 00 subb %cs:(0x0003),%al
006d 3c 02 cmpb $0x2,%al
006f 72 10 jb .+18 // 0081
0071 2e 88 26 03 00 movb %cs:%ah,(0x0003)
0076 3d 02 00 cmpw $0x2,%ax
0079 73 03 jnb .+5 // 007e
007b e8 cc 00 call 014a (014a)
007e e8 e8 00 call 0169 (0169)
0081 58 pop %ax
0082 1f pop %ds
0083 2e ff 2e 07 00 33 c0 jmp %cs:c033:0007
008a 8e c0 mov %ax,%es
008c cd 13 int $0x13
008e 0e push %cs
008f 1f pop %ds
0090 80 3e 0b 00 00 cmpb $0x0,(0x000b)
0095 74 24 jz .+38 // 00bb
0097 be ae 01 mov $0x1ae,%si
009a 83 c6 10 add $0x10,%si
009d 80 3c 80 cmpb $0x80,(%si)
00a0 75 f8 jnz .-6 // 009a
00a2 8b 14 mov (%si),%dx
00a4 8b 4c 02 mov 0x02(%si),%cx
00a7 bb 00 7c mov $0x7c00,%bx
00aa b8 01 02 mov $0x201,%ax
00ad cd 13 int $0x13
00af 72 d7 jb .-39 // 0088
00b1 2e c7 06 48 01 c0 07 movw %cs:$0x7c0,(0x0148)
00b8 eb 7b jmp .+125 // 0135
00ba 90 xchg %ax,%ax
00bb b2 80 mov $0x80,%dl
00bd e8 a9 00 call 0169 (0169)
00c0 bf 18 00 mov $0x18,%di
00c3 8b 45 f9 mov -0x07(%di),%ax
00c6 b9 04 00 mov $0x4,%cx
00c9 d3 e8 shr %cl,%ax
00cb 8b e8 mov %ax,%bp
00cd 8b 45 fe mov -0x02(%di),%ax
00d0 d1 e0 shl $1,%ax
00d2 40 inc %ax
00d3 03 e8 add %ax,%bp
00d5 f6 35 idiv (%di)
00d7 8a cc mov %ah,%cl
00d9 fe c1 incb %cl
00db 33 d2 xor %dx,%dx
00dd 8a f0 mov %al,%dh
00df bb 00 05 mov $0x500,%bx
00e2 b8 01 02 mov $0x201,%ax
00e5 cd 13 int $0x13
00e7 72 f9 jb .-5 // 00e2
00e9 8b c5 mov %bp,%ax
00eb be 34 00 mov $0x34,%si
00ee b7 07 mov $0x7,%bh
00f0 f6 35 idiv (%di)
00f2 33 c9 xor %cx,%cx
00f4 86 e1 xchg %ah,%cl
00f6 2b f1 sub %cx,%si
00f8 f6 75 02 idiv 0x02(%di)
00fb 8b d0 mov %ax,%dx
00fd 86 d5 xchg %dl,%ch
00ff 8a 05 mov (%di),%al
0101 2a c1 sub %cl,%al
0103 fe c1 incb %cl
0105 b4 02 mov $0x2,%ah
0107 50 push %ax
0108 cd 13 int $0x13
010a 58 pop %ax
010b 72 f8 jb .-6 // 0105
010d b1 01 mov $0x1,%cl
010f 02 f8 add %al,%bh
0111 02 f8 add %al,%bh
0113 8b 05 mov (%di),%ax
0115 fe c6 incb %dh
0117 3a 75 02 cmp 0x02(%di),%dh
011a 72 04 jb .+6 // 0120
011c b6 00 mov $0x0,%dh
011e fe c5 incb %ch
0120 2b f0 sub %ax,%si
0122 73 e1 jnb .-29 // 0105
0124 03 c6 add %si,%ax
0126 b4 02 mov $0x2,%ah
0128 cd 13 int $0x13
012a 8a 6d fd mov -0x03(%di),%ch
012d 8b dd mov %bp,%bx
012f c7 06 48 01 70 00 movw $0x70,(0x0148)
0135 33 c0 xor %ax,%ax
0137 a2 0b 00 movb %al,(0x000b)
013a 8e d8 mov %ax,%ds
013c b0 52 mov $0x52,%al
013e a3 4c 00 movw %ax,(0x004c)
0141 8c 0e 4e 00 mov %cl,(0x004e)
0145 ea 00 00 70 00 jmp 0070:0000
014a bb ff ff mov $0xffff,%bx
014d b0 b6 mov $0xb6,%al
014f e6 43 out %al,$0x43
0151 89 d8 mov %bx,%ax
0153 e6 42 out %al,$0x42
0155 88 e0 mov %ah,%al
0157 e6 42 out %al,$0x42
0159 e4 61 in $0x61,%al
015b 0c 03 orb $0x3,%al
015d e6 61 out %al,$0x61
015f 4b dec %bx
0160 b9 64 00 mov $0x64,%cx
0163 e2 fe loopz .0 // 0163
0165 eb ea jmp .-20 // 0151
0167 5b pop %bx
0168 c3 ret
0169 53 push %bx
016a 51 push %cx
016b 52 push %dx
016c 06 push %es
016d 56 push %si
016e 57 push %di
016f 0e push %cs
0170 1f pop %ds
0171 0e push %cs
0172 07 pop %es
0173 bb 00 02 mov $0x200,%bx
0176 b9 01 00 mov $0x1,%cx
0179 32 f6 xor %dh,%dh
017b b8 01 02 mov $0x201,%ax
017e 9c pushf
017f ff 1e 07 00 72 22 call 2272:0007
0185 c7 07 eb 1c movw $0x1ceb,(%bx)
0189 80 fa 80 cmp $0x80,%dl
018c 75 04 jnz .+6 // 0192
018e 88 16 0b 02 movb %dl,(0x020b)
0192 fc cld
0193 b9 a0 01 mov $0x1a0,%cx
0196 be 1e 00 mov $0x1e,%si
0199 bf 1e 02 mov $0x21e,%di
019c f3 repz
019d a4 movsb
019e b8 01 03 mov $0x301,%ax
01a1 41 inc %cx
01a2 9c pushf
01a3 ff 1e 07 00 5f 5e call 5e5f:0007
01a9 07 pop %es
01aa 5a pop %dx
01ab 59 pop %cx
01ac 5b pop %bx
01ad c3 ret
Hello @Mellvik,
Thanks! This MBR code is highly interesting, but also a bit worrying. I hope this particular MBR is not malware (and I am not kidding). More on that later.
Thank you!
Thank you @tkchia - actually I had the same thought when a saw the string towards the end.
Looking forward to your verdict :-)
-M
BTW, continuing on the same tangent, I pulled the MBR off of two other vintage CF cards. Interestingly they are all different. Then again, old stuff like this via eBay will in many cases have been used for something or other before.
- aug. 2022 kl. 18:33 skrev tkchia @.***>:
Hello @Mellvik https://github.com/Mellvik,
Thanks! This MBR code is highly interesting, but also a bit worrying. I hope this particular MBR is not malware (and I am not kidding). More on that later.
Thank you!
— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1367#issuecomment-1214186167, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGODW7DHP64FSOLZ4U5DVY7E6ZANCNFSM53KDGGZQ. You are receiving this because you were mentioned.
Hello @Mellvik, hello @ghaerr,
Some quick observations:
- Yes, it turns out that the MBR does transfer control to the boot partition's bootloader through
0x7c0:0
. For some reason, the MBR uses self-modifying code to do that:
The instruction at00a7 bb 00 7c mov $0x7c00,%bx 00aa b8 01 02 mov $0x201,%ax 00ad cd 13 int $0x13 00af 72 d7 jb .-39 // 0088 00b1 2e c7 06 48 01 c0 07 movw $0x7c0,%cs:(0x0148) // [*] %cs: override applies to dest. operand 00b8 eb 7b jmp .+125 // 0135 ... 0135 33 c0 xor %ax,%ax 0137 a2 0b 00 movb %al,(0x000b) 013a 8e d8 mov %ax,%ds 013c b0 52 mov $0x52,%al 013e a3 4c 00 movw %ax,(0x004c) 0141 8c 0e 4e 00 movw %cs,(0x004e) // [*] %cl should be %cs 0145 ea 00 00 70 00 ljmpw $0x0070,$0x0000 // [*] this is the correct GNU as syntax
%cs:0x00b1
alters the instruction%cs:0x0145
so that, by the time we reach it, it effectively becomes, well,0145 ea 00 00 c0 07 ljmpw $0x07c0,$0x0000
- Also, you may notice that the instructions at
%cs:0x13e
—%cs:0x0144
redirect theint 0x13
vector (at0:0x4c
) to point at the MBR's own code (!). This is after the MBR has copied itself to upper conventional memory and then reserved the memory for itself (at%cs:0x3d
). - @ghaerr: there are some glitches in the disassembly. I have indicated some of these above (
[*]
). Also, the indirect far jump and call opcodes (0xff 0x2e
... /0xff 0x1e
...) are misinterpreted, e.g.
should be0083 2e ff 2e 07 00 33 c0 jmp %cs:c033:0007
and it really means to transfer control to the segment:offset value stored at0083 2e ff 2e 07 00 ljmpw *%cs:0x0007
%cs:0x0007
. Likewise for%cs:0x17f
and%cs:0x1a3
. - And yep, this MBR is almost certainly malware. It is rather interesting piece of malware too (in my opinion). But perhaps there is some classical anti-virus program that actually knows about it. (Edit: this might be the same as the "AntiCMOS.B" virus that F-Secure is referring to: http://www.f-secure.com/v-descs/anticmos.shtml .)
Thank you!
actually I had the same thought when a saw the string towards the end.
@Mellvik, can you post the binary boot block? I would like to see the string towards the end you're talking about.
there are some glitches in the disassembly.
@tkchia: thank you, I was going to warn you about the fact that this the disassembler was written from scratch and has had very little outside testing. I will study your glitches and correct. Still not bad for 4k bytes! :)
Hello @ghaerr,
can you post the binary boot block?
@Mellvik had provided a link: https://github.com/jbruchon/elks/files/9331781/SX-MBR.bin.gz . It goes without saying, but anyway: treat with extreme care. :astonished:
Still not bad for 4k bytes! :)
:slightly_smiling_face:
Thank you!
Hello @tkchia,
Wow, super job analyzing the SX malware boot sector! And thank you for the bug reports on disasm
.
The instruction at %cs:0x00b1 alters the instruction %cs:0x0145 so that, by the time we reach it, it effectively becomes, well, 0145 ea 00 00 c0 07 ljmpw $0x07c0,$0x0000
This is most likely to stop virus protectors from directly recognizing an ljmpw with an 0x07c0 operand, I would assume.
This is after the MBR has copied itself to upper conventional memory and then reserved the memory for itself (at %cs:0x3d).
What is the mechanism used to reserve memory for itself? I am quickly looking but don't see how?
you may notice that the instructions at %cs:0x13e—%cs:0x0144 redirect the int 0x13 vector (at 0:0x4c) to point at the MBR's own code (!).
So effectively, this boot block allows a later program to execute code in a reserved block of memory, accessed via INT 13h? What does that code do, print "I am Li Xi"!?
Thank you!
Hello @ghaerr,
What is the mechanism used to reserve memory for itself? I am quickly looking but don't see how?
0:0x0413
= 0x40:0x13
is the IBM BIOS variable that records the amount of base (conventional) memory available.
What does that code do, print "I am Li Xi"!?
It mainly intercepts BIOS disk I/O requests, and adds a few I/O requests of its own to write (and spread) itself onto disks.
Thank you!
Well, isn't that a catch! That's a great piece of work, @tkchia - much appreciated!
And yes, let's make sure we contain this one. Actually there is little danger of spreading, even between my systems, and the observation I had yesterday comes back: I might easily have ran fdisk /mbr on the drive and we'd missed the party. By now (as opposed to yesterday) I figure that might have been advantageous.
Anyway, now we know (with some probability) the reason for the boot screwup I occasionally get, reported yesterday. Virus on Elks. That's a new one indeed.
I'll certainly make a habit out of mbr'ing new (and old) cards from now on.
Finally, It seems to me that beyond wreaking havoc (random system problems) and attempting to spread itself, this virus doesn't do anyhing. So it ends up in the same class as the parity boot virus that I've seen a lot of these past few years - on old clunkers. @tkchia, do you agree.?
-M
- aug. 2022 kl. 21:16 skrev tkchia @.***>:
Hello @ghaerr,
What is the mechanism used to reserve memory for itself? I am quickly looking but don't see how?
0:0x0413 = 0x40:0x13 is the IBM BIOS variable that records the amount of base (conventional) memory available.
What does that code do, print "I am Li Xi"!?
It mainly intercepts BIOS disk I/O requests, and adds a few I/O requests of its own to write (and spread) itself onto disks.
Thank you!
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.
And yep, this MBR is almost certainly malware. It is rather interesting piece of malware too (in my opinion). But perhaps there is some classical anti-virus program that actually knows about it. (Edit: this might be the same as the "AntiCMOS.B" virus that F-Secure is referring to: http://www.f-secure.com/v-descs/anticmos.shtml .)
Thanks again, @tkchia - that pretty much nails it. No question - this is it. AntiCMOS.B
And it may - in addition to explaining the apparent file system corruption we talked about, also explain some weird behaviour when configuring some interrupts on the machine (CMOS).
Booting DOS and doing a fdisk /MBR
did the trick, and suddenly ELKS reports 2kb more available memory. Still not 640k, but close - 639k, the last K probably some DiskOnChip initialization code/data (if the DOC chip is installed, memtop
drops by 29k).
In order to track down the source of the virus I installed the DOC that came with the system, which becomes the boot drive if the system is configured that way (and no device with higher priority is found). And sure enough, here's the virus, on the DOC boot sector. So this is not something that came with a CF card, but with the system!
--M
Hello @Mellvik,
And sure enough, here's the virus, on the DOC boot sector. So this is not something that came with a CF card, but with the system!
Wow. Holy cow.
So it ends up in the same class as the parity boot virus that I've seen a lot of these past few years - on old clunkers. @tkchia, do you agree.?
Well, this AntiCMOS.B virus does seem to have one interesting property: instead of stashing away the original good MBR or floppy bootloader somewhere else on the boot medium, it actually tries to make itself be the bootloader, and take over the original bootloader's functions. (With less error checking, though.)
I have not completely analyzed the virus's floppy handling code, but I believe that it actually tries to parse the MS-DOS filesystem and load the first file in the root directory, and treat it as MS-DOS's io.sys
.
Thank you!
Hello @Mellvik,
Booting DOS and doing a fdisk /MBR did the trick, and suddenly ELKS reports 2kb more available memory.
That is a bit strange, since ELKS doesn't use the BIOS Data Area (BDA) at 0x40:0013 to learn the amount of memory available. (It does use INT 12h, which the virus MBR could have attached to - I haven't yet had time to fully study its code).
Nonetheless, it is interesting to know that ELKS is susceptible to what is essentially a DOS virus - and that if the virus hooks INT 13h in order to write itself as another MBR, ELKS may also be able to host the virus' replication, since all ELKS disk I/O also uses INT 13h!
It's a case study of how ancient structures can revive ancient viruses - with ELKS being designed around MBR compatibility we leave ourselves open to some of the earliest viruses.
Thank you!
Hi @Ghaerr,
Booting DOS and doing a fdisk /MBR did the trick, and suddenly ELKS reports 2kb more available memory.
That is a bit strange, since ELKS doesn't use the BIOS Data Area (BDA) at 0x40:0013 to learn the amount of memory available. (It does use INT 12h, which the virus MBR could have attached to - I haven't yet had time to fully study its code).
As it turns out these two are the same thing: INT 12 - BIOS - GET MEMORY SIZE Return: AX = kilobytes of contiguous memory starting at absolute address 00000h Note: this call returns the contents of the word at 0040h:0013h; in PC and XT, this value is set from the switches on the motherboard SeeAlso: INT 11"BIOS" http://fd.lod.bz/rbil/interrup/bios/11.html#sect-610,INT 2F/AX=4A06h http://fd.lod.bz/rbil/interrup/dos_kernel/2f4a06.html#sect-4865,INT 4C"Tandy 2000" http://fd.lod.bz/rbil/interrup/bios_vendor/4c.html#sect-6323,MEM 0040h:0013h http://fd.lod.bz/rbil/memory/bios/m00400013.html#sect-9584 [from http://fd.lod.bz/rbil/interrup/bios/12.html http://fd.lod.bz/rbil/interrup/bios/12.html ]
Nonetheless, it is interesting to know that ELKS is susceptible to what is essentially a DOS virus - and that if the virus hooks INT 13h in order to write itself as another MBR, ELKS may also be able to host the virus' replication, since all ELKS disk I/O also uses INT 13h!
It's a case study of how ancient structures can revive ancient viruses - with ELKS being designed around MBR compatibility we leave ourselves open to some of the earliest viruses.
Indeed. As long as we're using BIOS calls for I/O I guess we're susceptible. OTOH the fact that we're using floppy images a lot and have our own MBR and bootblocks provides a good immune system for ELKS. BTW - I was curious as to whether this one was able to prevent its own destruction (like writing a new MBR on an infected system): Booted the DOS partition and ran fdisk /mbr. It was not.
Now that we have this experience it's easier to think along those lines the next time something really weird is happening.
-M