v86 icon indicating copy to clipboard operation
v86 copied to clipboard

Fix booting from cd while hda present, booting from hda while cd present

Open JoeOsborn opened this issue 2 years ago • 22 comments

This is towards #900: We can now boot either from HDA or CDROM, with the other present, and it boots successfully. For testing I'm booting into Windows 3.0 (from CD) with Windows 3.1 on hda or from Windows 3.1 (on hda) with Windows 3.0 in the cdrom.

One big change here: The cd-rom device is now always present, either at ide0 or ide1 depending on whether hda/hdb are present. Of course we can still boot (at least into Windows) if you only have hda and an empty cdrom, or don't have hda and just have a cdrom.

JoeOsborn avatar Aug 18 '23 19:08 JoeOsborn

The Bochs BIOS does not like the empty cd-rom drive. I'll investigate how it should properly report being empty. If I keep the "empty drive" status register readout to 0, seaBIOS ignores the drive but bochs BIOS hangs.

JoeOsborn avatar Aug 18 '23 21:08 JoeOsborn

It looks like the ata command sending code in bochs doesn’t like how our buffer-less cdrom drive is reporting its state register, so it waits forever before sending the IDENTIFY PACKET DEVICE ATA command. I’ll investigate what’s going on Monday or Tuesday but my suspicion is that the current approach of hard coding one value to send for empty drives is not going work.

Rough summary: bochs checks for the status to be not-busy, then sends a command and waits for the device to be not-busy and yes-drq.

JoeOsborn avatar Aug 18 '23 22:08 JoeOsborn

Sorry for the spam. I think the path of least resistance here is to add a setting for “wants_cdrom” which defaults to true if cdrom is non-null, and a cdrom device should only be created if cdrom is defined or wants_cdrom is true. The bochs empty cdrom boot behavior really should be a separate PR aimed at #900 .

JoeOsborn avatar Aug 19 '23 00:08 JoeOsborn

Whoops, and I’ll revert my changes to the html files and main.js.

JoeOsborn avatar Aug 19 '23 00:08 JoeOsborn

I’ve trimmed down the changes made by this commit (I left the changes to logging because they’re actually pretty useful) to just changing the BAR4 location for each IDE device and not asserting on a spurious dma control write that occurs before a dma start command arrives (this is something some OSes or maybe seabios seem to do but it doesn’t seem to cause any harm to allow it, I guess they’ll try to trigger it again after it doesn’t do anything the first time—I wonder if a better fix would be for the start dma commands to start the dma transfer immediately if the control start bit was set already). I’ll do some more testing locally (I figured out how to run the integration tests locally and added test machines with both hda and cdrom and booting from one or the other) and I’ll push a commit over the next day or two.

JoeOsborn avatar Aug 19 '23 22:08 JoeOsborn

The main changes necessary to make this work were, in the end:

  1. Change the debug assert to an error condition in the "spurious DMA write" case
  2. Change the BAR4 address for each IDE controller.

The wants_cdrom change is not strictly necessary here but will be important for my next PR which will support empty CD-ROMs, so please consider rolling it in to this.

JoeOsborn avatar Aug 20 '23 14:08 JoeOsborn

I've gone ahead and added tests for booting with wants_cd true when Bochs bios is active (and inactive) and with varying permutations of hda, fda, and boot order in FreeDOS. It now boots successfully with an empty CD-ROM in both BIOSes.

The problem is that Bochs was enumerating the slave device of every IDE device, but the slave device was seemingly never fully implemented (?)---so instead I have dummied out writes to slave devices when the slave buffer is null.

JoeOsborn avatar Aug 21 '23 20:08 JoeOsborn

With these changes, CDs are also readable by at least FreeDOS 13 with the UDVD2 driver. They don’t seem to be readable in win95. I’m going to see if I can implement inserting and ejecting disks later this week.

JoeOsborn avatar Aug 22 '23 18:08 JoeOsborn

I've rebased this branch on latest master and removed unnecessary changes. eject is still in progress, but insert seems to work (I can boot FreeDOS 1.3 with an empty CD-ROM, see dir D: fail, insert a CD-ROM, and see dir D: succeed).

JoeOsborn avatar Sep 01 '23 17:09 JoeOsborn

@JoeOsborn Thanks! I will try to look into this in the coming days.

copy avatar Sep 01 '23 17:09 copy

I see my api-test is hanging, it was tricky to get the timing of abort/retry/fail and inserting the CD right. It works on my machine but I’m open to advice on getting it working in CI.

JoeOsborn avatar Sep 01 '23 19:09 JoeOsborn

Sorry @copy , I don’t want to use up your CI credits or whatever, but I don’t know how to kill the actions jobs from my end.

JoeOsborn avatar Sep 01 '23 19:09 JoeOsborn

After more testing, I see that ejecting a CD-ROM and inserting a new one also seems to work.

JoeOsborn avatar Sep 01 '23 19:09 JoeOsborn

This causes some problems on Linux:

There are some 0-bytes drives:

[   32.946240] sd 1:0:1:0: [sdc] 0 512-byte logical blocks: (0 B/0 B)           
[   32.946845] sd 1:0:1:0: [sdc] Write Protect is off                           
[   32.951335] sd 1:0:1:0: Attached scsi generic sg3 type 0                     
[   32.956020] sd 1:0:1:0: [sdc] Write cache: disabled, read cache: enabled, doe
sn't support DPO or FUA                                                         
[   32.970920] sd 1:0:1:0: [sdc] Attached SCSI disk                             

There are some timeouts, e.g. running fdisk /dev/sr0:

[  176.750270] ata2: lost interrupt (Status 0x58)
[  176.750920] ata2: drained 32 bytes to clear DRQ
[  176.751025] ata2.00: limiting speed to MWDMA1:PIO2
[  176.751340] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  176.751910] ata2.00: cmd a0/01:00:00:20:00/00:00:00:00:00/a0 tag 0 dma 16416 in
[  176.751910]          Get configuration 46 00 00 00 00 00 00 00 20 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[  176.752900] ata2.00: status: { DRDY }
[  176.753640] ata2: soft resetting link
[  176.914875] ata2.00: configured for MWDMA1
[  176.915840] ata2.01: configured for MWDMA2
[  176.916755] ata2: EH complete
[  184.750285] ata2: lost interrupt (Status 0x58)
[  184.751185] ata2: drained 16 bytes to clear DRQ
[  184.751285] ata2.00: limiting speed to PIO2
[  184.751640] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  184.752215] ata2.00: cmd a0/01:00:00:10:00/00:00:00:00:00/a0 tag 0 dma 16400 in
[  184.752215]          Get configuration 46 00 00 28 00 00 00 00 10 00res 40/00:02:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[  184.753650] ata2.00: status: { DRDY }
[  184.755615] ata2: soft resetting link
[  184.914770] ata2.00: configured for PIO2
[  184.915615] ata2.01: configured for MWDMA2
[  184.916485] ata2: EH complete

You should be able to reproduce with tinycore (it can be used both as cdrom and hard drive disk).

copy avatar Sep 02 '23 18:09 copy

To clarify on the linux issues, is this the set of test cases you’re using?

  1. Boot from tinycore as hda, insert tinycore cd, mount it
  2. Boot from tinycore as hda with some cd image, run fdisk on the cd? This one’s a bit confusing to me, is it just that the error condition isn’t being handled properly? Or should something like CD-R support be implemented?

JoeOsborn avatar Sep 03 '23 16:09 JoeOsborn

@JoeOsborn Sorry, here is a better reproduction:

Boot with linux4.iso as cdrom and tinycore (or anything else) as hard drive, set boot order to CD first. fdisk /dev/sr0 hangs, but I believe the same hang also happens during boot on tinycore. It looks like an interrupt missing somewhere.

copy avatar Sep 03 '23 19:09 copy

Something I’m finding confusing here is that linux seems to be configuring the drive to use DMA even though the capabilities reported are lba-only.

I think maybe the OS is waiting for some status flag to be set or something, or else it’s ignoring irqs that come in too soon after the command is sent, since liberally adding irqs didn’t seem to help. Any tips for triggering an IRQ after some delay?

JoeOsborn avatar Sep 06 '23 18:09 JoeOsborn

Any tips for triggering an IRQ after some delay?

You can try setTimeout(() => this.push_irq(), 4) or similar, but in my experience that isn't usually needed and is somewhat of an anti-pattern.

copy avatar Sep 06 '23 23:09 copy

Just posting to say I'm still investigating this, but the beginning of the semester has slowed down my experiments here. Once I've settled into teaching a little more I'll have another chance to get back to it.

JoeOsborn avatar Sep 14 '23 15:09 JoeOsborn

Was this work abandoned? I think this is essential for the emulator and I'm working on a project that requires this functionality. I tried to build the version from this branch but it seems like it's broken. I'm getting the following errors: Screenshot 2023-12-01 at 7 07 09 pm

Sadly I don't think I have the skills to fix this.

ericmackrodt avatar Dec 01 '23 08:12 ericmackrodt

It’s not abandoned, but it is delayed. I likely won’t be able to pick it up until January—it is on my critical path for some preservation work, but other aspects of the project (and my other obligations) have taken the front seat for a while.

JoeOsborn avatar Dec 01 '23 14:12 JoeOsborn

That's fair enough, thank you for getting back to me.

ericmackrodt avatar Dec 03 '23 00:12 ericmackrodt