snark-barker-mca
Compatibility Model 57 486SLC2
Hi all-- first, I actually have the TexElec hardware, but from what I understand, on the bus it should be the same.
It disables the joystick port (like the model 56) because of conflicts with SCSI on planar. But the big issue is, PCM sound seems to intermittently cause NMIs, after which the SCSI goes away and doesn't answer requests until a power cycle (I get a 00010300 on the next POST if I reboot without a power cycle). I've tried every combination of DMA channel and IRQ.
I can reproduce this by going into FastTracker 2's config and toggling the sound back and forth between "Sound Blaster" and "Speaker". It usually manifests within 5-6 cycles, and always in less than 40.
I've tried with minimal configurations of jemmex / himem, getting rid of disk caching and all exotic things.
Adlib emulation, etc, all seems fine (haven't tried MIDI yet). Seems like it is likely something with DMA? Basically, it all almost works but it seems like there's something subtly wrong.
Also, one more note: I had a Token Ring card in an expansion slot, and after removing it things seemed to become a lot more reliable, taking about twice as long to trigger... so something electrical / bus loading / etc can't be ruled out.
have you tried testing it using SBDIAG?
Yes. SBDIAG is completely reliable on the tests I can do without scoping it-- DMA sine wave, play digital audio, etc (and the OPL3 tests, etc.) I can even hold down 6 to get in and out of the sine wave rapidly without issue.
I'm looking for a simpler repro. Original SB stuff-- test-sbc, sbtalker, etc, seems fine.
FastTracker seems reliable as long as the card is not reinited. It tends to pop and then hang in the same way between Wolf3D levels (perhaps something is being reinit'd here?) And then after ctrl-alt-del, I get the logic board error.
Just one more new observation: After I get things broken, sometimes I can exit to the DOS prompt. Things attached to SCSI don't work (general failure reading...). ... but also the floppy surprisingly does not work. It spins, etc, but again, "general failure". And then 00010300 on reboot.
that seems to be good news and bad news. the good news is that the card works. the bad news is that those programs aren't really compatible. the main reason is that MCA allows for IRQ sharing, and the BIOS will configure cards to share IRQs. but not all games will properly chain the IRQ handler for the sound card, which leads to compatibility issues.
I understand IRQ chaining. It seems to me like this must be something more:
- The problem is persistent post-reset. Whatever bad thing happens, causes the machine to fail the abbreviated reset-self test until power is cycled.
- No choice of interrupt works. My preferred IRQ 7 has no conflict-- the parallel port is disabled. I have also tried every combination of interrupt / DMA.
- FT2 sure seems to work if it works at startup... but if you get it to run its card initialization routine repeatedly something bad eventually happens.
Failing item in POST is 4D-- which I believe is checking the timer interrupt.
More info: It looks like something shows up in the system error log viewed from the reference disk.
I have one DMA arbitration timeout per crash... on DMA1 when using DMA1 and on DMA03 when using DMA3.
Doesn't this mean we're not hitting timing for DMA arbitration? This likely takes simultaneous I/O and sound to get to fail, which explains why I can't make it happen in SBDIAG, etc.
Note: I have a Snark Barker MCA on the way from Monotech, as well as some blank boards, so I can compare soon. Also will be nice to have the corresponding schematics so that I can hook up a logic analyzer if necessary.
hmm, that does sound like an error in the DMA arbitration state machine.
For kicks, I bumped SCSI to DMA1 and the SB to DMA3, and it's no better (possibly worse, but sample size is small).
Two things may be of interest or relevant: 1) I'm using a very, very quick-to-respond SCSI device, SCSI2SDv6-2021, which may increase the odds of bad timing... 2) the SCSI on the planar isn't really using DMA but is bus mastering, so it may be a fair bit more asynchronous/random than other DMAs.
And, of course, this is a relatively fast machine.
One thing I see, peeking at the verilog-- it doesn't look like the preempt signal is handled as an input.
If another card or the host raises preempt, we're supposed to vacate the bus within 7.8 microseconds (78 cycles?). We're a higher arbitration priority than the SCSI, but there's a "fairness" feature where it can decide to preempt us after awhile and we're supposed to stay inactive until preempt and status falls. pg 29-30, "Programmable Arbitration and the Inactive State", IBM PS/2 Hardware Interface Technical Ref.
If an arbitrating device holds -BURST active for more than 7.8 microseconds after an active -PREEMPT, an error condition may exist, and a channel time-out may occur. ARB/-GNT is driven high immediately and takes control of the channel from the controlling master. An NMI is driven active. The channel remains in the arbitration state under the control of the system microprocessor until released by a NMI handler.
This sounds exactly like what I'm seeing-- NMI fires and no IO works afterwards. Some peripherals let you turn fairness off, but the SCSI controller does not have this feature.
Just an add'l heads up: my ETA on the Monotech card is Aug 16th. Unfortunately, I will be very busy for that week and won't be able to get to attaching a logic analyzer until a week or two later. The existing card I could JTAG if we come up with a new bitstream before then, but it doesn't have the signals broken out for analysis.
I'm hoping CHCK# gets asserted on a DMA timeout (an NMI pops, so this seems likely). In which case I can see whether PREEMPT# fiddliness happens immediately before CHCK# and what the DMA signals look like for a number of cycles before then.
Ehhh.. now I see we don't burst, so I'm confused.
Yep, preempting of an ongoing DMA cycle implies burst mode. An individual transfer is "atomic" but a burst operation isn't necessarily (iirc there is a way to do atomic burst operations, limited by fairness). I've noticed this by watching the model 95 doing DMA with the SCSI controller, and I've seen burst operations interrupted partway through, then continue after the interruption. It's all part of the "joy" that is MCA.
I deleted a zillion comments, to have a cleaner problem report and correct mistakes in attempting to trace the logic. I'm really sorry for all the massive qtys of emails/comments.
This is a "good" DMA cycle:
This is a "bad" DMA cycle, half horizontal res (500ns vs 200ns). We don't get granted immediately, and then once we are, nothing happens:
Why does the DMA controller do nothing? Perhaps it's been de-inited... And we're just supposed to obey preempt in this case?? PS/2 technical reference indicates that after awhile of S0W#, S1R#, CMD#, and BURST# being quiescent, the arbiter is supposed to yank the bus away, but maybe this doesn't apply until at least one transaction happens...
This is the "bad" cycle zoomed way out showing the timeout behavior.
This is further zoomed out to show a train of successful DMAs followed by death:
Looks like the PS/2 is generating a weird DMA cycle. The SB expects to see CMD pulse after arb/gnt goes low (aka arbitration cycle ends). However it never does. You can also see CMD pulse during the arbitration cycle which doesn't seem correct. You could try checking the REFRESH line but perhaps there's a simultaneous memory refresh going on that's screwing it up.
You can see traffic on S0/S1/MIO without any CMD pulses. This means that higher-speed bus commands are occurring. This is allowed by the spec, but typically those transactions are controlled by a separate CMD line that is not brought out to our bus connector.
In the image, I circled some of this activity going on in the S1/READ line. There seems to be another burst transfer going on of some sort. It's not a "standard" burst because the BURST line is never engaged. The SB doesn't know any better so it asserts PREEMPT right in the middle of this, and I think the DMA controller gets confused. The interesting part is that the bus arbitration controller responds correctly to PREEMPT by starting an arbitration cycle, but after arbitration, the DMA controller never executes the transfer, so the SB gets stuck.
Perhaps the solution is for the SB CPLD to have a timeout counter that cancels the transaction if CMD never asserts after an arbitration cycle.
PS/2 technical reference, DMA Controller (Type 1), mentions masking DRQ in 8237 mode and actually documents this exact behavior that we're seeing:
Each channel has a corresponding mask bit that, when set, disables the DMA from servicing the requesting device. Each mask bit can be set to 0 or 1 by the system microprocessor. A system reset or DMA Controller Reset command sets all mask bits to 1. A Clear Mask Register command sets mask bits 0 - 3 or mask bits 4 - 7 to 0.
When a device requesting DMA cycles wins the arbitration cycle, and the mask bit is set to 1 on the corresponding channel, the DMA controller does not execute any cycles in its behalf and allows external devices to provide the transfer. If no device responds, the bus times out and causes a nonmaskable interrupt (NMI).
This is different from ISA, where the DRQ being masked would just mean the slave is never serviced. No recommendation is made on how to deal with races between disabling the "8237" and the slave, which ISA dealt with gracefully if you masked DRQ during the operation.
A timeout may be the way to go. It's a shame that we can't just obey PREEMPT. But in a normal preemption we'd want to take the bus back and try again...
It's unclear to me that there's any actual way to fix this. We don't have a way to relinquish the arbitration grant-- the central arbiter does it when the burst, command, and status bits do the right thing.... and the DMA controller will not toggle them because it is disabled. CMD needs to toggle at least once.
I guess we could generate a fake busmaster-memory-read (only involves driving a couple open collector lines) without driving the address lines--- to whatever was latched last??-- which would hopefully be relatively harmless and tell the central arbiter to take the bus away from us.
or just time out and release the bus arbitration lines. then try again with the DMA operation later.
Does the arbiter know to re-arbitrate then? The manual doesn't make it sound like it (and makes it sound like it may even be holding them low).
then try again with the DMA operation later.
That's the proper behavior if preempted normally-- so no timeout would be needed. But in this case, I suspect it's due to the DMA controller being disabled as the slave makes a DMA pending... so the DMA will be pending "forever" as the controller is masked. (Would be harmless on ISA).
Like, I'm not the expert on this at all... But this language:
The central arbiter recognizes an end-of-transfer when both status signals (-S0 and -S1), -BURST, and -CMD are inactive. Control of the channel is then transferred to the next higher priority device or to the system microprocessor by default.
Makes it sound like ARB/GNT# will go high only when posedge CMD and those other conditions are met (S0#/S1#, BURST# all high).
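To make the two readings of that sentence concrete, here is a rough Python sketch. It is not taken from the manual or from the card's Verilog: the function names and the sample traces are hypothetical, and each sample is simply the levels of (S0#, S1#, BURST#, CMD#) with 1 meaning high/inactive. It compares a literal "all lines inactive" check against the "rising edge of CMD#, then all inactive" interpretation suggested above.

```python
# Hypothetical model of the end-of-transfer condition; signal levels are
# (S0#, S1#, BURST#, CMD#) with 1 = high = inactive, 0 = low = active.

def release_index_level(trace):
    """Literal reading: release as soon as all four lines are inactive."""
    for i, (s0, s1, burst, cmd) in enumerate(trace):
        if s0 and s1 and burst and cmd:
            return i
    return None

def release_index_after_cmd(trace):
    """Edge reading: release only on a CMD# rising edge with the rest inactive."""
    prev_cmd = 1
    for i, (s0, s1, burst, cmd) in enumerate(trace):
        rising = (prev_cmd == 0 and cmd == 1)
        if rising and s0 and s1 and burst:
            return i
        prev_cmd = cmd
    return None

IDLE = (1, 1, 1, 1)
# Made-up single read cycle: S1# falls, CMD# falls, S1# rises, CMD# rises.
normal = [IDLE, (1, 0, 1, 1), (1, 0, 1, 0), (1, 1, 1, 0), IDLE]
# Made-up "masked controller" case: nothing ever toggles after the grant.
stuck = [IDLE] * 5

print(release_index_level(normal), release_index_after_cmd(normal))
print(release_index_level(stuck), release_index_after_cmd(stuck))
```

Under the literal reading the bus would be released before the transfer even starts, which is not what the captures show. The edge reading never releases the bus in the stuck case, which at least matches the observed hang until the timeout. Real arbiter behaviour may differ from both.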
My proposal:
- The maximum DMA cycle I've seen to the card takes 1.3us
- 7.8 us is a value where timeouts could occur, even though in reality it seems that we have a lot of margin beyond this.
- I propose adding a 6 bit timer, reset when not granted and counting 14.318MHz during grant cycles.
- When timer is 11111x, drive S1R low (140 ns pulse, after 4330 ns).
I'm hoping the latter is enough because the Sept 1990 manual describes an "aborted cycle" with a minimum Sx width (T2A) of 85ns, and the arbiter is documented to listen to Sx. But if it doesn't work, my next plan would be
- When timer is 11101x or 11110x, drive S1R low. When timer is 1111xx, drive CMD low.
And if that's not enough, it gets gory to need to generate address latch, etc, too.
(Of course, this all depends upon how things come out of slower machines whether 3.9us after winning arb is safe to give up on DMA controller...)
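As a quick sanity check on the numbers in the proposal, the sketch below converts counter values to time at a nominal 14.318 MHz. The helper names are made up for illustration, and it assumes the counter starts from zero at the moment of grant and that the 7.8 us figure is the relevant budget.

```python
# Timing arithmetic for the proposed 6-bit timer (illustrative helpers only).
CLK14_HZ = 14_318_000      # nominal 14.318 MHz clock named in the proposal
TIMEOUT_BUDGET_NS = 7800   # the 7.8 us figure quoted from the technical reference
MAX_SEEN_DMA_NS = 1300     # longest DMA cycle observed so far, per the list above

def count_to_ns(count, clk_hz=CLK14_HZ):
    """Time taken by `count` clock periods, in nanoseconds."""
    return count * 1e9 / clk_hz

def first_match(prefix_bits, width=6):
    """Smallest counter value whose top bits equal `prefix_bits`."""
    return int(prefix_bits, 2) << (width - len(prefix_bits))

start = first_match("11111")         # pattern 11111x
pulse_cycles = 2 ** (6 - 5)          # one don't-care bit -> two counts
print(start, round(count_to_ns(start)), round(count_to_ns(pulse_cycles)))

# Fallback plan: S1R from 11101x, CMD from 1111xx
print(round(count_to_ns(first_match("11101"))),
      round(count_to_ns(first_match("1111"))))
```

With these assumptions the pulse starts at count 62 (about 4330 ns) and lasts two clocks (about 140 ns), which is well past the 1.3 us longest observed cycle and comfortably inside 7.8 us.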
I'm hoping to find time to install ISE and crank on this before I'm swallowed up in back-to-school meetings next week (I'm a middle school teacher).
I've had only a short time to play, but I'm having difficulty adding a timer and fitting in the device. There's plenty of registers, but the actual placing/routing is problematic. Cleverness will be required.
This is what I tried:
inout s1_r_l,
...
reg [5:0] dma_timeout_tmr;
reg dummy_read_cycle;
...
// I am concerned about the mingling of clk14 and other signals that aren't synchronous
// to it without any synchronizer-- metastability, but I am mimicking the design pattern
// seen above with the timeout timer
always @ (posedge clk14) begin
    if (dma_selected) begin
        dma_timeout_tmr <= dma_timeout_tmr + 1;
    end else begin
        dma_timeout_tmr <= 6'b000000;
    end
    if (dma_timeout_tmr[5:1] == 5'b11111) begin
        dummy_read_cycle <= 1'b1;
    end else begin
        dummy_read_cycle <= 1'b0;
    end
end
assign s1_r_l = (dummy_read_cycle & arb_won) ? 1'b0 : 1'bZ;
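For anyone wanting to reason about the fragment without synthesizing it, here is a small clock-by-clock Python model of it. It is only a sketch: it assumes dma_selected stays high for the whole run, that the timer starts at 0, and that arb_won is constant, and it ignores the asynchronous-input concern raised in the comment. The function name is made up.

```python
def simulate(selected_edges, arb_won=True):
    """Return the clk14 edge numbers after which s1_r_l is driven low.

    Mirrors the non-blocking assignments: both registers update from the
    values they held before the clock edge.
    """
    tmr = 0      # dma_timeout_tmr, 6 bits
    dummy = 0    # dummy_read_cycle
    driven_low = []
    for edge in range(1, selected_edges + 1):
        next_tmr = (tmr + 1) & 0x3F                       # 6-bit wrap
        next_dummy = 1 if (tmr >> 1) == 0b11111 else 0    # tmr[5:1] == 5'b11111
        tmr, dummy = next_tmr, next_dummy
        if dummy and arb_won:
            driven_low.append(edge)
    return driven_low

print(simulate(130))
```

Two things fall out of the model. Because dummy_read_cycle is registered, the pulse appears one clock after the timer first reads 62. And because the 6-bit timer wraps, the pulse repeats every 64 clocks for as long as dma_selected stays high.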
Looks like others are seeing the exact same issue:
Yup-- I'm not going to get to really look at this until school is underway. Probably the most logical approach that I can see would be to combine the two timers. It could count all the way up and then stop at 63 instead of 34 or whatever. It's delicate, though.
@schlae Any help you can offer would be greatly appreciated. I can probably find time to test bitstreams in this next couple of weeks, but there's no time to think. Thank you for everything-- your advice on this and the S.B. design.
I'll have to spend some time thinking about it. Another possibility is to add an external RC delay although that is less precise.
Yup-- that's possible, but there's a lot of cards out here at this point. On Snark Barker it would be OK because it could plug into the CPLD debug header, but on the Resound card it'd be a real dead bug rework pain.
I think it's going to be a little clever/hacky no matter what.