1541ultimate Bug in the hyperspeed kernal

In the hyperspeed kernal, the following basic program does give the wrong result (adapt the device number to your iec device):

10 open 1,10,15 20 print#1,"md:foo"; 30 print#1,"i" 40 close 1

With UCI enabled, a direcory "fooi" is been created. With UCI disabled, it works as it should: A directory "foo" is created, and then initialize is executed.

Tested on Ultimate 64 with the release firmware 3.11.

Jan 28 '24 20:01 markusC64

With this replacement you see that the semicolon on line 20 is not necessary:

20 print#1,"md:foo" 30 print#1,"cd:foo"

After that, you are not inside of direcectory "foo".

Jan 28 '24 20:01 markusC64

Curiously, this seems to work fine with C128 device manager's hyperspeed implementation (which is based on, but different from the hyperspeed kernal one), so I suspect this bug is in the hyperspeed kernal itself. My first guess would be the chkin/ckout caching.

This is with C128DM enabled, and using id 11 for 'hyperspeed'. soft iec is disabled.

Screenshot 2024-02-14 16-05-51

Feb 14 '24 16:02 bvl1999

I am not an expert in C128 kernal (and not C64 kernal). But I have the unverified impression that the problem is that CLRCH call ($FFCC) does not finish a command. Anyway, first I have observed the bug in an closed source assembler program. I made the mistake while trying to reproduce I queried the error channel quite often. In this case the kernal works.

Feb 14 '24 17:02 markusC64

I am not an expert in C128 kernal (and not C64 kernal). But I have the unverified impression that the problem is that CLRCH call ($FFCC) does not finish a command. Anyway, first I have observed the bug in an closed source assembler program. I made the mistake while trying to reproduce I queried the error channel quite often. In this case the kernal works.

That is entirely possible, and a good pointer to start looking. I know the implementation of this between the hyperspeed kernal and dm are slightly different. Both do implement their own clrchn for UCI to avoid having to flush buffers in the soft iec target when a channel gets reused after a clrchn, and so do some caching of the last used input and output channels. C128DM's hyperspeed implementation is outside the kernal, it extends the io code in a way similar to warpspeed128 (and many fastload cartridges for the c64, but unlike such cartridges, also works for sequential io, relative files, etc)

But it has been a few years since I last looked at that code.

Feb 14 '24 17:02 bvl1999

Regardless, this doesn't seem to be a firmware bug, it surely should be fixed, but imo isn't 'high priority' to fix. Sadly the hyperspeed kernal isn't used as much as we hoped when hyperspeed was implemented.

Feb 14 '24 17:02 bvl1999

On the other hand, the hyperspeed kernal is one way of ruling out timing problems on the IEC bus as a cause. I have currently a situation where all seem to make a difference: Jiffydos kernal or not, PAL, NTSC. Hyperspeed kernal cannot be tested because of this bug.

Anyway, I have something that might help increasing hyperspeed kernal's usage: A kernal that has both, Hyperspeed support and Jiffydos. For the C64. But of cause it will need the same fix as the "standard" hyperspeed kernal.

Feb 14 '24 17:02 markusC64

On the other hand, the hyperspeed kernal is one way of ruling out timing problems on the IEC bus as a cause. I have currently a situation where all seem to make a difference: Jiffydos kernal or not, PAL, NTSC. Hyperspeed kernal cannot be tested because of this bug.

Anyway, I have something that might help increasing hyperspeed kernal's usage: A kernal that has both, Hyperspeed support and Jiffydos. For the C64. But of cause it will need the same fix as the "standard" hyperspeed kernal.

Use a C128 and C128DM :-) (that setup allows for both hyperspeed and jiffydos at the same time, and actually should work with most other custom c128 kernals as well).

There is an unfinished project, the fc3wedge .crt, which is a .crt implementation of hyperspeed, with the idea it can coexist with the jd kernal, but, there are some bugs which neither me or Gideon spent the time on to fix them).

Anyway, the 'quick' fix would probably be to disable the caching of input/output channels, that would cause clrchn to always flush the buffers in uci, which is not good for performance of sequential io, but should avoid this problem.

Anyway... yeah, using hyperspeed kernal to completely bypass the iec bus would seem usefull for testing, just as long as you keep in mind there are some subtle differences beyond the 'bus' (ie, from what I recall, hyperspeed kernal abuses the highest byte of the various file tables, which could cause problems when opening 10 files at the same time, c128dm avoids this by having a little bit of cartridge ram outside system ram for keeping track of this data)

Feb 14 '24 18:02 bvl1999

So, the offending code seems to be this bit from roms/c64rom/kernal/uci.s just below the _my_chkout label:

        ; there is a pending CKOUT command, same SA?
        lda SECADDR
        cmp UCI_LAST_SA
        beq _ckout_cont ; Yes!  Do nothing, just append

For a quick fix you can just comment out that cmp and beq. That will flush the output buffer for each print# in basic. I think a better fix would be to check for SECADDR being $0f, and in that case branch ahead to after the UCI_LAST_SA check, and always flush the output buffer when the secondary address is 15. I don't think this is needed for sequential files, but I haven't considered how this buffer appending works out for relational files.

There is a similar check in _my_chkin, but for reading this should be fine in all cases.

Feb 14 '24 20:02 bvl1999

@markusC64 does this work for you? https://www.bartsplace.net/download/hskernal394.bin

Feb 15 '24 02:02 bvl1999

Unfortunately not. I am still receiving "cp1\ncd//os". So obviously two commands.

Feb 15 '24 07:02 markusC64

Interestingly enough, just removing cmp and beq does better...

Feb 15 '24 07:02 markusC64

Unfortunately not. I am still receiving "cp1\ncd//os". So obviously two commands.

Odd, will look at that later today when I'm back home.

Interestingly enough, just removing cmp and beq does better...

Yeah it should, and that will do for the moment for testing, but it is not good for performance when writing sequential files, so still want to do a more proper fix which won't have that issue.

Feb 15 '24 11:02 bvl1999

Is it possible that you need to compare with #$6F instead of #$0F? A peek command from basic 2.0 of to the address makes me think that.

Feb 15 '24 11:02 markusC64

Is it possible that you need to compare with #$6F instead of #$0F? A peek command from basic 2.0 of to the address makes me think that.

Maybe, will check that when I'm back home.

Feb 15 '24 12:02 bvl1999

I think the problem is not in the hyperspeed kernal, but in the way the commands are passed to the command channel, or how the command channel is processing the commands. The 'just append' branch is there to avoid a UCI command for every single byte. But unfortunately that is how IEC works. I think queuing the bytes is fine as long as the commands are executed one by one if they are separated by a return character (/r). This doesn't happen, as the dispatch function (the routine that takes the data from UCI and feeds it to SoftwareIEC) assumes that it can just push everything at once.

The reason that it works with IEC is that it responds to "unlisten". This probably doesn't happen, although I thought that I did pass the "CLRCH" call. We should check whether the CLRCH happens in between two PRINT commands. I think it does, or it should.

OR... the hyperspeed kernal doesn't clear the last channel byte upon CLRCH?

Feb 15 '24 12:02 GideonZ

Well, it is time to test the hyperspeed kernal. I think I'll use Bart's version, but with cmp patched from cmp #$$0F to cmp #$6F. And let C64OS Restore do it's job. Its reading and writing a megabyte of data, interleaved with "cd", "md" commands. So quite much data on the uci. We will see if there are bytes missing. That is the example that fails with current hyperspeed kernal. I have a reference of each file it saves. And yes, C64 kernal can be best tested with C64 software. You can model the kernals behavior in a simulation, but then you cannot be sure enough that the model equals the kernal.

BTW: "cP\x0D" (read it with C rules) is a valid command with a cr "inside" (at the end). So even this character cannot be used for separating commands. Another example is "m-r" with a suitable address or length.

I managed to port this change to the combined Jiffydos + Hyperspeed kernal. Which cannot be published in source code or binary because Jiffydos is still sold. But a patch that can be applied to the original jiffydos kernal can be published. If we need to optimize barts change, that is no problem. I think I have some free bytes still available.

Feb 15 '24 12:02 markusC64

BTW: "cP\x0D" (read it with C rules) is a valid command with a cr "inside" (at the end). So even this character cannot be used for separating commands. Another example is "m-r" with a suitable address or length.

Fair point. Well, in that case something goes wrong with CLRCH.

I don't completely agree with your testing strategy though. Testing with a lot of data doesn't test better. You have a failing case right now, and that should be solved. The less data the better, I'd say. (Work smarter, not harder.. ;)

I managed to port this change to the combined Jiffydos + Hyperspeed kernal.

That's actually great!

Feb 15 '24 13:02 GideonZ

Are we sure that this is not a bug that has been introduced with 3.11? Some control characters have been changed in the communication between the IEC communication layer and the drive code, since they have been separated.

Feb 15 '24 13:02 GideonZ

Of cause you're right. More data dioes not make it better. But since that program is closed source, it is easier to wait for the result than to disassemble, understand and make an assembler test case, You see, we need both test cases, the mentioned basic test case and the assembler test case. And in that test case, it is really the case that CLRCH is called between the commands. But bow that we have a small basic example to test, we can test older firmware to to see if they have the bug.

Well, since a CMP to #$6F works better, that is clearly needed. But we wait for Barts results. It still might be possible that some corner cases need another value, too.

Feb 15 '24 13:02 markusC64

Oh fine. The whole task succeeded without a single error. Every file has exactly the expected content. Every command was transferred correctly (otherwise files would have been created in the wrong directory). So this test just revealed one problem, probable due to a zero page conflict: The software suggested the wrong destionation partition, I have chosen the correct one. IIRC correct, it uses the zeropage to store it in between. That is the real target for this test: Finding problems one was not aware of. If comparing the data helps for the actual test, that is great and welcome.

Feb 15 '24 13:02 markusC64

I'm not sure a command is always followed by a \n.

Consider the basic program in the initial report, that shouldn't have a \n between the commands I'd think.

Feb 15 '24 13:02 bvl1999

I already stood corrected by Markus. :-)

Just the compare with 6F? Hmm... doesn't feel right. I think the issue should be solved in a more fundamentally correct manner.

Feb 15 '24 13:02 GideonZ

In fact both my basic program (line 20) and Gregs assembler program does not terminate a floppy command with a \r (ASCII 13) and also not with with \n (ASCII 10).

BTW: I had to move "MY_OUTLEN" from $9b to $c0 for the kernal with Jiffydos + Hyperspeed because the address $9b conflicts with jiffydos. Perhaps we should do the same for the kernal without jiffydos in order to have same addresses in use.

Feb 15 '24 14:02 markusC64

I already stood corrected by Markus. :-)

Just the compare with 6F? Hmm... doesn't feel right. I think the issue should be solved in a more fundamentally correct manner.

I think the 'correct' way would be to also 'emulate' listen/unlisten. From what I can tell, the unlisten that happens during a clrchn is what terminates the command. But that would not be good performance wise.

Checking for $6f feels wrong, and would behave differently from iec in a specific case, consider the following:

10 open 1,11,15 20 print#1, "cd:/usb0/test"

No close, no extra print# In that case, the 'fix' I suggest would cause the command to not get executed until something else tries to use the uci soft iec target, whereas over iec, it should get executed after the print#

Feb 15 '24 14:02 bvl1999

Oh, you're right...

I think the 'correct' way would be to also 'emulate' listen/unlisten I think.

Well, in that case, you also need iecout and iecin to be patched. Consider a low level assembler program that uses listen, some iecouts and unlisten. It would be bad of listen is redirected to UCI, but the iecout's are sent to the serial bus.

Feb 15 '24 14:02 markusC64

Oh, you're right...

I think the 'correct' way would be to also 'emulate' listen/unlisten I think.

Well, in that case, you also need iecout and iecin to be patched. Consider a low level assembler program that uses listen, some iecouts and unlisten. It would be bad of listen is redirected to UCI, but the iecout's are sent to the serial bus.

And none of those are vectored, so while that might be doable for a kernal, it wouldn't work for any non kernal hyperspeed implementation.

Feb 15 '24 14:02 bvl1999

So it might be better not to patch listen/talk and unlisten/untalk. With Geos MP3 Kernal I/O Edition, you can test that behavior. It really switches to serial bus when the Geos Driver takes over control, because at that moment, the low level calls are used.

Later device detection fails, but that is a completely unrelated problem.

Feb 15 '24 14:02 markusC64

So it might be better not to patch listen/talk and unlisten/untalk. With Geos MP3 Kernal I/O Edition, you can test that behavior. It really switches to serial bus when the Geos Driver takes over control, because at that moment, the low level calls are used.

Later device detection fails, but that is a completely unrelated problem.

Yeah, there are a few more such implementations around.

Now, in my opinion, using kernal io, but also using the low level functions which aren't vectored rather defeats the point of using kernal io, but alas, we are dealing with 40 years of legacy and people who didn't care about such formalities :-)

Feb 15 '24 14:02 bvl1999

Indeed. IMHO it is more important to care about bottlenecks you find by benchmarking. E. g. reading "sequential" (*) files is really, really slow. Not only with hyperspeed, but also with jiffydos.

(*) which can have type SEQ, USR or even PRG. The point is that you don't use the load call.

Feb 15 '24 14:02 markusC64

Indeed. IMHO it is more important to care about bottlenecks you find by benchmarking. E. g. reading "sequential" (*) files is really, really slow. Not only with hyperspeed, but also with jiffydos.

(*) which can have type SEQ, USR or even PRG. The point is that you don't use the load call.

Yes.. and that seems unavoidable as it is byte by byte. Even with 'fast serial' on a c128 that is still rather slow (about as fast as jiffydos).

Working around that would be possible by buffering sequential reads, for example, have a 256 byte buffer, and use dma to transfer 256 bytes at a time, but.. no system ram we can steal for that.

Feb 15 '24 15:02 bvl1999