flashfloppy icon indicating copy to clipboard operation
flashfloppy copied to clipboard

Async I/O possibilities

Open ejona86 opened this issue 3 years ago • 106 comments

An async I/O discussion started at https://github.com/keirf/FlashFloppy/pull/412#issuecomment-753576777 . It was an idea to free up a lot of memory from write_bc, allowing it to be reused for reading. Ideally it would allow us to buffer both sides of an entire track and absorb all write delays, until the track changes.

I've created a basic cooperative scheduler and an fs async API on top. I ported HFE (reading and writing) to async to see how it may look: https://github.com/keirf/FlashFloppy/compare/master...ejona86:io-thread . I implemented HFE in a KISS way. While the HFE code seems to work after some very basic testing, the point is the approach, and not the concrete code organization; lots of the code needs cleaning up.

Based on my flash drives' performance and full speed USB speed, I think ED and TD should be considered out reach for HFE independent of implementation. USB limits read speed of an ED track to at least 66 ms, assuming no overhead and latency. In my testing I think we could read an ED track in ~150 ms which leaves no room for writes. Even 100 ms would push it. We'd need more RAM to support them with HFE. ED should be easily supported with IMG instead. And TD would need us to not load both sides of the track.

I was able to steal enough memory to support 1.44 MB HD floppies with HFE, fully buffering both sides of the track. It is tight as expected, although there's still some options to shift memory around a little bit, like swapping to 8 bit dma to save 2 kB and using that memory elsewhere. I don't know whether we should assume we always have enough memory though. I have an HFE buffering approach in mind that would support HD with HFE without fully-buffering the track, but it would need to continually read the track from flash. Is it worth investing much time to support such a thing? The approach would lend itself to also support changing tracks without waiting for writes to finish committing to flash, although write latencies would still impact it.

Main questions:

  1. Does async seem worth the effort?
  2. Does the implementation approach look half-way sound? Or will we want to do it a substantially different way? I'm more concerned about the threading/io than HFE.
  3. Is it fair to limit ED/TD to IMG? How serious would we be about supporting TD?
  4. How concerned are we about the high memory use of HD floppies with HFE? It is worth letting them work similar to today where they constantly read from flash? (e.g., if we see high value in not constantly reading from flash, then we may keep memory available. Or we may encourage IMG for HD even if the HFE support code was there.)

ejona86 avatar Feb 08 '21 08:02 ejona86

What can I say, that looks pretty neat for a lashup implementation.

The whole point is to be able to buffer at least a whole side in RAM, right, and preferably both sides of a DS image? If we are having to endlessly stream in from USB, there is no safe time to drop in writes without 'suspending' reads and thus delaying index pulses, I believe? In short, I don't understand your proposed not-fully-buffering scheme: Could you expand on it a bit? My current belief is that this should concentrate on what we can fully buffer, while supporting everything else no worse than we do right now.

What I really care about for full buffering is IMG up to ED, and HFE up to DD. I don't think there's much (any?) HD HFE out there that is too esoteric to support easily with IMG? That said your rejigging of memory isn't that bad, although it screws the alt/logfile build a bit -- I think we can make async-vs-not probably dynamically determined based on available memory, and maybe do less async on the alt/logfile build (albeit it makes the build less representative and thus less useful in that way). [EDIT: Or make it all async but with similar behaviour to what we have now when low on memory?]

Anyway it is basically a good idea. A bunch of code probably needs generalising out of HFE if possible, the buffering would be nice to do in the buffercache layer, integrate that with the async layer a bit more... Well I'm sure you know all this! It would be splendid to lean on the read/write_data special-case buffers much less and the buffercache much more, in general in the existing code :)

There are also probably some corner cases to care about, like not killing the io thread while it's yielding during a USB transaction. Again, I don't doubt you know this.

Yeah... This is awesome overall, thanks! This is the first time I've had someone since the Xen project who can contribute useful heavy lifting, and those guys were payrolled by their companies. It is also very healthy to get other eyes on a sizeable project like this because bad design and implementation decisions tend to fester away, unacknowledged or ignored by the original author.

keirf avatar Feb 08 '21 10:02 keirf

By the way, is the naked attribute sugar for the kind of bare asm function I construct for call_cancellable_fn?

keirf avatar Feb 08 '21 12:02 keirf

Oh I like the fact that write-ahead-of-read should be rare (line 477, hfe.c). I was worried about that case but you're right: If readahead is continuing during write, it makes sense that readahead easily keeps ahead of the write cursor, just as it does the read cursor, since they are the same cursor. D'oh. :)

keirf avatar Feb 08 '21 13:02 keirf

One caveat for your particular area of interest: Of course the writes still race seek operations, the writes are occasionally very slow, and you are going to possibly still have trouble issuing timely index pulses while draining writes (since you have no read data to serve up, and maybe potentially no buffer space to stash writes). I think your hard sector scenario is still hard when your bulk data is parked on crappy Flash behind a slow half-duplex link. But I'm a pessimist. ;)

keirf avatar Feb 08 '21 13:02 keirf

In short, I don't understand your proposed not-fully-buffering scheme: Could you expand on it a bit?

EDIT: Or make it all async but with similar behaviour to what we have now when low on memory

That edit is along the lines I was thinking would be possible, but aggressive reads.

Basically, the approach of aggressively pre-filling reads still works even if we can't buffer an entire track. It just is less-good. Let's say read_data is 36 KB. That's still 150 ms worth of buffer. We could buffer 150 ms of reads which consumes ~60 ms of time to pull from flash. If there were any writes, we have at least 90 ms of time to write back some of it to open up buffer for reading. I view it as two super-imposed circular buffers, a read buffer and a write buffer. As the read advances, that opens up write buffer, and as the write advances it opens up read buffer.

Even with the fully-buffer-track approach, the mental-model of the circular buffer is helpful. It allows us to notice that we could actually seek as soon as there's few enough writes remaining to leave a substantial read buffer. And now hard sectors don't look as glum.

My current belief is that this should concentrate on what we can fully buffer, while supporting everything else no worse than we do right now.

Sounds good. I'm trying to make this as complicated as necessary, but no more.

I don't think there's much (any?) HD HFE out there that is too esoteric to support easily with IMG?

My thoughts as well.

That said your rejigging of memory isn't that bad, although it screws the alt/logfile build a bit

Oh, the log file. Right. That's why it is so large.

There's actually some wiggle room at the moment. write_data is 51076 bytes and HD HFE v2 needs 50176 so there's 900 of change. I had cut into the console before I moved the async cb handling back onto the main thread. Moving the cb handling made me feel comfortable reducing thread1_stack to 512 bytes. I'm still also not quite sure how big write_bc will need to be in the end.

the buffering would be nice to do in the buffercache layer, integrate that with the async layer a bit more... Well I'm sure you know all this!

Actually, I hadn't considered using the buffercache layer. I had been pretty happy to avoid it as it gave me "this is all your memory" "simplicity" and avoided duplicating memory. And that was probably the right thing for the PoC. But now that you mention it I could see how that could work. Devil's in the details, but definitely worth looking in to.

By the way, is the naked attribute sugar for the kind of bare asm function I construct for call_cancellable_fn?

Yeah. It is "let C handle the declaration and let asm do the rest." More precisely, it means "don't emit the function preamble and postamble."

I think your hard sector scenario is still hard when your bulk data is parked on crappy Flash behind a slow half-duplex link. But I'm a pessimist. ;)

Yeah, this is "phase 1" :). Best not to throw hard sectors into the mix yet. Eric looks at his imaginary chart. Yeah, those are phase 3 or later.

The slow, half-duplex, non-pipelined link is excruciating combined with flash latencies. The part that had me more strongly consider this approach is the high read latencies after a write that can occur; that made me reevaluate. That said, my hard sector PR is pretty fair already; it's just so much effort to fight the blocking to track down what is causing latency blips. Especially since printk consumes a lot of CPU and SWD is disabled.

ejona86 avatar Feb 09 '21 07:02 ejona86

Okay, let me do an answer to minor points, and I'll follow up with what I think are the biggest up-front decisions to be made.

  1. Aggressive writes: You may have 90ms of slack in which to write, but you cannot guarantee every stick will ack an arbitrary write in 90ms. Latency spikes are common. It does depend on stick. It may depend on write size (larger writes may behave better). But I would probably want to make the aggressiveness configurable, and by default I would writeback only when the whole track is buffered, or we're seeking, as these allow us a default behaviour of don't start a new track until writes are done. Which is still a big improvement on FlashFloppy today.

  2. alt/logfile: I wouldn't let this tail wag the dog. It's not used that often. That said, I'm not sure if we really care about 300RPM 500kbps HFE. See next message, as this is probably a big up-front decision.

  3. naked: That's cool, it lets the C compiler see the decl which is an ugly aspect of direct asm coding, particularly for static functions. I will use this. :)

  4. SWD: Well tbh I've never used SWD except for device programming. Is it useful for extracting logging too? Obviously I can hook up gdb, but I don't really use debuggers much. But if there's a way to extract logging on Linux that would be cool! Yeah SWD gets disabled, but not really for any good reason. One pin is used in the corner case chgrst=pa14 in FF.CFG. Certainly we could do a SWD-enabled build if that's handy? And I'd consider code to log through that channel too.

keirf avatar Feb 09 '21 08:02 keirf

The big decisions to be made.

  1. Buffer cache or ring: I think this the Big One. It informs the whole shape of development really. So my view was that the buffercache would sit on top of the FS stack (it's underneath at the moment, and might still appear there for fs metadata caching) and would be the asynchronous interface. Ie the IO thread would be a buffercache server.

Upsides: Everything is pretty and unified. One copy of data (maybe with very minimal stream buffering on top for some image handlers). Maybe simplifies image handlers by pulling code into a shared subsystem.

Downsides: The buffercache grows many new appendages and gets much more complicated: dirty buffers, locked buffers (probably), ... And we'd still need a conceptual ring in the floppy layer to track how much of a track is fetched (and possibly, where the buffers are). Maybe something unified in floppy subsystem makes more sense than buffercache subsystem. Metadata overhead per block probably higher than a simple contiguous ring. Probably need either de-fragging or we need iovecs down the IO stack to do multi-block IO (the latter wouldn't actually be that bad to do imo).

  1. Do we care about 300RPM 500kbps HFE: This is the largest/tightest case we could care about, and we'd need to implement with the RAM limitation carefully in mind. Compared with not bothering and falling back to something like what we do now (large write ring, so that format-track operations remain reliable). So, who might care? I think most HD HFE use cases are either scenarios where the writeback time doesn't matter (host leaves it to its FDC to time out, which is based on index pulses, which we suppress), or the disk format is standard (can use IMG, see #362).

What about hard sectored images? I suppose the only case that might be 500kbps would be eight-inch MFM. But that would spin at 360RPM and so require only ~42kB of RAM. Much more manageable!

If we don't need to achieve 50kB available, we have a lot more latitude in design and implementation.

keirf avatar Feb 09 '21 09:02 keirf

Aggressive writes. I have no problem with the conservative writes being a default. If we did aggressive writes we'd probably want some display feedback in the event of a buffer underrun.

SWD. No, it doesn't do logging, which means it's only for dire situations. I saw the usage of PA14 and hacked around it but still no dice. It didn't boot with 2<<24 in afio->mapr. There's still plenty of ways I could be messing it up as I've never truly used the thing and the datasheet also mentions a handshake starting with JTAG and transitioning to SWD.


Buffer cache or ring. The current buffer cache under the FS could work for buffering writes, but looks ill suited for reads. It prevents reading from cache while a readahead is ongoing and would require larger buffers in the image code just to satisfy fatfs. This approach means copies of the data into/out of the cache. We could hack around fatfs by putting the volume code into a "fake read" mode where it queues the read and returns fake read data. But there'd be no transparency into the cache and calculating amount of memory necessary for the cache is harder since metadata is mixed in. We'd probably want iovecs.

I think we really want something above the fs. I think a "ring cache" utility is the most direct approach. Most directly what we need, most memory efficient, and most versatile, with the downside of more complicated API interactions. I do think a sector-based approach could work, but I think it'd require copies to/from cache and it might use a ring internally to avoid fragmentation.

Do we care about 300RPM 500kbps HFE. It sounds like you are leaning toward supporting HFE in this case, but not bother with the fully-buffered track approach. That sounds fair; I also though the tight memory was quite restricting without much practical benefit. Since I don't think we want two copies of the HFE code (fully buffered track + current approach), that implies the caching approach will gain some complexity to handle partial tracks. That's fine with me. We can size things such that write_bc + write_data/2 >= track_side_size which allows buffering a track-side's worth of writes while also extending the amount of read buffer.

ejona86 avatar Feb 13 '21 17:02 ejona86

SWD: Ah that's a shame. I haven't played with SWD except for programming, so I can't authoritatively say what could be going wrong in setting up SWD. I'm pretty sure that AFIO_MAPR field can only be set once until reset though. So if you touched it twice that could be a problem. Anyhow, if we don't know how to do logging via SWD, it's not of much interest to me personally so I guess it's moot.

Cache vs Ring: This has been a useful discussion as I would have probably gone with the superficial attractiveness of a 'unified buffer cache' and hacked it to work, which of course could be done at the cost of time and complexity! I suppose the advantages are that the cache can remember previous tracks, share with metadata, and thus overall be a little more efficient on average. But the average case is after all not what we're really interested in, but rather hitting real-time goals. And a cache is probably not the right tool for that job. So I have to agree with you, and having thought a while about it now I'm increasingly enthusiastic about that choice. It's going to be a lot simpler code too.

When A Cylinder Doesn't Fit: Ok, so specifically we don't care about fitting HD HFE. I think we agree on that. So we just need to take care with fallback behaviour. Sizing the rings as you suggest makes sense. We'll also have to probably synchronously flush writes, at least in conservative-write mode. On all image types (except HFE) we can buffer both sides of a DS disk where it fits, degrade to buffer one side where only that fits (and treat side change like a seek), and something single-sided and HFE-fallback-like (as you suggested) that is really no worse than we have now for giant tracks (some people are configuring giant IMG images for example, for better or worse).

Who Codes It Up: Well, I'm figuring you want the okay to pursue this? It's fine with me if so, we can stick it in a branch and it can aim for a new v4.x release series. It'll need carving up into a bite-sized pieces for review, to some extent. It will also need to go through FF_TestBed automated tests, which I can push it through since you won't have the setup for that. Anyway, nothing surprising here I don't think... If it's going well perhaps I should give you direct access rights to the repo.

keirf avatar Feb 15 '21 10:02 keirf

Cache vs Ring: I've split out the ring logic from HFE into a separate API. It cleans up HFE. I've gotten far enough with the limited memory support that it's clear HFE will be minimally impacted by it.

When A Cylinder Doesn't Fit: My current plan for formats with separate track sides is for the ring reading code to update a "shadow" ring in parallel to the main ring. If the side changes, it can be "swapped in" (by changing pointers or some such; details TBD). If there's not enough memory for both sides, then it just goes back to "single ring only" mode. Having both rings is probably annoying in the ring code, but probably not hard. I'll reevaluate once I see how the code shapes up. (The main alternative to the shadow ring is to interleave the sides.)

Who Codes It Up: I've been cleaning up what I had and have been pushing it to my https://github.com/keirf/FlashFloppy/compare/master...ejona86:io-thread branch. It's honestly looking like it'll be an easy review, other than the massive number of places I could introduce bugs. Be aware that the constrained memory changes are causing me to change essentially every line of the ring cache as I have it use the ring cursors instead of absolute offsets, and I'm swapping things to tracking bytes itself of sectors. So that change will be quite noisy. I'm very glad you have automated testing.

ejona86 avatar Feb 16 '21 03:02 ejona86

FF_TestBed looks very straight-forward. If you have an extra PCB that's the cleanest, but this would be trivial with jumper wires. I have some 1x10 and 1x7 housings, so it wouldn't even be that janky. I should be able to set that up this week.

ejona86 avatar Feb 16 '21 05:02 ejona86

Okay that all sounds good. To be honest you're obviously fine to crack on and ask questions if you need to. If there are any areas you'd specifically like me to comment on, or image handlers you'd like me to port across to the new environment, let me know.

FF_TestBed is very handy, I designed it to be super simple. No IRQs, no heap, very easy to hack in new tests or diagnostics. So it is out of the box a simple set of regression tests, but you can see on branch speed_test it was easy to hack in logging on read-after-write latencies for comparison with "competing Gotek products". That sort of thing could make it into master branch, as could some tougher regression tests -- the current set are pretty vanilla, they don't bring much Pain, but were purely to pick up on the really dumb bugs I was regularly introducing a year or so back during more active development.

I do have... one or two PCBs... Ok maybe 100 actually. I panellised them and got 10x10cm PCB set done, and these are hardly popular with general FF users so they are my go-to for shimming flat-pack furniture and the like. ;) I'm happy to post you a couple if you email me a shipping address. They shouldn't take too long to get to you and it's a super-neat setup then.

keirf avatar Feb 16 '21 09:02 keirf

I've implemented track-larger-than-ring support and cleaned up the branch. The ring_io code is more complex than I'd like, but about as complex as I expected.

I put an 'async' flag on image_handler so now the arena allocation is auto-tuned and non-async images are supported. But Direct Access is broken if disk_handler is async because write_bc is too small, which blocks FF_TestBed testing. So I think adding async versions of disk_* I/O and DA async support are the main blockers at this point.

ejona86 avatar Feb 22 '21 00:02 ejona86

That's great! Maybe you can do a broader reset of the rings when switching to DA, to get into non async mode? On exit from DA we always fully remount so that direction is already covered. It's possibly an alternative to implementing more async stuff. Whatever you think is better, neater, etc.

keirf avatar Feb 22 '21 03:02 keirf

Maybe you can do a broader reset of the rings when switching to DA, to get into non async mode?

I'll give that a try. That'd be a more expedient. I've been a little concerned about going into/out-of DA mode and losing any async writes. I feel like it is probably more likely exiting DA mode than entering, so sync I/O is probably a plus. (I've thought I might need a "flush" function or something on image_handler, but haven't really wanted to go down that road unless I have to.)

ejona86 avatar Feb 22 '21 05:02 ejona86

The entry to DA mode is pretty shoddy really, tucked away in image.c. The aim originally (like, day one of FlashFloppy) iirc was that it would be possible to seamlessly move between DA and non-DA modes. That was killed off because DA mode changes the FAT filesystem and/or selected image file and so we really need a "full reset" on exit. But the entry to DA has forever remained a bit buried. It could at least perhaps be pulled out to floppy.c and may be easier to hack on there?

Also, more broadly, it would be nice to support DA mode even when no stick is inserted, or no image mounted (eg ejected). It feels a bit icky that an image must be mounted to get into DA mode. But that would need a method to force main.c out of navigation mode, and is certainly out of scope of your current work. Still it indicates the direction DA support may go and that what is there currently in terms of integration with the rest of FlashFloppy is not to be greatly respected or tiptoed round.

keirf avatar Feb 22 '21 10:02 keirf

I received the FF_TestBed adapter board and attached male+female connectors to it; pretty easy and very clean. It took me a bit to figure out the test bed wants the DUT to be in Shugart mode and to get the S0/1 jumper right. Testing against master, hfe and adf_test(11) work fine, but dsk hangs and adf_test(22) spams WARN at amiga.c:61: "index.count >= 2" and img_test() spams Long flux @ dma=454 bc=7749: 46658-46370=288 / 36 WARN at floppy_generic.c:53: "TRUE". So I still need to play with it a bit before I can test out my changes.

ejona86 avatar Feb 24 '21 06:02 ejona86

It can be a bit fussy. I will add the jumper settings to the "Running The Testbed" wiki page, where they properly belong. I will also give it a whirl myself and let you know how I get on.

keirf avatar Feb 24 '21 09:02 keirf

Well, I remember why I don't run it as often as I should, I always find it a pain to set up. The TestBed and DUT can backpower each other via the ribbon which can be confusing, plus for some reason (perhaps my new PC) I find it really hard to flash via USB-TTL, it takes me loads of attempts. Anyway, I did get it running. Here's a full round:

*** ROUND 11 ***

HFE TEST:
Motor-to-RDY: OFF=6us, ON=200ms
We have 4654 4489s

DSK TEST:
Period: 272 ms ;; GAP3: 84
Index delayed by 22 ms
MFM 8192 r/w sector - OK
Period: 342 ms ;; GAP3: 84
Index delayed by 3 ms
MFM 512 r/w sector - OK
Period: 342 ms ;; GAP3: 84
Index delayed by 3 ms
MFM 512 r/w sector - OK

ADF TEST:
Amiga DD - OK

ADF TEST:
Amiga HD - OK

IMG TEST:
Period: 199 ms ;; GAP3: 84
Index delayed by 3 ms
MFM 512 r/w sector - OK
Period: 199 ms ;; GAP3: 84
Index delayed by 3 ms
MFM 512 r/w sector - OK
Period: 199 ms ;; GAP3: 21
Index delayed by 2 ms
FM 256 r/w sector - OK
Period: 199 ms ;; GAP3: 84
Index delayed by 20 ms
MFM 8192 r/w sector - OK

I think I'm going to solder in a power feed from the TestBed for the DUT. That would be saner.

EDIT: Also the MOTOR-to-RDY line needs a modded Gotek DUT, to wire through the MOTOR signal. A normal Gotek will say MOTOR ignored.

keirf avatar Feb 24 '21 10:02 keirf

I've revamped the wiki page https://github.com/keirf/FF_TestBed/wiki/Running-The-TestBed

Please let me know if you think anything is still unclear.

keirf avatar Feb 24 '21 11:02 keirf

I got FF_TestBed up and running. I'm seeing one DSK error about expecting 16 sectors, but parts of DSK seem to still be okay and everything else looks fair. This is "good enough" to notice if I break the world. My earlier problem was messing up when preparing the USB drive.

I have no problem programming with my USB-TLL nor ST-Link. The only problem I had was re-learning how to unlock the flash. I've seen the backpowering happen with the Greaseweazle and FluxEngine, so I was prepared for that, but it didn't cause any trouble.

Fixing up image_setup_track looks nontrivial so I'll still need to figure out an approach there. With a huge hack, I am able to test my code and HFE passes. IMG is hanging for MFM 8192, which is surprising; something to look in to.

Huge hack:

diff --git a/src/image/image.c b/src/image/image.c
index 4145296..bcb892d 100644
--- a/src/image/image.c
+++ b/src/image/image.c
@@ -101,6 +101,8 @@ bool_t image_valid(FILINFO *fp)
     return FALSE;
 }
 
+static bool_t da_mode;
+
 static bool_t try_handler(struct image *im, struct slot *slot,
                           DWORD *cltbl,
                           const struct image_handler *handler)
@@ -120,6 +122,10 @@ static bool_t try_handler(struct image *im, struct slot *slot,
     im->stk_per_rev = stk_ms(200);
 
     im->disk_handler = im->track_handler = handler;
+    if (da_mode) {
+        im->track_handler = &da_image_handler;
+        im->nr_sides = 1;
+    }
 
     mode = FA_READ | FA_OPEN_EXISTING;
     if (handler->write_track != NULL)
@@ -259,12 +265,16 @@ bool_t image_setup_track(
 
 #if !defined(QUICKDISK)
     if (!in_da_mode(im, track>>1)) {
+        da_mode = FALSE;
         /* If we are exiting D-A mode then need to re-read the config file. */
         if (h == &da_image_handler)
             return TRUE;
         h = ((track>>1) >= im->nr_cyls) ? &dummy_image_handler
             : im->disk_handler;
     } else {
+        da_mode = TRUE;
+        if (h != &da_image_handler)
+            return TRUE;
         h = &da_image_handler;
         im->nr_sides = 1;
     }

ejona86 avatar Feb 25 '21 04:02 ejona86

Good news on the TestBed. Weird you still have some errors, but it's good enough for now. You can see if you break the world and I can also run tests in due course too. And maybe make some diabolical new ones.

Yes it might be better to teach floppy.c and/or floppy_generic.c a bit more about DA and mode switching. Or use a new return code from image_setup_track to remount things within floppy.c without disturbing main.c? I don't really know what the best answer is, but it's not like the current DA hooks in the rest of the code are anything great.

keirf avatar Feb 25 '21 09:02 keirf

I've tested HFE a bit and have resolved the issues I've noticed. But I haven't figured out a good way to give it a good stress test; my cheap USB floppy controller doesn't like FlashFloppy+HFE. I think I'm to a stopping point for this phase. Would you like to review the commits one-by-one, with each having its own PR? Or one large PR? Or just dump it all in a branch? I'll revise my current branch before doing so to make it cleaner.


I reworked the DA handling to have it controlled more by floppy_generic.c. I'm disappointed it wasn't easy to put in floppy.c but it is a bit nicer than it was in image.c. The handling is awkward because DA detection is dependent on how many tracks are in the current image. So I have to load the normal image handler and then replace it with DA when DA is needed. It looks like it'd be just a few lines to enhance it so DA mode can be entered even if no image is mounted.

I noticed that my async vs blocking allocation was very, very wrong and was dereferencing a NULL pointer. This was breaking the IMG test. The allocator now assumes a blocking handler and then reallocates if it turns out the handler is async. Not happy about the circular dependency existing, but the retry is simple.

I noticed switching sides with the ring buffer was really fast when fully buffering and slow otherwise. I tracked it down to read_bc causing the reader to be in the future; when the side changes we need to rewind and re-read what's already been read. read_bc is at least 16 ms (HD) which means 10ms grace period isn't enough cover. I've solved this by limiting the amount buffered in read_bc (to 2kb, which doesn't actually reduce the size) and keeping some already-consumed data around (4kb for HFE).

ejona86 avatar Feb 28 '21 20:02 ejona86

SWD: I learned it has a companion SWO (TRACESWO) which does allow printf-style debugging (not a console, because it is one-way). Unfortunately enabling SWO requires pa15 and pb3 (it uses pb3, but the register configuration requires enabling JTDI as well). The cheap st-link v2 clones also don't support SWO unless you do a hardware mod. So overall no real benefit over serial for us, although for a custom board it might make more sense.

ejona86 avatar Mar 01 '21 00:03 ejona86

I'm inclined to give you access rights and dump it in a branch? The aim here when ready is to branch master into stable-v3 and then merge your branch into master as new v4.x development series. The sooner we get to that point the better imo. (EDIT: Even if we're not ready at that point to immediately call v4.0a)

Yeah the side switch thing is fun eh? Glad you worked out a scheme for the non-fully-buffered case :)

keirf avatar Mar 01 '21 08:03 keirf

I'm game for the branch approach. I've cleaned up my branch to be ready to be rebased on top of the branch point. I split out https://github.com/keirf/FlashFloppy/pull/439 which is trivial and can happen before the branch point. I took a stab at the copyright notice for the new files; I hope it looks fine to you. Mention it if you have any a concern.

How much do we want to do in v4.x before releasing? Obviously stabilize. But would you be hoping for async IMG or similar to be done before a 4.0 release? I sort of imagine you do since it seems we want an async IMG and it'll need stabilizing, but I don't know the full scope that you might hope for.

ejona86 avatar Mar 02 '21 02:03 ejona86

Okay I sent you an invite for direct access. I think once you accept that you're set, though I'm not sure what granularity of access control is provided and whether I'll need to tick some boxes. I should have already asked: I assume you're set up with 2FA on your account?

The copyright/written-by notices are perfectly fine. You can feel free to add your name to any files you substantially modify. We could have an AUTHORS or CREDITS file too, though it feels a bit highfalutin for such a small number of contributors.

I'm not precisely sure where we should place the goalposts for v4.0a. I'd really like to get some test from users who have had problems with the existing "dumbio", so that's tickets:

  • #277 : ED IMG
  • #362 : HD IMG, can also test HFE but that's not going to be fully buffered

I can ask some regular users if they have any HFE scenarios with unreliable writes. All the same, even without IMG, drawing a line for 4.0a and getting wider test on it is not a bad idea. It is also then a baseline for our own testing, both correctness and perf.

keirf avatar Mar 02 '21 07:03 keirf

Okay, so async IMG is on the docket. The read/write paths look pretty stand-alone, but we'll need to figure out how we want to handle the sub-512 byte sector alignment. I should have an easier time stress testing IMG as well.

ejona86 avatar Mar 02 '21 08:03 ejona86

I had a quick look down the list of collaborator privileges, and it looks like there's everything you could plausibly need, so that's good: https://docs.github.com/en/github/setting-up-and-managing-your-github-user-account/permission-levels-for-a-user-account-repository

keirf avatar Mar 02 '21 08:03 keirf

As far as I'm concerned, you are now welcome to create and manage any branches you need to get your work ready for master. When you're ready, we can branch stable-v3 and open v4 development in master.

keirf avatar Mar 02 '21 08:03 keirf