Implement smart MFM error correction
There's some cunning code here for correcting single-bit errors: https://www.techtravels.org/2008/03/error-correction-using-hamming-distances/
Unfortunately it doesn't seem to help with missing bit errors, which I think I see more often. It's also focused on the terrible Amiga XOR checksum.
The problem is simple enough in theory: given certain transformations (corresponding to errors), reverse as few transformations as possible so that the data matches the checksum. Stack Overflow has some information about doing bit error correction with CRC here: https://stackoverflow.com/questions/3788570
There's a certain number of bit errors that this can definitely correct; beyond that, it may still get the data right, but it may instead coerce the data into matching the CRC without the data itself being correct. In the latter case we wouldn't be able to tell that the data is wrong, so it either shouldn't be applied beyond its guaranteed correction capacity, or it should give a big warning if told to go further. I'm not sure what the math would be like for bit insertion/deletion errors rather than just bit flips.
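For the plain bit-flip case, the brute-force version is at least easy to state. Here's a minimal sketch (the Stack Overflow approach is cleverer and exploits the CRC's linearity) using a self-contained CRC-16/CCITT, which is the polynomial IBM MFM sectors use; note the real format also runs the CRC over the sync and mark bytes, which this sketch glosses over:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Bitwise CRC-16/CCITT (poly 0x1021, init 0xffff), as used by IBM MFM.
uint16_t crc16_ccitt(const std::vector<uint8_t>& data)
{
    uint16_t crc = 0xffff;
    for (uint8_t byte : data) {
        crc ^= (uint16_t)byte << 8;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (crc << 1) ^ 0x1021 : (crc << 1);
    }
    return crc;
}

// Brute force: flip each bit in turn until the CRC matches. Returns
// nothing if no single flip works (multiple flips, or an insert/delete).
std::optional<std::vector<uint8_t>> correct_single_flip(
    std::vector<uint8_t> sector, uint16_t stored_crc)
{
    if (crc16_ccitt(sector) == stored_crc)
        return sector; // already consistent
    for (size_t i = 0; i < sector.size(); i++)
        for (int bit = 0; bit < 8; bit++) {
            sector[i] ^= 1 << bit; // try one flip...
            if (crc16_ccitt(sector) == stored_crc)
                return sector;
            sector[i] ^= 1 << bit; // ...and undo it
        }
    return std::nullopt;
}
```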
Missing flux transitions have been more common in my experience too; I have some badly damaged floppies where e.g. a 100101 has become an unnaturally long 10001. You could in theory try inserting a bit when encountering these very long sequences, but (at least in my experience) the media's often too far gone at that point and there are too many ambiguous sequences. Or, if it's not, repeated reads get it right.
(That's from the perspective of formats that use an actual CRC instead of Amiga's XOR checksum)
The easiest way to do a bit insertion check would be to just insert a flux transition into the raw stream, then decode and see if the CRC matches, and try this for every position of the flux stream. It ought to be doable, if a little slow.
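As a sketch of that search, with hypothetical decode_mfm() and crc_ok() helpers standing in for the real decoder and checksum test:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the real decoder and checksum test.
std::vector<uint8_t> decode_mfm(const std::vector<bool>& raw_bits);
bool crc_ok(const std::vector<uint8_t>& sector);

// Insert a single bit (each polarity) at every position of the raw
// bitstream and accept the first decode whose CRC checks out.
std::optional<std::vector<uint8_t>> try_one_insertion(
    const std::vector<bool>& raw_bits)
{
    for (size_t pos = 0; pos <= raw_bits.size(); pos++)
        for (bool value : {false, true}) {
            std::vector<bool> candidate = raw_bits;
            candidate.insert(candidate.begin() + pos, value);
            std::vector<uint8_t> sector = decode_mfm(candidate);
            if (crc_ok(sector))
                return sector;
        }
    return std::nullopt; // no single insertion produced a valid sector
}
```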
I'd be really pleased to see some progress on this one. Is there anything I can do?
If it helps I can supply flux files from marginal discs with single-sector checksum failures which can be verified when the sector is corrected (both by sector checksums and by testing of the file content, which is a compressed file).
I have worked a bit on and off on a smarter decoder, see https://github.com/kristomu/flux-analyze/tree/master/csrc. The main thing holding it back is that it's using the old FluxEngine format rather than the protobuf-derived one, and that the code is specific to the IBM MFM format. If you have any corrupted or marginal IBM MFM floppies, they'd be welcome; or if you know C++ and could write a replacement for get_flux_record() for the protobuf-derived format, that would also be useful.
I haven't implemented bit insertion/removal yet, but it does support the ordinal search strategy mentioned near the end of #170.
(I think ND floppies use the IBM MFM format, but I haven't actually tried any of them.)
The marginal image I'm looking at just now is unfortunately not an IBM floppy; it's in a Victor 9000 format (actually written on a Victor Vicki). Using a Greaseweazle, fluxengine reads it mostly OK; there are five bad blocks. It's here if you want to take a look at it:
http://www.jubileegroup.co.uk/misc/UTILITIES.*.gz
There are two files, a flux file and the decoded image file.
If that's no use to you I can probably find quite a few marginal IBM PC floppies and I could let you have some flux files from those.
I can do C++ but looking at the flux-analyze code it would take me a long time to get up to speed. Without spending some time brainstorming with someone familiar with it, the time required might be longer than I have left on this planet...
From what I could gather, the Victor 9000 uses GCR, a format type that I don't know much about. I could look into it, but it would take time, so IBM MFM floppies would be more useful - any of the platforms listed at http://cowlark.com/fluxengine/doc/disk-ibm.html.
As for flux-analyze, I agree it's not the easiest code to read, since it's quite experimental. A stand-alone reader that returns the flux data (raw flux transition timings) for the modern FluxEngine format would also work, though; I should be able to turn that into a module for flux-analyze to read modern FE files.
Yes, the Victor uses GCR. Of the IBM flux files which I have readily available, the few with marginal spots unfortunately contain information which I can't make public. It's late, tomorrow I'll look for some with less sensitive content. I'm sure there will be a few.
It may take a little time to locate them, but I need to go through them anyway so it's probably better use of my time to do that than it would be of your time to look at Victor files. Stay tuned.
I have four scans of an IBM format MFM 3.5" DSDD floppy with marginal sectors.
The tarball is almost 8Mbytes.
Would you like to know more about it before you decide whether or not you want it?
If not, how would you like to take it?
@kristomu I'll admit to not having looked at what kind of errors occurred in any great detail; the ones I've noticed were missing-pulse errors resulting in the bitstream losing sync. This is most noticeable with MFM when a string of 00 suddenly becomes ff as the clock bits start getting treated as data bits. What kind of success rate were you seeing with the CRC error correction?
(Although, given the way the drive works, I'd have expected flux transitions to be early or late but not completely missing, so I'm not sure how this actually happens in practice.)
I've mostly focused on clock/band reconstruction and dealing with very long flux recordings, so I haven't done much of the CRC recovery yet.
For the Python version of flux-analyze, I wrote a very quick and dirty edit-distance check that can still match the MFM encoding of A1A1A1 even when corrupted with a single insert, delete, and up to three transposition errors. This is in approx_search.py. Usually it makes no difference, but for very heavily corrupted fluxes it sometimes allows more sectors to be recovered. One example is track/RETRY-77-4-t73.0 in the flux-analyze repo: corrupted-sector detection increases the number of good sectors from 7 to 9.
In principle, this idea could be extended all the way to full IDAMs, since there's only a limited number of values an IDAM can take.
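For a rough idea of what the search looks like, here's an illustrative sketch (not a port of approx_search.py): slide a window over the decoded bits and accept offsets within a given edit distance of the sync pattern. It uses plain Levenshtein distance, so a transposition counts as two edits, and a real search would also let the window length vary by max_dist:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Plain Levenshtein distance between two bit strings.
int edit_distance(const std::string& a, const std::string& b)
{
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (size_t j = 0; j <= b.size(); j++)
        prev[j] = j;
    for (size_t i = 1; i <= a.size(); i++) {
        cur[0] = i;
        for (size_t j = 1; j <= b.size(); j++)
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1,
                prev[j - 1] + (a[i - 1] != b[j - 1])});
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Every offset where the stream approximately matches the sync pattern.
std::vector<size_t> approx_sync_positions(const std::string& mfm_bits,
    const std::string& sync, int max_dist)
{
    std::vector<size_t> hits;
    for (size_t i = 0; i + sync.size() <= mfm_bits.size(); i++)
        if (edit_distance(mfm_bits.substr(i, sync.size()), sync) <= max_dist)
            hits.push_back(i);
    return hits;
}
```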
It should in principle be easy to insert a bit at every position and see if the CRC then matches. The only thing I'm concerned about is trying too many modifications and having something coincidentally get a valid CRC even when it's wrong. A more sophisticated approach like the article's would be safer, as it would only try to add bits where the decoder detects an MFM error.
I'm not sure how missing flux transitions happen on the analog level. The FDD has a differentiator for detecting sharp transitions. Maybe what happens is that this sharp transition gets smeared over a larger area due to demagnetization, so the differentiator never triggers? Seeing the actual analog data would be very useful, but I don't know anyone with a good enough scope.
@GWHAYWOOD Ideally I'd take an old (SQLite) .flux, but something that can be transformed to a FluxEngine flux file would also work, as I could just use an old version to transform it. Or FE reader code for the new protobuf format.
@kristomu
This is all rather new to me and I do not understand the terminology.
I do not know what you mean by "an old (SQLite) .flux".
Is there a document describing the history which I can read?
Is it possible for me somehow to produce such an old type of file using the current Fluxengine software and the Greaseweazle which I am currently using to scan discs? I'll be happy to do that for you if I can.
@kristomu
My oscilloscope might be good enough, I imagine 100MHz bandwidth would do.
It's under my desk, I can't remember the last time I powered it up.
Tomorrow I'll see if it still works.
I do not know what you mean by "an old (SQLite) .flux".
The first versions of FluxEngine used a different file format for .flux files which stored all the data in an SQLite database file. This worked really well, but it essentially made the binary format unknowable --- the only way to access the data inside the file was through the SQLite database software. So I changed it to something based on protobufs. This is almost as easy to use, software-wise, but has a defined encoding and can be read without needing the protobuf library.
FluxEngine does have support for exporting flux files as various other things, including VCD signal analyser files. These are plain text (https://en.wikipedia.org/wiki/Value_change_dump). Might this be easier to read?
I could use any format that the original FE could convert to a flux file. I'm not sure what format the Greaseweazle outputs natively; all my dumps were through a KryoFlux.
For lack of something better, I should be able to use modern FluxEngine .flux files by converting to per-track .au files, but I can't seem to get FE to cooperate. With
./fluxengine rawread -s RETRY-77-4-t69.0.flux -d au:test_au
(on a converted version of RETRY-77-4-t69.0.flux), the FE program complains about not knowing the tpi, sector size, etc., even though none of that should be relevant to the conversion I'm trying to do. With an image read like
./fluxengine read ibm -s RETRY-77-4-t69.0.flux -o test.img --copy-flux-to au:test_au
it says there are no sectors, and nothing gets written to the test_au directory. The input flux is very distorted so I wouldn't be surprised if FE can't find any sectors, but that shouldn't stop it from doing a raw conversion.
The best (longer term) solution would be to have a proof of concept program that shows how I could use the FE sources, e.g. something that just reads a flux file, picks some track, and outputs the flux delay values (20 31 35 etc) for that track to cout. I should be able to integrate that with my C++ tool pretty easily.
@GWHAYWOOD Regarding analog capture: I found an extremely detailed manual for a 5.25" FDD here: https://hxc2001.com/download/datasheet/floppy/thirdparty/Teac/10180320_FD55_Maint_1985.pdf Maybe it could be useful?
The TEAC manual will be incredibly useful, thank you.
I feel the urge to find a FD55 drive to play with.
Okay, something's definitely wrong there. The tpi stuff was an experiment which didn't work and I need to remove it. It's also possible that the au writer is simply broken. I'll take a look when I get the chance.
Regarding a simple program to dump the flux bytecode: src/fe-inspect.cc actually does that. It does use all the internal FluxEngine functionality, though. However, as the flux file is a protobuf, you should be able to compile the .proto file (lib/fl2.proto) and then use this standalone to load a file.
@GWHAYWOOD I should be able to use modern FluxEngine .flux files. I'll just have to familiarize myself with protobuf and/or wait for the au functionality to work first.
@kristomu
[Quote :me :yesterday] My oscilloscope might be good enough, I imagine 100MHz bandwidth would do. It's under my desk, I can't remember the last time I powered it up. Tomorrow I'll see if it still works. [/Quote]
Made it. Just. It still works. The AC/DC switch on channel 1 seems to need attention but other than that it's fine.
Would it make sense if you point me to one or two track images, I write them to a floppy, and we see what they look like when viewed on the 'scope?
If you can record the analogue flux level and the digital output and produce a VCD signal analyser file, that'd be really interesting to see --- I've only ever worked at the digital level. Being able to see the flux level before and after the amplifier would be nice too.
I'd say seeing the analog flux level for the pre-amp output - equivalent to TP4 in the manual's figure 304 - would be useful. From a working floppy (doesn't really matter which), but also from a marginal or definitely corrupted one to see just what happens on the analog level to make transitions disappear or data loss happen in general.
FluxEngine of course (and unfortunately) can't write marginal or corrupted tracks, since the floppy drives themselves can't.
@davidgiven - At the moment I'd be able to display and measure the flux and digital levels (measurements of signal voltages on the oscilloscope screen). As it's a dual trace oscilloscope I could display both the flux value and the digital output after processing, but apart from photographing the screen I don't have the facilities to record those values. Obviously a track can have kilobytes of data so it's unrealistic to expect to display all that on a single sweep of the oscilloscope beam. You could 'home in' on a particular part of the data using a trigger such as the index pulse and a delayed sweep, but timing jitter might start to be a problem for lengthy delays (like tens of milliseconds). It could depend on where on the track the interesting marginal bit was. If it's in sector 1 it might be OK. I could knock together something like an ADC and sample the output fairly easily but I don't have any clue what sort of precision and sampling rates etc. I'd need to aim for. If you can help with some numbers it would save me quite a lot of brainstorming. I don't know what's in a VCD signal analyser file but I guess that's the least of my worries.
@kristomu - I follow your logic. I might be able to get something useful from one of the 5.25" drives here which has relatively little Large Scale Integration on the board, so that there are enough places to poke the probes. I think I have enough information about it to find the necessary measurement points. I will get to that; don't hold your breath.
@davidgiven I've found a simple, though tedious if not optimized, way to improve recovery rates. As I understand it, floppy disk controllers use a compensation system to deal with the clock slowly changing throughout the track (drift or warp). This can be implemented using exponential weighted averaging. Some VHDL source that (I think?) implements part of a PC floppy controller in effect does this with the smoothing constant alpha set to 1/2. By having a reasonable idea of the clock and then decoding the flux (in my case to an MFM stream), it can recover sectors that are hard to deal with otherwise; and by varying the constant (e.g. try once at alpha=0.5, see which sectors are recoverable, then try alpha=0.6, and so on), it's possible to get even more.
It's not quite the same thing as this kind of error correction, but it's surprisingly effective on my very messy images, e.g. on https://github.com/kristomu/flux-analyze/blob/master/tracks/RETRY-77-4-t69.0.flux I get 18 good sectors out of 20. In #170, keirf comments that FE only seems to get 6 good sectors.
My MFM decoding code is in get_MFM_train_dewarp_historical() at https://github.com/kristomu/flux-analyze/blob/master/csrc/pulse_train.cc, but it should in principle be adaptable to any format. The difficult part is making it fast, e.g. by marking areas of a flux as having already been decoded to a good sector.
Coincidentally I've been doing (trying to do) something similar today, working with .scp files. As you say, it is very tedious. Sometimes it seems to me that different areas of a disc will respond better to different tweaks of the settings.
I have found that the results given by 'fluxengine merge' to merge .flux files have been very good.
Is there a way to merge .scp files in similar fashion?
Almost by accident I found the specification for the .scp file:
https://www.cbmstuff.com/downloads/scp/scp_image_specs.txt
Is there a similar document for the .flux file format which I can get my hands on?
If all else fails I can script something to merge the individual good sectors into a disc image, but I don't want to waste effort by re-inventing wheels.
@kristomu For merging SCP files, you can use FluxEngine. Use rawwrite to convert them from scp to flux, do the merge, and then rawwrite will convert them back to SCP again.
@kristomu That's really interesting. I'll admit to not understanding the maths on this stuff; the existing algorithm simply measures the interval between pulses and buckets by clock rate. The clock is calculated from the bit pattern at the start of each record, so provided the clock is reasonably constant within a record, it should happily cope with a changing clock throughout the track. But having a better algorithm would be really good. My main issue is whether this is generalisable to encodings other than MFM. Is it relying specifically on the clock bits, or does it synchronise on any bit (if it's present)?
The first one there should probably be @GWHAYWOOD not me :-)
It synchronizes on every transition, making a guess based on the relation between the flux delay and the current clock estimate.
The very high-level logic goes like this (using half-clocks, i.e. periods of length equal to half a clock):
- Let the current "flux delay" be the time since the last flux reversal (i.e. a raw timing the FluxEngine records).
- Let the pulse length be the flux delay in multiples of a half-clock, based on the current half-clock estimate. Round off non-integer multiples. (Probably what you're already doing when bucketing.)
- If the pulse length is zero or obviously wrong (much higher than the encoding would allow), skip processing this flux delay. Otherwise...
- Record the actual pulse length for later decoding (probably also what you're already doing).
- Let the "instant half-clock" be what the half-clock period would be if the pulse length were dead center. E.g. if the pulse length is two half-clocks, then the proposed half-clock would be half the current flux delay.
- Let the proposed half-clock be (1-alpha) times the current half-clock estimate, plus alpha times the instant half-clock.
- If the proposed half-clock is between 3/4 and 9/4 of the current half-clock, update the current half-clock estimate by setting it to the proposed half-clock. Otherwise do nothing.
- Repeat.
The only MFM details this algorithm uses are what the reasonable range is (if pulse length is > 4, then it skips), and that every MFM pulse is an integer number of half clocks. MFM uses two, three, or four half-clocks (1.5, 2, or 2.5 clocks). So it should be relatively easy to adapt to other formats as long as they use multiples of half-clock lengths.
Currently I try different values of alpha. I imagine the rejection interval (3/4 ... 9/4) could also be tuned, but I think the effects would be less there.
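In code, the loop looks roughly like this (a simplified sketch, not the exact get_MFM_train_dewarp_historical() implementation linked earlier):

```cpp
#include <cmath>
#include <vector>

// Sketch of the dewarping loop above. flux_delays holds the raw times
// between flux reversals (in us); half_clock is the initial half-clock
// estimate (in us). Returns pulse lengths in half-clocks.
std::vector<int> dewarp_decode(const std::vector<double>& flux_delays,
    double half_clock, double alpha)
{
    std::vector<int> pulse_lengths;
    for (double delay : flux_delays) {
        // Pulse length in rounded multiples of the current half-clock.
        int len = (int)std::round(delay / half_clock);
        // Skip zero or obviously wrong lengths (MFM allows 2..4).
        if (len < 1 || len > 4)
            continue;
        pulse_lengths.push_back(len); // record for later decoding
        // The half-clock this delay implies if the pulse is dead center.
        double instant = delay / len;
        // Exponentially weighted update toward the instant estimate...
        double proposed = (1 - alpha) * half_clock + alpha * instant;
        // ...accepted only inside the rejection interval (3/4 .. 9/4).
        if (proposed >= 0.75 * half_clock && proposed <= 2.25 * half_clock)
            half_clock = proposed;
    }
    return pulse_lengths;
}
```

Trying different alphas is then just a matter of calling this again with another value.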
I hope this isn't going off on a tangent, but I came across something called 'recover', which was written by one Emil Laurentiu of Romania, apparently around the turn of the century. At the time of writing, the only copy that I've been able to find is item number 661 here:
https://www.sac.sk/files.php?d=14&l=
I've only just found it, and so far my searches have come up with nothing more recent and nothing more about Mr. Laurentiu, but I wonder if he or anyone else took it any further. A couple of posts in a retro computing forum seemed to be saying that recover.exe was quite good at its job. The source is included in the archive linked above. I feel the urge to build it for Linux but time is short.
About Revive: from a cursory read of the docs, it seems to have three strategies:
- A way to estimate what sector a data chunk with no sector info belongs to, by using the interleave
- Averaging sector patterns to cancel out errors that only occur some of the time
- A more brute-force stitching operation where it checks every combination of partial sectors.
The first part should be relatively easy to port, but I don't know how interleaving works. Searching for corrupted sector metadata chunks (IDAMs) in the right rough location (just before a data chunk) may be easier.
I tried doing the second back when I was doing things in Python, but never had much luck; you need the probability of an erroneous read at any of the troublesome spots to be less than 50%, but usually it was greater, so the majority-voted result had more errors, not fewer.
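The voting itself is simple enough; what I tried was roughly this sketch (redone in C++ here, and just the idea, not the tool's actual implementation), assuming all reads are the same length:

```cpp
#include <cstdint>
#include <vector>

// Per-bit majority vote over repeated reads of the same sector.
// Only helps if each bit is right more than 50% of the time.
std::vector<uint8_t> majority_vote(
    const std::vector<std::vector<uint8_t>>& reads)
{
    std::vector<uint8_t> result(reads.at(0).size(), 0);
    for (size_t i = 0; i < result.size(); i++)
        for (int bit = 0; bit < 8; bit++) {
            int ones = 0;
            for (const auto& read : reads)
                if (read[i] & (1 << bit))
                    ones++;
            // Set the bit if it was 1 in a strict majority of reads.
            if (2 * ones > (int)reads.size())
                result[i] |= 1 << bit;
        }
    return result;
}
```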
The third is something I've been kicking around in my mind but haven't implemented. Possibly it can be done more robustly with sequence alignment algorithms. It shouldn't be too hard in theory (errors are local so differences wouldn't be long, and the A1A1... header gives a common starting point). But it would take considerable work: as you said, time is short. Emil also notes that the ability of the CRC to detect further errors is often lost due to the combinatorial explosion.
Moving on with this, now that I'm starting to get my head around some of it: is there a way to output what I'm going to call the 'raw bit stream' from a flux file? By 'raw bit stream' I mean the stream of flux reversals which represents everything on - for example - a track, but which is still - again for example - MFM encoded data; that is, the flux reversals as shown in the graphic at the end of section 1.3 in this excellent article:
https://map.grauw.nl/articles/low-level-disk/
The ideal for me would probably be a bit stream like the one in the line prefixed with 'X' in that graphic.
My objective is to look at the signals in the drive and to tinker with the digitized data at the level of flux reversals. I want to be able to see bits on the oscilloscope screen and (relatively) easily associate them with the corresponding bits in the data stream. I guess I'm a way from being able to do that at the moment but I can now more or less see the path I need to take to get there.
The FluxEngine format stores delays between pulses. You can get them by using the inspect command. For instance:
./fluxengine inspect -s FILENAME.flux -c 1 -h 0 --dump-flux|less
will dump the delays found on track 1 head 0 to the console. They look like this:
0.000:-
0.979:
1.958: ==== 000001 2.000 +2.000 = 0.5 clocks
2.937:
3.916:
4.895:
5.874:-==== 000002 6.000 +4.000 = 1.0 clocks
6.853:
7.832:
8.811:
9.790:-==== 000003 9.917 +3.917 = 1.0 clocks
which (I think) in this case means that from where the recording started, 2 us passed before a flux transition, then 4 us, then 3.917 us, and so on. For the line
1.958: ==== 000001 2.000 +2.000 = 0.5 clocks
The first number gives the number of us since the start of the recording, then comes the flux transition count (this is the first transition/reversal), then the cumulative number of us, then the number of us since the last transition, and finally the number of clocks it interprets this as being. I'm not sure why the first number doesn't match the cumulative number.
IIRC, MFM works in half-clocks, so 0.5 clocks correspond to a bit pattern of "10" (one half-clock = one zero), 1 clock to a pattern of "100", so above you have "100100" (again as far as I understand, we have to throw away everything up to the first transition because we don't know if the head started in the middle of a stretch between two flux reversals).
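Under that interpretation (which, again, I may have wrong), turning an interval into bits would be just:

```cpp
#include <string>

// An interval of n half-clocks decodes to a "1" followed by n zeros,
// per my reading of the inspect output above.
std::string interval_to_bits(int half_clocks)
{
    return "1" + std::string(half_clocks, '0');
}
// e.g. interval_to_bits(1) == "10"  (the 0.5-clock interval above)
//      interval_to_bits(2) == "100" (the 1.0-clock intervals above)
```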
I don't know how to get the raw stream (here "100100"). The documentation mentions replacing --dump-flux with --dump-bits (which is meant to be --dump-bitstream) but its output doesn't seem to match my manual decodes. Maybe @davidgiven knows?
Are you trying to get at it directly from the flux file, or from within the FluxEngine codebase? If the former, look at the bytes field here: https://github.com/davidgiven/fluxengine/blob/master/lib/fl2.proto
The bytecode itself is very simple; the bottom six bits are the interval in ticks, and the top two bits, F_FLUX_PULSE and F_FLUX_INDEX, show whether a data pulse or an index pulse was detected. Look at https://github.com/davidgiven/fluxengine/blob/master/protocol.h for these, as well as the tick length.
Note that each track can have multiple flux reads!
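Walking it looks something like this sketch; the flag values and tick length below are placeholders from memory, so check protocol.h for the authoritative ones:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Placeholder constants; the real values are defined in protocol.h.
const uint8_t F_FLUX_PULSE = 0x80; // assumed flag bit for a data pulse
const uint8_t F_FLUX_INDEX = 0x40; // assumed flag bit for an index pulse
const double US_PER_TICK = 1.0 / 12.0; // assumes a 12 MHz tick clock

// Walk the flux bytecode, printing the delay between data pulses.
void dump_flux(const std::vector<uint8_t>& bytecode)
{
    unsigned ticks = 0;
    for (uint8_t b : bytecode) {
        ticks += b & 0x3f; // bottom six bits: interval in ticks
        if (b & F_FLUX_INDEX)
            printf("index pulse\n");
        if (b & F_FLUX_PULSE) {
            printf("%.3f us\n", ticks * US_PER_TICK);
            ticks = 0; // next delay is measured from this pulse
        }
    }
}
```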
fluxengine inspect can be used to read the bitstream, but it needs to be given a clock. Without it, it'll try to guess based on an interval histogram. This is not the algorithm used by the real code, which works by looking for the sync pattern at the beginning of each record and setting the clock from that.
When showing the pulsetrain, inspect puts clock/4 on the Y axis, but then shows the precise timing of each pulse on the right. Naturally these almost never line up with the clock. No attempt is made to tidy up the pulsetrain, so what you see is what comes off the disk.
I'll take a look at fixing the VCD exporter (which is broken).